01.01.2015 Views

Spotlight on Spotlight - Carol Smith Home Page



SYSTEM REVIEW

Apple’s Spotlight application is tightly integrated with the Mac operating system. As such, its indexing and retrieval operations are closely guarded proprietary processes, discussed only in the broadest terms in corporate literature. Despite these restrictions, the above performance evaluation provides a good understanding of Spotlight’s underlying information retrieval model. What follows is a broad review of Spotlight’s information retrieval features, as deduced from the evaluation process. Search and retrieval issues that became apparent during the course of the evaluation are also noted.

IR Model<br />

Spotlight’s underlying information retrieval model appears to be a classic Boolean model with binary weighting; that is, a document is either relevant (included in results set) or nonrelevant (excluded from results set). This is evidenced by the fact that results are presented in lexicographic order by file name, with no ranking algorithm used.
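The behavior described above can be illustrated with a minimal sketch of Boolean retrieval with binary weighting. This is not Spotlight’s actual implementation, which is proprietary; the function name `boolean_and_search`, the sample file names, and the choice to combine query terms with AND are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def boolean_and_search(docs: Dict[str, str], query: str) -> List[str]:
    """Return the names of documents containing every query term, sorted by name."""
    terms = query.split()
    matches = []
    for name, text in docs.items():
        tokens: Set[str] = set(text.split())
        # Binary weighting: a term is either present or absent.
        # How often it occurs plays no role.
        if all(term in tokens for term in terms):
            matches.append(name)
    # No ranking: the order of the results depends only on the file name.
    return sorted(matches)


# Hypothetical three-document collection.
docs = {
    "notes.txt": "boolean retrieval model",
    "draft.txt": "boolean boolean boolean model",
    "memo.txt": "ranking algorithm",
}

print(boolean_and_search(docs, "boolean model"))  # ['draft.txt', 'notes.txt']
```

Although draft.txt repeats the query term three times, it precedes notes.txt only because of its name. A ranked model would instead order the two documents by some relevance score.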

Text operati<strong>on</strong>s<br />

Spotlight’s logical view of documents is full-text, and includes information relating to syntactic structure. Experimentation does not suggest the existence of any text normalization procedures. Specifically, there appears to be:

• No lexical analysis. For example, a Spotlight search on “full-text” will locate this paper, but the paper is excluded from the results list if the hyphen is omitted from the search query.

• No stopword removal. A search on the word ‘the’, for example, yields 12,040 matches on the tested hard drive. This figure includes 1,1155 rich and plain text documents, and 443 PDF documents.

• No stemming. A search on the word ‘enter’ will return this paper in the results set, but fails to do so if the same word is searched with an added suffix of ‘-s’ or ‘-ing’.
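To make the three observations concrete, the following sketch contrasts an index built with no normalization against one built with lexical analysis, stopword removal, and stemming. It is only a toy model: the whitespace-only tokenizer, the three-word stopword list, the naive suffix stripper (a stand-in for a real stemmer), and the sample sentence are all assumptions, and none of it reflects Spotlight’s internals.

```python
import re
from typing import Set

STOPWORDS = {"the", "a", "to"}  # hypothetical, deliberately tiny stopword list


def raw_terms(text: str) -> Set[str]:
    """Index terms with no normalization: split on whitespace only."""
    return set(text.split())


def normalized_terms(text: str) -> Set[str]:
    """Index terms after lexical analysis, stopword removal, and crude stemming."""
    # Lexical analysis: lowercase, and treat hyphens and punctuation as separators.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    terms = set()
    for tok in tokens:
        if tok in STOPWORDS:
            continue
        # Naive suffix stripping, standing in for a real stemming algorithm.
        for suffix in ("ing", "s"):
            if tok.endswith(suffix) and len(tok) > len(suffix) + 2:
                tok = tok[: -len(suffix)]
                break
        terms.add(tok)
    return terms


def matches(query: str, index_terms: Set[str], normalize: bool) -> bool:
    """True if every query term, processed the same way as the index, is present."""
    q = normalized_terms(query) if normalize else raw_terms(query)
    return bool(q) and q <= index_terms


doc = "press the enter key to run a full-text query"
raw = raw_terms(doc)
norm = normalized_terms(doc)

print(matches("full text", raw, False), matches("full text", norm, True))  # False True
print(matches("the", raw, False), matches("the", norm, True))              # True False
print(matches("enters", raw, False), matches("enters", norm, True))        # False True
```

The unnormalized column reproduces the pattern reported above: the hyphenated form must be typed exactly, the stopword is searchable, and suffixed variants of ‘enter’ fail to match.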

Text languages<br />

Spotlight indexes a broad range of text languages, including both plain and rich text formats, PDF documents, markup languages, and metadata for many common file formats. Additionally, Spotlight can index multimedia files, system fonts and scripts, and application-specific text such as e-mail messages and address book entries. Finally, Spotlight plugins permit developers to expand Spotlight’s indexing coverage to handle less common or newly developed text languages and file formats.
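The plugin idea can be pictured as a registry that maps file types to text extractors, so that support for a new format is added without changing the indexer itself. The sketch below is a generic illustration and not Apple’s plugin interface; the names `EXTRACTORS`, `register`, and `extract_text`, and the made-up `.xyz` format, are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical registry: file extension -> function that extracts indexable text.
Extractor = Callable[[bytes], str]
EXTRACTORS: Dict[str, Extractor] = {
    ".txt": lambda data: data.decode("utf-8", errors="replace"),
}


def register(extension: str, extractor: Extractor) -> None:
    """Add or replace the extractor used for a given file extension."""
    EXTRACTORS[extension.lower()] = extractor


def extract_text(filename: str, data: bytes) -> str:
    """Return indexable text, or an empty string for unsupported formats."""
    dot = filename.rfind(".")
    ext = filename[dot:].lower() if dot != -1 else ""
    extractor = EXTRACTORS.get(ext)
    return extractor(data) if extractor else ""


# A "plugin" for a made-up format whose text payload follows a 4-byte header.
register(".xyz", lambda data: data[4:].decode("utf-8", errors="replace"))

print(extract_text("report.xyz", b"HDR!hello"))  # hello
```

Files with no registered extractor simply contribute no text to the index in this sketch, which is one plausible way an indexer could treat unknown formats.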
