Spotlight on Spotlight - Carol Smith Home Page
SYSTEM REVIEW
Apple’s Spotlight application is tightly integrated with the Mac operating system. As such, its
indexing and retrieval operations are closely guarded proprietary processes, discussed only in
the broadest terms in corporate literature. Despite these restrictions, the above performance
evaluation provides a good understanding of Spotlight’s underlying information retrieval
model. What follows is a broad review of Spotlight’s information retrieval features, as
deduced from the evaluation process. Search and retrieval issues that became apparent
during the course of the evaluation are also noted.
IR Model<br />
Spotlight’s underlying information retrieval model appears to be a classic Boolean model
with binary weighting; that is, a document is either relevant (included in results set) or
non-relevant (excluded from results set). This is evidenced by the fact that results are presented in
lexicographic order by file name, with no ranking algorithm used.
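
The behaviour described above can be illustrated with a minimal sketch of Boolean retrieval under binary weighting. This is not Spotlight's implementation, which is not public; the file names, contents, and the helper names `build_index` and `boolean_and` are invented for illustration, and the sketch assumes a simple conjunctive (AND) query.

```python
# Toy Boolean retrieval with binary weighting (illustrative only).
# File names and contents are hypothetical sample data.
documents = {
    "notes.txt": "spotlight indexes the full text of each file",
    "agenda.rtf": "review the search evaluation",
    "budget.pdf": "quarterly figures and forecasts",
}

def build_index(docs):
    """Map each term to the set of file names that contain it."""
    index = {}
    for name, text in docs.items():
        for term in text.split():
            index.setdefault(term, set()).add(name)
    return index

def boolean_and(index, query):
    """Return the files containing every query term, sorted by file name."""
    terms = query.split()
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    # Binary weighting: a file either matches or it does not, so there is
    # no score to rank by. Results fall back to lexicographic file-name order.
    return sorted(matches)

index = build_index(documents)
print(boolean_and(index, "the"))         # every file containing "the"
print(boolean_and(index, "the search"))  # files containing both terms
```

Because every match carries the same weight, the only ordering available is an external one such as file name, which is consistent with the results ordering observed in the evaluation.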
Text operations
Spotlight’s logical view of documents is full-text, and includes information relating to
syntactic structure. Experimentation does not suggest the existence of any text normalization
procedures. Specifically, there appears to be:
• No lexical analysis. For example, a Spotlight search on “full-text” will locate this
paper, but the paper is excluded from the results list if the hyphen is omitted from the
search query.
• No stopword removal. A search on the word ‘the’, for example, yields 12,040 matches
on the tested hard drive. This figure includes 1,1155 rich and plain text documents,
and 443 PDF documents.
• No stemming. A search on the word ‘enter’ will return this paper in the results set,
but fails to do so if the same word is searched with an added suffix of ‘-s’ or ‘-ing’.
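
To make the three missing operations concrete, the sketch below contrasts matching on raw, whitespace-delimited terms with matching after a simple normalization pipeline. It is an illustration under stated assumptions, not a description of Spotlight's internals: the stopword list is a tiny invented sample, `strip_suffix` is a crude stand-in for a real stemmer, and the sample document text is hypothetical.

```python
import re

STOPWORDS = {"the", "a", "of", "to"}  # tiny illustrative list (an assumption)

def raw_terms(text):
    """No normalization: split on whitespace only."""
    return text.split()

def strip_suffix(token):
    """Crude stand-in for a stemmer: drop a trailing 'ing' or 's'."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def normalized_terms(text):
    """Lexical analysis, then stopword removal, then suffix stripping."""
    # Lowercase and keep alphanumeric runs, so "full-text" becomes two tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [strip_suffix(t) for t in tokens]

def matches(query, document, analyzer):
    """True when every analyzed query term occurs among the document's terms."""
    doc_terms = set(analyzer(document))
    return all(term in doc_terms for term in analyzer(query))

doc = "enter the query to search the full-text index"  # hypothetical sample text

print(matches("full text", doc, raw_terms))         # hyphen omitted, raw terms
print(matches("entering", doc, raw_terms))          # suffixed form, raw terms
print(matches("full text", doc, normalized_terms))  # same queries, normalized
print(matches("entering", doc, normalized_terms))
```

With `raw_terms`, the queries fail for the same reasons as the searches described in the list above; with `normalized_terms`, both succeed because the query and the document pass through the same pipeline.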
Text languages<br />
Spotlight indexes a broad range of text languages, including both plain and rich text formats,
PDF documents, markup languages, and metadata for many common file formats.
Additionally, Spotlight can index multimedia files, system fonts and scripts, and
application-specific text such as e-mail messages and address book entries. Finally, Spotlight
plugins permit developers to expand Spotlight’s indexing coverage to handle less common or
newly developed text languages and file formats.