Spotlight on Spotlight - Carol Smith Home Page
Spotlight on Spotlight
An evaluation and review of Mac OS X
Tiger’s desktop indexing application
Carol Smith
Info 624 – Information Retrieval Systems
Summer 2005, Buzydlowski
Submitted August 17, 2005
TABLE OF CONTENTS
ABSTRACT
AUTHOR KEYWORDS
INTRODUCTION
Problem Domain and Scope
Spotlight Features – Brief Overview
DATA SET
Data Set Proposal
Data Set Description
Sample Document
Data Set Creation
Data Set Issues
EVALUATION
Methodology
1. Functional Analysis
2. System Performance Evaluation
3. Retrieval Performance
SYSTEM REVIEW
IR Model
Text operations
Text languages
Query language and operations
User interface, retrieval issues
CONCLUSION
BIBLIOGRAPHY
ABSTRACT
As the number of text and multimedia files stored by the average computer user continues to
increase, so will the need to effectively index and access one's 'personal digital library'. Apple
Computer, Inc.’s latest operating system, Mac OS X 10.4 (dubbed ‘Tiger’), includes an
integrated indexing and retrieval application known as ‘Spotlight’. This paper analyzes
Spotlight’s capabilities and limitations by first defining a data set of textual documents, and
then testing Spotlight’s performance in indexing and accessing the data set. The basic
features of Spotlight are introduced, and then the test data set is described, including issues
related to its creation and use. A three-pronged approach was adopted to assess the
performance of Spotlight. During the functional analysis phase, Spotlight was tested for
system errors. A performance analysis then assessed the speed and storage requirements of
the indexing system. Finally, a retrieval performance evaluation tested Spotlight against the
data set for precision, recall and harmonic mean measurements. The paper concludes with
observations about Spotlight’s underlying information retrieval model, in terms of text
languages and operations, query languages and operations, and interface issues. Spotlight is
found to be a utility with great promise, but also with significant challenges.
AUTHOR KEYWORDS
Apple; Mac OS X 10.4; Tiger; Spotlight; information retrieval; operating systems; evaluation.
INTRODUCTION
Problem Domain and Scope
As the number of text and multimedia files stored by the average computer user continues to
increase, so will the need to effectively index and access one's 'personal digital library'. Both
Microsoft and Apple Computer have recognized this growing need to manage digital
collections, and have been racing to integrate information retrieval utilities into their
competing operating systems. Microsoft’s Windows Vista operating system (formerly known
as ‘Longhorn’) is slated for release sometime in 2006, and is expected to include integrated
indexing/query capabilities. Apple Computer, Inc. (hereafter, ‘Apple’), however, ‘beat them
to the punch’, releasing Mac OS X 10.4 (dubbed ‘Tiger’) to the public on April 29, 2005.
Included in Mac OS X Tiger is an integrated indexing and retrieval application known as
‘Spotlight’.
Mac users of greatly varying technical ability are already actively using Spotlight to seek and
retrieve information from their desktop computing environments; indeed, along with their
favorite Internet browser and search engine, it is likely to become the information retrieval
system they access most often. As a developer's tool, the Spotlight search engine will also be
incorporated into dozens of third-party applications. Given its potential for wide use, a
thorough analysis of Spotlight’s information retrieval capabilities and limitations is
warranted. Published reviews of Spotlight are glowing with praise, but unfortunately provide
little analytical detail. This paper seeks to fill that void by systematically evaluating Spotlight’s
performance. To accomplish this, a data set of textual documents is first created and defined,
and then used to test Spotlight’s performance in indexing and accessing the data set.
Spotlight Features – Brief Overview
Spotlight indexes the contents of a drive automatically, with no explicit action required by
the user. For textual documents, the full text is indexed, and for all file types, application
metadata is also indexed. The Spotlight engine works closely with the Mac operating system,
updating the index any time a file is created or modified. Additional detail about
Spotlight’s indexing architecture and processes is provided in the Indexing Processes and
System Review sections of this paper.
Because the Spotlight indexing/retrieval engine is an integrated component of the Mac
operating system, it is ‘always on’ and doesn’t need to be launched in the manner of a
traditional application. The Spotlight query window is always available within a few
keystrokes, and can be accessed in a number of alternate ways:
1. The upper right-hand corner of the Mac interface contains a permanent ‘spyglass’
icon that is always within view; clicking on this icon brings up the basic Spotlight
query window.
2. The same query window can alternately be reached via a command-space bar
keystroke combination.
3. Spotlight query windows are built into popular Apple applications and utilities,
including Mail, Preferences, Address Book, Calendar and others.
4. Spotlight queries can also be executed within the Apple Finder window. This final
access method also permits the creation and saving of customized queries called
‘Smart Folders.’ Apple envisions these virtual folders as a new method for organizing
and managing information, one that may at least partially supplant traditional
physical file organization.
As keywords are entered into the Spotlight query window, matching documents on the drive
are listed, ordered first by general file type, then lexicographically by file name. Clicking on
any file name will open the document within its associated application.
A user can also hit the Return key in order to open the results set in a separate window. This
window permits additional manipulation of Spotlight query results, including alternate
ordering options and additional filtering functions.
DATA SET
Data Set Proposal
In keeping with Spotlight’s anticipated use as an indexing and retrieval system for individual
digital collections, a genealogical data set of personal interest and utility to the author was
envisioned.
Genealogists spend a significant amount of time reviewing Internet message boards,
particularly those dedicated to family surname research. These message board services offer
sophisticated query capabilities, including features such as field searching, Soundex
searching, and imposition of date range limits. Despite such useful information retrieval
features, however, searching the message boards is still a time-consuming affair. Because of
historical name spelling variants, a message of interest might be posted on any of multiple
surname message boards. When researching the Minnick family, for example, a query must
be individually executed on as many as 12 different message boards (Minnick; Minick;
Minck; etc.) located on multiple servers, in order to conduct a comprehensive search.
There is currently no way to query multiple Internet message boards simultaneously, even
within a single web site.
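To illustrate why these spelling variants belong to a single search problem, the sketch below
implements the standard American Soundex coding. Which Soundex variant the message boards
actually use is not known, so this is an assumption, and the function name is hypothetical.

```python
def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a surname."""
    # Build the letter-to-digit table; vowels, H, W and Y have no digit.
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for letter in letters:
            codes[letter] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    previous = codes.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate letters that share a code
        digit = codes.get(letter, "")
        if digit and digit != previous:
            result += digit
        previous = digit  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]


# The six surname boards used later in this paper.
surnames = ["Minnick", "Minick", "Mink", "Minnich", "Minich", "Minck"]
print({name: soundex(name) for name in surnames})
```

Under this coding all six spellings share one code, which is why a Soundex search helps within a
single board but does nothing to merge postings spread across separate boards.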
By creating a unified data set of postings from multiple Internet message boards, a
genealogist should be able to utilize Spotlight to rapidly execute comprehensive searches via
a single query. Creation of the initial data set will require a significant investment of time up
front, but should be rewarded by faster search and retrieval for subsequent information
needs. The created data set, in combination with Spotlight, will essentially enable ‘metasearch’
capabilities across multiple message boards.
Data Set Description
The test data set was drawn from six separate surname message boards, all hosted by
Ancestry.com (http://ancestry.com/share/):
Minnick Surname Board
Minick Surname Board
Mink Surname Board
Minnich Surname Board
Minich Surname Board
Minck Surname Board
Additional message boards are located at http://genforum.genealogy.com and numerous
other genealogical web sites; these would be included in any fully implemented project, but
were deemed nonessential for the paper’s primary purpose of evaluating Spotlight’s
indexing/retrieval performance.
To keep the project manageable, the data set was limited to discussion threads with an initial
posting dated 1/1/2004 or later. Any messages dated 1/1/2004 or later but related to a
thread initiated prior to 2004 were not considered for inclusion. These date limits resulted in
a data set of 144 plain text files, as well as 22 JPEG images that were attached to the original
messages.
The data set displays several interesting characteristics that may present information retrieval
challenges, including:
• A high incidence of recurring terms (first names, dates, etc.)
• Numerous variant expressions, including abbreviations and unintentional misspellings
(e.g., Mississippi; Miss.; Missisippi; MS)
• Polysemy; that is, words with multiple possible meanings (e.g., Virginia as a place;
Virginia as a female name)
Sample Document
The image below is representative of a typical message board posting, in its original HTML
formatting.
Boards > Surnames > Minck
URL: http://boards.ancestry.com/mbexec/message/an/surnames.minck/9
Data Set Creation
Creation of the data set was a predictably tedious, manual affair. For each of the 144
individual message board postings, the following steps were followed, in sequence:
1. The target html page was opened within a web browser.
2. Because a straight copy/paste routine would have captured undesirable information
and hyperlinks extraneous to the message, each page was then reloaded via
the page’s ‘Printer-friendly’ hyperlink.
3. The message was copied in its entirety using cmd-a/cmd-c keyboard shortcuts
(Macintosh).
4. Using the cmd-v keyboard shortcut (Macintosh), the message was then pasted into a
new plain text document, using Apple’s TextEdit application.
5. Two corrections were made to each plain text document:
a. The phrase “Return to Message” (a hyperlink in the original page) was
deleted from the end of each document.
b. In order to avoid web crawler agents, the original html pages provide e-mail
addresses in .gif format. For this reason, each author’s e-mail address
information needed to be entered manually.
6. Each plain text file was then saved to the hard drive.
7. A small percentage of message board postings were accompanied by .jpg
attachments, typically scanned documents relating to the message. Each of these
attachments (22 in all) was saved as a separate data set file. Each attachment first had
to be loaded into a separate browser window; for some unknown reason,
attachments could only be saved as .gif images without this extra step, even though
the extension of the attachment indicated it was a .jpg file.
After some consideration, it was decided to name each text file sequentially, beginning with
001, 002, 003, etc. If an initial message board posting received replies, each posting of a
single thread was given the same number, but distinguished with sequential letters; e.g.,
001a, 001b, 001c, etc. Some thought was given as to whether file names should indicate
the level of depth in a particular thread; that is, if a posting was the second reply to a reply of
an initial posting, label it 001aab. This level of complexity was deemed unnecessary,
however, as any thread in question could be easily located in its original web location, should
the sequence of postings become of interest.
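The naming convention described above can be expressed as a short sketch. The function name
and the limit of 26 postings per thread (one per letter) are assumptions made for illustration;
the files in this project were named by hand.

```python
from string import ascii_lowercase


def thread_file_names(thread_number, posting_count, extension=""):
    """Return file names for one thread: '001' alone, or '001a', '001b', ... with replies."""
    if posting_count < 1:
        raise ValueError("a thread contains at least one posting")
    stem = f"{thread_number:03d}"
    if posting_count == 1:
        return [stem + extension]
    if posting_count > len(ascii_lowercase):
        # Assumption: threads longer than 26 postings are outside this scheme.
        raise ValueError("too many postings for single-letter suffixes")
    return [stem + ascii_lowercase[i] + extension for i in range(posting_count)]


print(thread_file_names(1, 3))
print(thread_file_names(2, 1))
```

The optional extension argument is left empty by default, matching the initial decision to save
files without extensions.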
Data Set Issues
As described in the Functional Analysis section below, two decisions made during the
creation of the initial data set proved problematic, and required further data set modification:
1. Because Mac files do not require extensions (.txt, .doc, etc.), extensions were not
initially entered during the file-naming step.
2. Documents were initially saved to separate sub-folders for each of the Internet
message boards (i.e., “Ancestry-Minnick”; “Ancestry-Minick”; “Ancestry-Minck”;
“Ancestry-Minnich”; “Ancestry-Minich”; “Ancestry-Mink”). Attachments were
further segregated into folders within these folders, labeled “Ancestry-Minnick-Images”,
etc. Finally, all six subfolders were contained within a single top-level folder
labeled “Spotlight Data Set.”
It should also be noted that the fielded format of the documents in their original web format
was an initial attraction, as it offered up the possibility of testing Spotlight’s performance on
structural (syntactic) queries, as well as on semantic content. The fielded structure of data is
not retained, however, when information is converted to plain text format. This loss of
structural integrity was not anticipated (author oversight), but is hopefully made up for by an
extended Spotlight system review.
EVALUATION
Methodology
Evaluations were conducted by two independent users (hereafter, “User 1” and “User 2”).
One user was familiar with the contents of the data set, the other not. Neither user had prior
experience with the Spotlight interface, although both were experienced users of
Macintosh operating systems. All tasks were conducted in batch (vs. interactive) mode;
that is, queries were executed and responses evaluated on an individual basis, rather than as
iterative sequences of query/response/revised query. Evaluations were carried out in a home
setting, but the structured nature of the tasks rendered the experiments closer to a laboratory
session than a real-life field assessment. All evaluations were conducted on a 1.07 GHz Apple
iBook G4 laptop computer with 512 MB of DDR SDRAM.
A three-pronged approach was adopted to assess the performance of Spotlight:
1. Functional Analysis: During this error analysis phase, users were asked to freely
utilize Spotlight to access the data set. Defined retrieval tasks were not provided;
instead, users created a range of their own ad hoc tasks intended to reveal functional
problems or inconsistencies in the system’s indexing and retrieval performance.
Users were asked to determine whether their chosen tasks were properly supported
by the system, and whether they were ultimately able to accomplish the tasks. Users
were asked to verbalize their experiences and challenges as these tasks were executed.
2. System Performance Evaluation: Because Spotlight is a proprietary system (and due
to the author’s lack of technical prowess), obtaining precise measurements of its
response time and storage requirements proved challenging. Response time tests were
executed using a manually controlled stopwatch; the resulting figures should therefore
be considered only as estimates. Further, an academic discussion of Spotlight’s storage
requirements was substituted for an actual evaluation, as no means for calculating
storage use could be determined by the author.
3. Retrieval Performance Evaluation: To assess the indexing and retrieval
performance of Spotlight, classic precision and recall measurements were calculated
separately. Harmonic mean, a measurement unifying precision and recall levels into a
single metric, is also provided.
Execution and outcome of these three evaluation modes are discussed in separate sections,
below.
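For reference, the three retrieval measures named above follow the standard definitions:
precision is the share of retrieved documents that are relevant, recall is the share of
relevant documents that were retrieved, and the harmonic mean is 2PR / (P + R). The sketch
below computes them for one query; the function name and the file names in the example are
hypothetical and are not results from this evaluation.

```python
def precision_recall_f(retrieved, relevant):
    """Return (precision, recall, harmonic mean) for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # The harmonic mean is defined as 0 when nothing relevant was retrieved.
    harmonic = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, harmonic


# Hypothetical query: 4 documents retrieved, 5 judged relevant, 3 in common.
retrieved = ["001a.txt", "001b.txt", "007.txt", "012.txt"]
relevant = ["001a.txt", "001b.txt", "003.txt", "007.txt", "020.txt"]
p, r, f = precision_recall_f(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f} harmonic mean={f:.2f}")
```

The harmonic mean is high only when both precision and recall are high, which is why it is
reported alongside the two separate measures.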
1. Functional Analysis
Users were asked to first browse the data set and examine individual documents, in order to
devise a broad range of retrieval tasks. These tasks were then executed freely using Spotlight,
in an effort to reveal functional challenges. Performance errors became apparent at a very
early stage:
Task #1 (User 1): Locate all documents authored by Verna Williams.
Error: During the browsing session, User 1 pre-determined that at least 5
individual documents existed in the data set with author listed as either
‘Verna’ or ‘Verna Williams’. When entering ‘Verna’, ‘Author: Verna’ or
‘Author: Verna Williams’ in the Spotlight query window, however, zero results
were retrieved. Similar retrieval failures were consistently experienced with other
devised user tasks.
Problem Source: Several hours of exploration and experimentation revealed
the reason for this performance failure. All 144 plain text files in the data set
were saved without the .txt extension added to the document name. Such file
extensions are necessary in a Windows environment, but are optional in Mac
operating systems. In order for Spotlight to index a plain text file, however,
the .txt file extension is apparently required. Although not necessarily a
functional error, it is a requirement that conflicts with long-established system
behavior, and will inevitably cause confusion for Mac users.
Resolution: As soon as the extension was added to all text files in the data
set, User 1 was able to successfully execute all remaining functional retrieval
tasks.
Task #2 (User 2): Locate all documents mentioning the name “John”.
Error: During the browsing session, User 2 observed that John was a
common name listed in the message board postings, and was curious about
what percentage of documents contained this name. Even after the .txt
extension had been added to all plain text files in the document set, however,
User 2 was unable to execute the retrieval task, receiving a results set of zero
documents.
Problem Source: Again, exploration and experimentation led to an
understanding of the performance failure. Whereas User 1 was utilizing
Spotlight to conduct a system-wide search of the computer’s entire hard
drive, and then analyzing just those .txt files with the anticipated file name
formatting (001, 002, etc.), User 2 chose to access Spotlight via the system’s
Finder feature, specifying a focused search of just the Data Set folder. As
previously described, this folder contains six subfolders, and documents were
saved within those six subfolders. It was determined that when Spotlight is
directed to examine the contents of a particular folder, it analyzes folder
contents only one level deep; that is, Spotlight was seeking keyword matches
on the folders themselves, but not on their contents.
Resolution: Although this may not be a functional error, it is regarded as a
serious limitation in Spotlight’s performance, and one that is likely to confuse
many users, whose prior computing experience will lead them to expect all
nested contents of a folder to be considered during a targeted search query.
To overcome the immediate problem, all nested folder structures within the
Data Set folder were removed. Once this was
accomplished, User 2 was able to conduct all remaining functional retrieval
tasks (see Appendix A) without error.
After this, no further functional errors were identified, and the functional evaluation was
concluded.
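The two data set corrections described above (adding the .txt extension and removing the
nested folders) were made by hand. As an illustration only, they could be scripted roughly as
follows. The function name is hypothetical, and the sketch assumes that no two files in
different subfolders share a name; it stops with an error rather than overwrite anything.

```python
from pathlib import Path
import shutil


def flatten_and_add_txt(root):
    """Move every file under root into root itself, adding .txt where no extension exists."""
    root = Path(root)
    moved = []
    # Snapshot the file list first so that moving files does not disturb the traversal.
    # Hidden files (names starting with a dot) are left where they are.
    files = [p for p in root.rglob("*") if p.is_file() and not p.name.startswith(".")]
    for path in sorted(files):
        name = path.name if path.suffix else path.name + ".txt"
        target = root / name
        if target == path:
            continue  # already at the top level with an extension
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        shutil.move(str(path), str(target))
        moved.append(target)
    # Remove subfolders that are now empty, deepest first.
    folders = sorted((p for p in root.rglob("*") if p.is_dir()),
                     key=lambda p: len(p.parts), reverse=True)
    for folder in folders:
        if not any(folder.iterdir()):
            folder.rmdir()
    return moved
```

A call such as flatten_and_add_txt("Spotlight Data Set") would leave all text files and image
attachments side by side in the top-level folder. Working on a copy of the data set would be
prudent, since the moves are not undone if an error occurs partway through.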
2. System Performance Evaluation
Performance evaluations assess the efficiency of a retrieval system’s architecture, in terms of
its use of storage space, system interactions and response time. Unfortunately, my technical
abilities are limited, and Apple provides no built-in utilities for assessing Spotlight
performance; after some research I was unable to identify any feasible methods for
generating precise performance metrics. In lieu of this, I elected instead to make rough
response time observations (using a manually controlled stopwatch), and to discuss
Spotlight’s indexing and retrieval structures in general terms.
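Stopwatch timing could, in principle, be supplemented by scripted timing. The sketch below
was not part of this evaluation. It assumes that the mdfind command-line tool (a terminal
front end to the Spotlight index) is available on the test machine and that a recent Python 3
interpreter is installed; the helper names are hypothetical. It would time command-line
queries only, not the interactive results window, so any figures it produced would not be
directly comparable to the stopwatch observations.

```python
import subprocess
import time


def mdfind_command(query, folder=None):
    """Build an mdfind argument list, optionally restricted to one folder."""
    command = ["mdfind"]
    if folder is not None:
        command += ["-onlyin", str(folder)]
    command.append(query)
    return command


def time_query(command, runs=5, run=subprocess.run):
    """Return the elapsed wall-clock seconds for each of `runs` executions."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run(command, capture_output=True, check=True)
        timings.append(time.perf_counter() - start)
    return timings


# Example (requires a Mac with mdfind; the folder path is hypothetical):
# timings = time_query(mdfind_command("Verna", "/Users/me/Spotlight Data Set"))
# print(min(timings), max(timings))
```

Repeating each query several times and reporting the range, rather than a single reading,
would reduce the effect of one-off delays.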
2a. Response time: Because Spotlight is tightly integrated with the Mac operating
system, indexing of an entire file system is conducted as soon as a drive has been
introduced, and automatically updated each time a document is created or
modified. This up-to-date system level index is readily available at any time for
searching by the user, via an icon in the upper right-hand corner of the interface. As
one begins typing a word, matching documents immediately begin filling the results
window, arranged by media type. As the word continues to be typed, non-matching
results are rapidly eliminated from the results window. Once the user has finished
typing the search query, the final results screen takes anywhere from 3 to 5 seconds to
stabilize. From a user perspective, then, initial response times appear nearly
immediate, with final results usually available within a 5-second time frame.
With ad hoc experimentation, two general response time challenges were observed
(neither specifically associated with the defined data set):
• If a user executes a search query, and then quickly selects a particular file to
be opened while the results set is still ‘stabilizing’, system response time to
locate and open the chosen file slows slightly, from roughly 2-4 seconds to
4-6 seconds elapsed time. It can also be difficult to select a file accurately while
the results set stabilizes, because file locations are continually shifting within
the results list. This, however, can be considered a user interface issue, rather
than a system performance issue.
• When an external drive is first connected, Spotlight immediately begins indexing
the drive’s contents. A user can conduct search queries while indexing is
taking place, but response times are significantly slowed down, to between 5
and 8 seconds. Additionally, one cannot consider the results set to be complete
until indexing of the underlying file system is finished. As an experiment, a
250GB external drive containing 38GB of data was connected to the primary
evaluation computer. Spotlight began indexing the drive at 11:35:40AM, and
finished at 12:33:11PM, an elapsed time of 57 minutes, 31 seconds (an
average indexing speed of 1 minute, 31 seconds per GB). During this time,
all executed Spotlight queries – even those executed on the primary
computer’s hard drive – were visibly slowed by an additional 0-5 seconds per
query.
Spotlight results set – response time slows markedly
during the indexing of newly introduced drives.
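The average indexing speed quoted above can be checked from the recorded start and finish
times. The sketch below uses only the figures given in the text (start 11:35:40, finish
12:33:11, 38GB of data); the function name is hypothetical.

```python
from datetime import timedelta


def indexing_rate(start, end, gigabytes):
    """Return (elapsed seconds, seconds per GB) for one indexing run."""
    elapsed = (end - start).total_seconds()
    return elapsed, elapsed / gigabytes


# Clock times expressed as offsets from midnight on the same day.
start = timedelta(hours=11, minutes=35, seconds=40)
end = timedelta(hours=12, minutes=33, seconds=11)

elapsed, per_gb = indexing_rate(start, end, 38)
minutes, seconds = divmod(round(per_gb), 60)
print(f"{elapsed:.0f} s total, about {minutes} min {seconds} s per GB")
```

This reproduces the reported figure: 3,451 seconds over 38GB is roughly 91 seconds, or
1 minute 31 seconds, per GB. It is an average over one run on one drive, so it says nothing
about how the rate varies with file type or file count.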
2b. Indexing architecture: The <str<strong>on</strong>g>Spotlight</str<strong>on</strong>g> indexing process is nicely illustrated by<br />
an Apple graphic (Apple Computer, 2005c, p.11):<br />
Whenever a document is created or modified, or whenever a new drive is introduced,<br />
Spotlight’s search engine queries the underlying file system to determine<br />
the type of file(s) involved. Once the type is ascertained, Spotlight calls upon the<br />
appropriate plug-in to import content and metadata information. Every indexed type of<br />
file has an associated plug-in; many plug-ins come built in with the Tiger operating<br />
system, and others can be created by developers to allow Spotlight indexing of less<br />
common file types. After the plug-in parses a file’s contents, the information is written to the<br />
metadata index and the content index, as appropriate. Collectively, these two<br />
indices are referred to as the ‘Apple Store’, and each drive maintains separate stores.<br />
Users can then use the Spotlight search interface (or an independently developed<br />
application built on the Spotlight API) to search the Apple Store, and connect to the appropriate<br />
application when a file of interest is located.<br />
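The flow described above (detect the file type, hand the file to the matching plug-in, then write to the metadata and content indices of that drive’s store) can be sketched as follows. This is an illustrative model only, not Apple’s implementation: the `Store` class, the importer functions, the extension-based type detection, and the metadata-only fallback are all assumptions made for the example.<br />

```python
import os


class Store:
    """One store per drive: a metadata index plus a content index."""

    def __init__(self):
        self.metadata = {}  # file name -> metadata dict
        self.content = {}   # file name -> extracted text


def import_text(name, raw):
    """Hypothetical plug-in for plain text: returns (metadata, text)."""
    return {"kind": "plain text", "size": len(raw)}, raw.decode("utf-8", "ignore")


def import_metadata_only(name, raw):
    """Assumed fallback when no plug-in is registered for the file type."""
    return {"kind": "unknown", "size": len(raw)}, None


# Hypothetical registry mapping a file type (here, its extension) to a plug-in.
IMPORTERS = {".txt": import_text}


def on_file_changed(store, name, raw):
    """Called when a file is created or modified on the drive owning `store`."""
    ext = os.path.splitext(name)[1].lower()
    importer = IMPORTERS.get(ext, import_metadata_only)
    meta, text = importer(name, raw)
    store.metadata[name] = meta
    if text is not None:
        store.content[name] = text


# Each drive keeps its own, separate store.
stores = {"Macintosh HD": Store(), "External": Store()}
on_file_changed(stores["External"], "notes.txt", b"Moved to Virginia in 1998.")
on_file_changed(stores["External"], "photo.xyz", b"\x00\x01")
```

Registering a new entry in `IMPORTERS` plays the role of a developer-supplied plug-in for a less common file type.<br />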
Spotlight’s indexing and information retrieval functions are tightly integrated with<br />
the Mac operating system. Both Apple literature and third-party reviews tout this as a<br />
significant advantage over add-on search tools, such as X1 for Windows. Certainly,<br />
Spotlight’s automated indexing and always-available search field are convenient<br />
features for users. Without a direct performance comparison against third-party<br />
products, however, it’s difficult to assess the merit of such praise.<br />
Spotlight’s content index is generated using Apple’s proprietary Search Kit<br />
technology. The following statements in Apple’s developer literature (2004) describe<br />
the use of an inverted file mechanism in Search Kit, with addressing granularity at<br />
the document level only:<br />
• Inverted file mechanism: “A Search Kit inverted index lists each<br />
constituent term exactly once, no matter how many of its contained<br />
documents include the term and no matter how frequently the term appears<br />
in any of the documents. In other words, the index tracks which documents<br />
use the term, and how often, but the term appears in the index just once.”<br />
• Addressing granularity at the document level: “To Search Kit, a<br />
document is atomic in that it defines the granularity of a search. Using Search<br />
Kit, your application can find documents—as your application understands<br />
them—but cannot locate the position of a term within a document.”<br />
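To make the two quoted properties concrete, here is a minimal sketch of an inverted index with document-level granularity. It is not Search Kit code; the function name, the whitespace tokenization, and the sample documents are assumptions chosen for illustration.<br />

```python
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to {document id: occurrence count}.

    Each term is stored once. Only per-document counts are kept, so the
    index can say which documents use a term and how often, but not where
    in a document the term occurs.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index


# Hypothetical two-document collection.
docs = {
    "a.txt": "virginia census records for virginia families",
    "b.txt": "census index",
}
index = build_inverted_index(docs)
print(index["virginia"])  # {'a.txt': 2}
print(index["census"])    # {'a.txt': 1, 'b.txt': 1}
```

Because no term positions are recorded, a structure like this cannot answer phrase or proximity queries without rereading the documents themselves.<br />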
Although two other indexing methods are available to Search Kit developers (a<br />
“vector index” and an “inverted vector index”), Spotlight is most likely using the<br />
inverted index option, for the following reasons:<br />
• Apple (2004) characterizes the inverted index structure as being “faster and<br />
smaller” than the two other Search Kit indexing methods, features essential<br />
to Spotlight.<br />
• Spotlight can identify matching files, but cannot identify the location within a<br />
file where specified text appears.<br />
• Spotlight provides keyword-based searching (as opposed to similarity<br />
searching). Baeza-Yates and Ribeiro-Neto cite inverted file mechanisms as<br />
“currently the best choice for most [keyword-based search] applications”<br />
(1999, p.191), and indeed, Apple (2004) recommends the Search Kit inverted<br />
index option as the best option for keyword-based systems.<br />
3. Retrieval Performance<br />
Precision and recall are the two classic measurements of a system’s information retrieval<br />
performance. Recall measures a system’s ability to retrieve all (known) relevant documents,<br />
while precision measures the percentage of relevant documents in a particular results set. A<br />
third measure, the harmonic mean, combines precision and recall measurements into a single<br />
performance metric.<br />
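The three measures can be written down compactly. The sketch below is a generic illustration of the standard definitions; the function name and the document identifiers are hypothetical and the counts are not taken from this evaluation.<br />

```python
def precision_recall_f(retrieved, relevant):
    """Return (precision, recall, harmonic mean) for one query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    harmonic  = 2 / (1/recall + 1/precision)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision == 0 or recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 / (1 / recall + 1 / precision)


# Hypothetical query: 20 relevant documents exist; 10 are returned, 8 of them relevant.
relevant = {f"doc{i:02d}" for i in range(20)}
retrieved = {f"doc{i:02d}" for i in range(8)} | {"other1", "other2"}
p, r, f = precision_recall_f(retrieved, relevant)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.8 0.4 0.533
```

Note that the numerator of both precision and recall is the number of relevant documents retrieved; only the denominators differ.<br />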
Neither precision nor recall can be tested without first identifying the subset of relevant<br />
documents for a particular information need, an inevitably subjective process. For this<br />
reason, a retrieval task was designed in advance, and all pertinent documents within the 144-document<br />
collection were identified. User 1 assessed document relevancy, as she was<br />
familiar with the data set, and the task was then presented to User 2, who had no prior<br />
contact with the data set. User 2 conducted 3 search queries, with the following results:<br />
Task 1: Locate all documents mentioning the state of Virginia<br />
Relevant Documents: Of the 144 text documents in the full data set, 32 are deemed pertinent to the<br />
information need:<br />
005; 006; 007; 008; 009; 010; 011; 012; 013; 014; 015; 016; 017; 018; 020a;<br />
020b; 020c; 022; 028a; 029; 030; 031; 032; 050; 051; 052; 063a; 065; 072d; 078;<br />
080k; 081<br />
Query #1<br />
Search String: Virginia<br />
Results Set: This query returned 17 documents, 13 of which were relevant.<br />
Precision [1]: 13/17, or 76.47% of all retrieved documents are relevant to the query.<br />
Recall: 13/32, or 40.63% of all relevant documents were retrieved by the query.<br />
Harmonic Mean: \( \frac{2}{\frac{1}{0.4063} + \frac{1}{0.7647}} = \frac{2}{3.77} = 0.531 \)<br />
Observations: All 4 non-relevant documents were captured because the<br />
author’s first name was ‘Virginia’, a polysemy issue.<br />
Precision is high, but more than half of all relevant documents<br />
were not returned; these all referred to the state of<br />
Virginia as ‘VA’ (or ‘Va’, in one case).<br />
Query #2<br />
Search String: VA<br />
Results Set: This search returned 37 documents, 25 of which were relevant.<br />
Precision [1]: 25/37, or 67.57% of all retrieved documents are relevant to the query.<br />
Recall: 25/32, or 78.13% of all relevant documents were retrieved by the query.<br />
Harmonic Mean: \( \frac{2}{\frac{1}{0.7813} + \frac{1}{0.6757}} = \frac{2}{2.76} = 0.725 \)<br />
Observations: Recall was quite high, primarily because the majority of<br />
relevant documents referred to the state as ‘VA’ in the<br />
subject line. A single author posted the majority of the<br />
messages. A larger data set, reflecting the posting styles of<br />
many different people, may not have yielded results as<br />
favorable.<br />
Query #3<br />
Search String: Virginia or VA<br />
Results Set: This search returned 5 documents, 4 of which were relevant.<br />
Precision [1]: 4/5, or 80% of all retrieved documents are relevant to the query.<br />
Recall: 4/32, or 12.5% of all relevant documents were retrieved by the query.<br />
Harmonic Mean: \( \frac{2}{\frac{1}{0.125} + \frac{1}{0.8}} = \frac{2}{9.25} = 0.216 \)<br />
Observations: Recall was extremely low, because the search string was not<br />
interpreted by the system as the user anticipated. Spotlight<br />
does not recognize “or” as a valid Boolean operator. Those<br />
documents that were retrieved happened to refer to the<br />
state of Virginia as both ‘Virginia’ and ‘VA’, and included at<br />
least one word containing ‘or’ as a letter sequence (e.g.,<br />
“memorial”; “born”; “Dora”). Precision was high, but as<br />
the search string was not interpreted as expected by the<br />
user, it cannot be attributed to a well-formulated query.<br />
[1] Spotlight does not employ a ranking algorithm, and returns documents in lexicographic order by document<br />
name. For this reason, precision cannot be presented at intermediate levels of recall.<br />
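The behaviour observed in Query #3 is consistent with a matcher that treats every word of the search string, including “or”, as a required letter sequence. The sketch below models that reading; it is an inference from the observed results, not Spotlight’s actual algorithm, and the case-insensitive substring matching and the sample documents are assumptions.<br />

```python
def matches(query, text):
    """True if every whitespace-separated query token occurs in the text
    as a case-insensitive letter sequence (an implicit AND of substrings)."""
    text = text.lower()
    return all(token in text for token in query.lower().split())


def search(query, docs):
    """Return matching document names in lexicographic order, unranked."""
    return sorted(name for name, text in docs.items() if matches(query, text))


# Hypothetical documents.
docs = {
    "a.txt": "Born in Richmond, Virginia (VA)",
    "b.txt": "Family moved to Virginia",
    "c.txt": "Cemetery listing, Norfolk VA",
}
print(search("Virginia", docs))        # ['a.txt', 'b.txt']
print(search("VA", docs))              # ['a.txt', 'c.txt']
print(search("Virginia or VA", docs))  # ['a.txt']
```

Under this model, the three-word query returns only documents that contain ‘Virginia’, ‘VA’, and the letters ‘or’ (here supplied by “Born”), which is a much narrower set than the union the user intended.<br />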
Additi<strong>on</strong>al observati<strong>on</strong>s:<br />
• As can be seen in the accompanying histograms, precision and recall display<br />
an inverse relationship for all three queries.<br />
• Query 3 has the highest precision, but also the lowest recall, whereas query 2 has the<br />
lowest precision, but the highest recall of the three queries.<br />
• Harmonic mean is poorest for query 3, reflecting the large difference between<br />
precision and recall measures.<br />
• The data set is fairly small (144 documents), and this analysis may not properly<br />
reflect the poor recall performance associated with searching large document<br />
collections (Blair & Maron, 1985).<br />
SYSTEM REVIEW<br />
Apple’s Spotlight application is tightly integrated with the Mac operating system, and its<br />
indexing and retrieval operations are closely guarded proprietary processes, discussed only in<br />
the broadest terms in corporate literature. Despite these restrictions, the above performance<br />
evaluation provides a good understanding of Spotlight’s underlying information retrieval<br />
model. What follows is a broad review of Spotlight’s information retrieval features, as<br />
deduced from the evaluation process. Search and retrieval issues that became apparent<br />
during the course of the evaluation are also noted.<br />
IR Model<br />
Spotlight’s underlying information retrieval model appears to be a classic Boolean model<br />
with binary weighting; that is, a document is either relevant (included in results set) or non-relevant<br />
(excluded from results set). This is evidenced by the fact that results are presented in<br />
lexicographic order by file name, with no ranking algorithm used.<br />
Text operations<br />
Spotlight’s logical view of documents is full-text, and includes information relating to<br />
syntactic structure. Experimentation does not suggest the existence of any text normalization<br />
procedures. Specifically, there appears to be:<br />
• No lexical analysis. For example, a Spotlight search on “full-text” will locate this<br />
paper; but it is excluded from the results list if the hyphen is excluded in the search<br />
query.<br />
• No stopword removal. A search of the word ‘the’, for example, yields 12,040 matches<br />
on the tested hard drive. This figure includes 1,1155 rich and plain text documents,<br />
and 443 PDF documents.<br />
• No stemming. A search on the word ‘enter’ will return this paper in the results set,<br />
but fails to do so if the same word is searched with an added suffix of ‘-s’ or ‘-ing’.<br />
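For contrast, the three text operations that appear to be absent can be sketched in a few lines. This shows what such normalization would do in principle and is not a description of Spotlight; the stopword list is illustrative and the suffix stripper is a toy stand-in for a real stemmer such as the Porter algorithm.<br />

```python
import re

# Illustrative stopword list; real lists are considerably longer.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in"}


def tokenize(text):
    """Lexical analysis: lowercase, and treat hyphens and punctuation as separators."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Toy stemming: strip a trailing 'ing' or 's' if at least 3 letters remain."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(text):
    """Apply lexical analysis, stopword removal, and stemming in that order."""
    return [stem(t) for t in tokenize(text) if t not in STOPWORDS]


print(normalize("Entering the full-text index"))  # ['enter', 'full', 'text', 'index']
print(normalize("enters full text"))              # ['enter', 'full', 'text']
```

If both documents and queries were normalized this way, “full text” would match “full-text”, and ‘enters’ or ‘entering’ would match ‘enter’.<br />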
Text languages<br />
Spotlight indexes a broad range of text languages, including both plain and rich text formats,<br />
PDF documents, markup languages, and metadata for many common file formats.<br />
Additionally, Spotlight can index multimedia files, system fonts and scripts, and application-specific<br />
text such as e-mail messages and address book entries. Spotlight plug-ins also<br />
permit developers to expand Spotlight’s indexing coverage to handle less common or<br />
newly developed text languages and file formats.<br />
Query language and operations<br />
Spotlight’s query functions are responsible for many of the system’s limitations. Spotlight permits<br />
basic single-word or multiple-word queries on both text content and syntax (metadata), and<br />
its internal division of words into letters allows matching on partial words.<br />
Without an understanding of Unix command-line operations, however, users are unable to<br />
execute even the simplest of Boolean queries, and cannot specify context queries such as<br />
phrase or proximity searches. Apple either considers such querying to be beyond the<br />
understanding of most users, or plans to release expanded search functionality in subsequent<br />
operating system releases. Regardless of the reason, sophisticated querying is currently only<br />
available to ‘power users’.<br />
These query limitations are particularly troublesome because:<br />
1. They are not transparent to the user. Users accustomed to the basic Boolean<br />
operations of Internet search engines, such as AND, OR, and “phrase queries”, may attempt<br />
to execute them in Spotlight and then fail to correctly evaluate the results set, as demonstrated by User 2<br />
during the retrieval evaluation (see Query 3). The Spotlight interface provides the<br />
user with no visual affordances as to proper (or improper) query formulation.<br />
2. Language is a rich communication medium, offering a seemingly boundless diversity<br />
of ways in which concepts can be expressed. As a full-text indexing system with no<br />
apparent text normalization, Spotlight’s information retrieval performance is<br />
particularly susceptible to the problems of synonymy and polysemy, and thus<br />
particularly in need of sophisticated query operations. The “Virginia” retrieval task<br />
(Query 1) used in this paper’s retrieval performance evaluation demonstrates this<br />
aptly.<br />
User interface, retrieval issues<br />
Spotlight offers a ‘Smart Folder’ feature, permitting structural queries via a series of<br />
dropdown menu choices that can be saved for future use. This interface was found to be<br />
stilted, limiting in options, and failing in its support of essential query operations such as<br />
phrase searching, Boolean OR statements, etc.<br />
Smart Folders are envisioned as a means for users to access their files without regard to their<br />
physical storage location. Once a Smart Folder is defined and saved, it updates itself<br />
automatically, providing the user with an up-to-date list of all files meeting their specified<br />
criteria. Instead of browsing multiple times through layers of nested folders, users can<br />
potentially access all relevant information via a single, ‘virtual folder’. The Wall Street Journal<br />
rightly points out the potential for this model to change users’ primary mode of information<br />
retrieval:<br />
“This is a big deal…Spotlight could spark a major change in the way people use<br />
computers. Instead of hunting for documents or clicking on programs, people may<br />
now start activities by searching for relevant files and then opening them as needed”<br />
(Mossberg, 2005).<br />
In its current form, however, the Smart Folder feature has one significant drawback, as<br />
revealed during the system performance evaluation. Users will still sometimes want to focus<br />
their query on a single folder. The Smart Folder feature permits this; unfortunately, the<br />
search is executed only one layer deep within the specified folder, and the contents of any<br />
nested folders are not considered by the query.<br />
Spotlight’s Smart Folder feature.<br />
Again, only experience and experimentation will reveal this limitation to the user; the<br />
interface provides no guidance, no manual is provided, and Apple’s Help application is silent<br />
on the issue.<br />
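The difference between a one-layer search and a search that descends into nested folders can be illustrated with a small file-system sketch. It only models the observed behaviour and says nothing about how Smart Folders are implemented; the `search_folder` function and the sample files are hypothetical.<br />

```python
import tempfile
from pathlib import Path


def search_folder(folder, term, recursive):
    """Return relative paths of files under `folder` whose text contains `term`."""
    folder = Path(folder)
    # iterdir() looks only at the folder's immediate children;
    # rglob("*") also descends into every nested folder.
    candidates = folder.rglob("*") if recursive else folder.iterdir()
    return sorted(
        p.relative_to(folder).as_posix()
        for p in candidates
        if p.is_file() and term.lower() in p.read_text(errors="ignore").lower()
    )


# Build a throwaway folder with one top-level file and one nested file.
with tempfile.TemporaryDirectory() as root:
    Path(root, "top.txt").write_text("Norfolk, Virginia")
    Path(root, "nested").mkdir()
    Path(root, "nested", "deep.txt").write_text("Richmond, Virginia")

    shallow = search_folder(root, "virginia", recursive=False)
    deep = search_folder(root, "virginia", recursive=True)

print(shallow)  # ['top.txt']
print(deep)     # ['nested/deep.txt', 'top.txt']
```

The shallow search corresponds to the behaviour observed in the evaluation: the nested file matches the query but is never examined.<br />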
CONCLUSION<br />
Multiple published reviews of the new Mac OS X 10.4 operating system (‘Tiger’) were<br />
consulted for this system evaluation. These reviews are uniformly positive in their<br />
assessment of Spotlight, and so the project was approached with considerable optimism.<br />
Having encountered many serious issues with Spotlight’s retrieval model, I am now<br />
inclined to believe these reviews were primarily derived from promotional materials<br />
supplied by Apple, and involved scant independent analysis.<br />
Put succinctly, the Spotlight indexing architecture is impressive; the indexing process is<br />
transparent, automatic, and tightly integrated with the operating system. Without an<br />
effective information retrieval model and search interface to accompany it, however,<br />
the full power and utility of the index remains inaccessible to the average user.<br />
BIBLIOGRAPHY<br />
Apple Computer, Inc. (2005a). Spotlight. Find anything, anywhere, fast. Retrieved August 2,<br />
2005 from http://www.apple.com/macosx/features/spotlight/.<br />
Apple Computer, Inc. (2005b). Tiger developer overview series: Working with Spotlight.<br />
Retrieved August 2, 2005 from http://developer.apple.com/macosx/spotlight.html.<br />
Apple Computer, Inc. (2005c). Technology brief: Mac OS X Spotlight. Find anything on<br />
your Mac instantly. Retrieved August 2, 2005 from<br />
http://images.apple.com/macosx/pdf/MacOSX_Spotlight_TB.pdf.<br />
Apple Computer, Inc. (2004). Developer Connection. How Search Kit Works. Retrieved<br />
August 8, 2005 from<br />
http://developer.apple.com/documentation/UserExperience/Conceptual/SearchKitConcepts/searchKit_concepts/chapter_3_section_5.html.<br />
Baeza-Yates, R., and Ribeiro-Neto, B. (1999). Modern information retrieval. New York:<br />
ACM Press.<br />
Beagrie, N. (June, 2005). Plenty of room at the bottom? Personal digital libraries and<br />
collections. D-Lib Magazine, 11(6). Retrieved August 5, 2005 from<br />
http://www.dlib.org/dlib/june05/beagrie/06beagrie.html.<br />
Blair, D.C., and Maron, M.E. (March, 1985). An evaluation of retrieval effectiveness for a<br />
full-text document-retrieval system. Communications of the ACM, 28(3): 289-299.<br />
Coffee, P. (May 30, 2005). ‘Tiger’ invites developers in. eWeek, 22(22): 46.<br />
Lewis, P. (May 16, 2005). Tiger tale: Look before you leap. Fortune, 151(10): 200, 202.<br />
McElhearn, K. (August, 2005). Command Spotlight. Macworld, 22(8): 88-89.<br />
Michaels, M. (September, 2004). 10 things to know about Tiger. Macworld, 21(9): 50-55.<br />
Mossberg, W.S. (April 28, 2005). Tiger leaps out in front; Apple operating system offers new<br />
approach to searching, Smart Folders, better browser. Wall Street Journal (Eastern<br />
Edition), p. B1. Retrieved August 2, 2005 from ProQuest database.<br />
Pogue, D. (April 28, 2005). Apple’s Tiger may even have PC owners longing for a Mac to<br />
put it in. The New York Times, pp. C1, C10. Retrieved August 2, 2005 from Lexis<br />
Nexis Academic database.<br />
Wildstrom, S.H. (May 9, 2005). Tiger makes Mac’s edge even sharper. Business Week, 3932:<br />
28.