
Spotlight on Spotlight

An evaluation and review of Mac OS X Tiger’s desktop indexing application

Carol Smith

Info 624 – Information Retrieval Systems

Summer 2005, Buzydlowski

Submitted August 17, 2005



TABLE OF CONTENTS

ABSTRACT
AUTHOR KEYWORDS
INTRODUCTION
Problem Domain and Scope
Spotlight Features – Brief Overview
DATA SET
Data Set Proposal
Data Set Description
Sample Document
Data Set Creation
Data Set Issues
EVALUATION
Methodology
1. Functional Analysis
2. System Performance Evaluation
3. Retrieval Performance
SYSTEM REVIEW
IR Model
Text operations
Text languages
Query language and operations
User interface, retrieval issues
CONCLUSION
BIBLIOGRAPHY



ABSTRACT

As the number of text and multimedia files stored by the average computer user continues to increase, so will the need to effectively index and access one’s ‘personal digital library’. Apple Computer, Inc.’s latest operating system, Mac OS X 10.4 (dubbed ‘Tiger’), includes an integrated indexing and retrieval application known as ‘Spotlight’. This paper analyzes Spotlight’s capabilities and limitations by first defining a data set of textual documents, and then testing Spotlight’s performance in indexing and accessing the data set. The basic features of Spotlight are introduced, and then the test data set is described, including issues related to its creation and use. A three-pronged approach was adopted to assess the performance of Spotlight. During the functional analysis phase, Spotlight was tested for system errors. A performance analysis then assessed the speed and storage requirements of the indexing system. Finally, a retrieval performance evaluation tested Spotlight against the data set for precision, recall and harmonic mean measurements. The paper concludes with observations about Spotlight’s underlying information retrieval model, in terms of text languages and operations, query languages and operations, and interface issues. Spotlight is found to be a utility with great promise, but also with significant challenges.

AUTHOR KEYWORDS

Apple; Mac OS X 10.4; Tiger; Spotlight; information retrieval; operating systems; evaluation.

INTRODUCTION

Problem Domain and Scope

As the number of text and multimedia files stored by the average computer user continues to increase, so will the need to effectively index and access one’s ‘personal digital library’. Both Microsoft and Apple Computer have recognized this growing need to manage digital collections, and have been racing to integrate information retrieval utilities into their competing operating systems. Microsoft’s Windows Vista operating system (formerly known as ‘Longhorn’) is slated for release sometime in 2006, and is expected to include integrated indexing/query capabilities. Apple Computer, Inc. (hereafter, ‘Apple’), however, ‘beat them to the punch’, releasing Mac OS X 10.4 (dubbed ‘Tiger’) to the public on April 29, 2005. Included in Mac OS X Tiger is an integrated indexing and retrieval application known as ‘Spotlight’.

Mac users of greatly varying technical ability are already actively using Spotlight to seek and retrieve information from their desktop computing environments; indeed, along with their favorite Internet browser and search engine, it is likely to become the information retrieval system they access most often. As a developer’s tool, the Spotlight search engine will also be incorporated into dozens of third-party applications. Given its potential for wide use, a thorough analysis of Spotlight’s information retrieval capabilities and limitations is warranted. Published reviews of Spotlight are glowing with praise, but unfortunately provide little analytical detail. This paper seeks to fill that void by systematically evaluating Spotlight’s performance. To accomplish this, a data set of textual documents is first created and defined, and then used to test Spotlight’s performance in indexing and accessing the data set.

Spotlight Features – Brief Overview

Spotlight indexes the contents of a drive automatically, with no explicit action required by the user. For textual documents, the full text is indexed, and for all file types, application metadata is also indexed. The Spotlight engine works closely with the Mac operating system, updating the index whenever a file is created or modified. Additional detail about Spotlight’s indexing architecture and processes is provided in the Indexing Processes and System Review sections of this paper.

Because the Spotlight indexing/retrieval engine is an integrated component of the Mac operating system, it is ‘always on’ and doesn’t need to be launched in the manner of a traditional application. The Spotlight query window is always available within a few keystrokes, and can be reached in several ways:

1. The upper right-hand corner of the Mac interface contains a permanent ‘spyglass’ icon that is always within view; clicking on this icon brings up the basic Spotlight query window.

2. The same query window can alternately be reached via a command-space bar keystroke combination.

3. Spotlight query windows are built into popular Apple applications and utilities, including Mail, Preferences, Address Book, Calendar and others.

4. Spotlight queries can also be executed within the Apple Finder window. This final access method also permits the creation and saving of customized queries called ‘Smart Folders.’ Apple envisions these virtual folders as a new method for organizing and managing information, one that may at least partially supplant traditional physical file organization.



As keywords are entered into the Spotlight query window, matching documents on the drive are listed, ordered first by general file type, then lexicographically by file name. Clicking on any file name will open the document within its associated application.

A user can also hit the Return key, in order to open the results set in a separate window. This window permits additional manipulation of Spotlight query results, including alternate ordering options and additional filtering functions.



DATA SET

Data Set Proposal

In keeping with Spotlight’s anticipated use as an indexing and retrieval system for individual digital collections, a genealogical data set of personal interest and utility to the author was envisioned.

Genealogists spend a significant amount of time reviewing Internet message boards, particularly those dedicated to family surname research. These message board services offer sophisticated query capabilities, including features such as field searching, Soundex searching, and imposition of date range limits. Despite such useful information retrieval features, however, searching the message boards is still a time-consuming affair. Because of historical name spelling variants, a message of interest might be posted on any of multiple surname message boards. When researching the Minnick family, for example, a query must be individually executed on as many as 12 different message boards (Minnick; Minick; Minck; etc.) located on multiple servers, in order to conduct a comprehensive search. There is currently no way to query multiple Internet message boards simultaneously, even within a single web site.
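To illustrate why spelling variants such as these are usually treated as a single family of names, the following is a minimal sketch of the standard American Soundex code, the kind of phonetic matching the message boards are said to offer. How the boards themselves implement Soundex is not documented here, so the function below is an assumption-laden illustration rather than their actual algorithm.

```python
def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a surname."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for letter in letters:
            codes[letter] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]                    # the first letter is kept as-is
    previous = codes.get(letters[0], "")   # code of the previous letter
    for letter in letters[1:]:
        digit = codes.get(letter, "")
        if digit and digit != previous:
            result += digit
        # H and W do not separate letters with the same code; vowels do.
        if letter not in "HW":
            previous = digit
    return (result + "000")[:4]            # pad or truncate to four characters


variants = ["Minnick", "Minick", "Mink", "Minnich", "Minich", "Minck"]
codes_by_name = {name: soundex(name) for name in variants}
```

Under this sketch all six variants used in this paper share one code, which is why a phonetic search on a single board can match several spellings, while a posting on a different board is still missed.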

By creating a unified data set of postings from multiple Internet message boards, a genealogist should be able to utilize Spotlight to rapidly execute comprehensive searches via a single query. Creation of the initial data set will require a significant investment of time up front, but should be rewarded by faster search and retrieval for subsequent information needs. The created data set, in combination with Spotlight, will essentially enable ‘metasearch’ capabilities across multiple message boards.

Data Set Description

The test data set was drawn from six separate surname message boards, all hosted by Ancestry.com (http://ancestry.com/share/):

Minnick Surname Board
Minick Surname Board
Mink Surname Board
Minnich Surname Board
Minich Surname Board
Minck Surname Board

Additional message boards are located at http://genforum.genealogy.com and numerous other genealogical web sites; these would be included in any fully implemented project, but were deemed nonessential for the paper’s primary purpose of evaluating Spotlight’s indexing/retrieval performance.

To keep the project manageable, the data set was limited to discussion threads with an initial posting dated 1/1/2004 or later. Any messages dated 1/1/2004 or later but related to a thread initiated prior to 2004 were not considered for inclusion. These date limits resulted in a data set of 144 plain text files, as well as 22 JPEG images that were attached to the original messages.
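The inclusion rule can be stated compactly in code. The sketch below is illustrative only: the data set was assembled by hand, and the thread structure used here (a dictionary with a chronologically ordered list of postings, each with a date) is a hypothetical representation, not a format used in this project.

```python
from datetime import date

CUTOFF = date(2004, 1, 1)  # threads must start on or after this date


def select_postings(threads):
    """Keep every posting of each thread whose initial posting meets the cutoff.

    Each thread is assumed to be a dict with a "postings" list in
    chronological order, so postings[0] is the initial posting.
    """
    selected = []
    for thread in threads:
        if thread["postings"][0]["date"] >= CUTOFF:
            selected.extend(thread["postings"])
    return selected


# Hypothetical example: the first thread began in 2003, so even its
# 2004 reply is excluded; the second thread is kept in full.
example_threads = [
    {"postings": [{"id": "old-1", "date": date(2003, 12, 30)},
                  {"id": "old-2", "date": date(2004, 2, 1)}]},
    {"postings": [{"id": "new-1", "date": date(2004, 1, 1)},
                  {"id": "new-2", "date": date(2004, 3, 15)}]},
]
kept_ids = [p["id"] for p in select_postings(example_threads)]
```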

The data set displays several interesting characteristics that may present information retrieval challenges, including:

A high incidence of recurring terms (first names, dates, etc.)

Numerous variant expressions, including abbreviations and unintentional misspellings (e.g., Mississippi; Miss.; Missisippi; MS)

Polysemy; that is, words with multiple possible meanings (e.g., Virginia as a place; Virginia as a female name)

Sample Document

The image below is representative of a typical message board posting, in its original HTML formatting.

Boards > Surnames > Minck

URL: http://boards.ancestry.com/mbexec/message/an/surnames.minck/9

Data Set Creation

Creation of the data set was a predictably tedious, manual affair. For each of the 144 individual message board postings, the following steps were performed in sequence:



1. The target HTML page was opened within a web browser.

2. Because a straight copy/paste routine would have captured undesirable information and hyperlinks extraneous to the message, each page was then reloaded via the page’s ‘Printer-friendly’ hyperlink.

3. The message was copied in its entirety using cmd-a/cmd-c keyboard shortcuts (Macintosh).

4. Using the cmd-v keyboard shortcut (Macintosh), the message was then pasted into a new plain text document, using Apple’s TextEdit application.

5. Two corrections were made to each plain text document:

a. The phrase “Return to Message” (a hyperlink in the original page) was deleted from the end of each document.

b. In order to avoid web crawler agents, the original HTML pages provide e-mail addresses in .gif format. For this reason, each author’s e-mail address information needed to be entered manually.

6. Each plain text file was then saved to the hard drive.

7. A small percentage of message board postings were accompanied by .jpg attachments, typically scanned documents relating to the message. Each of these attachments (22 in all) was saved as a separate data set file. Each attachment first had to be loaded into a separate browser window; for some unknown reason, attachments could only be saved as .gif images without this extra step, even though the extension of the attachment indicated it was a .jpg file.

After some consideration, it was decided to name each text file sequentially, beginning with 001, 002, 003, etc. If an initial message board posting received replies, each posting of a single thread was given the same number, but distinguished with sequential letters; e.g., 001a, 001b, 001c, etc. Some thought was given as to whether file names should indicate the level of depth in a particular thread; that is, if a posting was the second reply to a reply of an initial posting, label it 001aab. This level of complexity was deemed unnecessary, however, as any thread in question could be easily located in its original web location, should the sequence of postings become of interest.
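The naming convention just described can be summarized as a short sketch. The files were in fact named by hand; the function name and the limit of 26 postings per thread below are assumptions made only for illustration.

```python
from string import ascii_lowercase


def posting_names(thread_number: int, posting_count: int) -> list[str]:
    """Return file names for one thread under the scheme described above.

    A thread with a single posting gets the bare number (e.g. "007");
    a thread with replies gets one letter per posting ("001a", "001b", ...).
    """
    base = f"{thread_number:03d}"
    if posting_count == 1:
        return [base]
    if posting_count > len(ascii_lowercase):
        # The scheme in the paper does not say what happens past "z".
        raise ValueError("more postings than single letters available")
    return [base + ascii_lowercase[i] for i in range(posting_count)]
```

For example, `posting_names(1, 3)` yields 001a, 001b and 001c, while `posting_names(7, 1)` yields only 007.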

Data Set Issues

As described in the Functional Analysis section below, two decisions made during the creation of the initial data set proved problematic, and required further data set modification:

1. Because Mac files do not require extensions (.txt, .doc, etc.), extensions were not initially entered during the file-naming step.

2. Documents were initially saved to separate sub-folders for each of the Internet message boards (i.e., “Ancestry-Minnick”; “Ancestry-Minick”; “Ancestry-Minck”; “Ancestry-Minnich”; “Ancestry-Minich”; “Ancestry-Mink”). Attachments were further segregated into folders within these folders, labeled “Ancestry-Minnick-Images”, etc. Finally, all six subfolders were contained within a single top-level folder labeled “Spotlight Data Set.”

It should also be noted that the fielded structure of the documents in their original web format was an initial attraction, as it offered up the possibility of testing Spotlight’s performance on structural (syntactic) queries, as well as on semantic content. The fielded structure of data is not retained, however, when information is converted to plain text format. This loss of structural integrity was not anticipated (author oversight), but is hopefully made up for by an extended Spotlight system review.

EVALUATION

Methodology

Evaluations were conducted by two independent users (hereafter, “User 1” and “User 2”). One user was familiar with the contents of the data set, the other not. Neither user had prior experience with the Spotlight interface, although they were both experienced users of Macintosh operating systems. All tasks were conducted in batch (vs. interactive) mode; that is, queries were executed and responses evaluated on an individual basis, rather than as iterative sequences of query/response/revised query. Evaluations were carried out in a home setting, but the structured nature of the tasks rendered the experiments closer to a laboratory session than a real-life field assessment. All evaluations were conducted on a 1.07GHz Apple iBook G4 laptop computer with 512 MB of DDR SDRAM.

A three-pronged approach was adopted to assess the performance of Spotlight:

1. Functional Analysis: During this error analysis phase, users were asked to freely utilize Spotlight to access the data set. Defined retrieval tasks were not provided; instead, users created a range of their own ad hoc tasks intended to reveal functional problems or inconsistencies in the system’s indexing and retrieval performance. Users were asked to determine whether their chosen tasks were properly supported by the system, and whether they were ultimately able to accomplish the tasks. Users were asked to verbalize their experiences and challenges as these tasks were executed.

2. System Performance Evaluation: As a proprietary system (and due to the author’s lack of technical prowess), obtaining precise calculations of Spotlight’s response time and storage requirements proved challenging. Response time tests were executed using a manually controlled stopwatch; calculations should therefore be considered only as estimates. Further, an academic discussion of Spotlight’s storage requirements was substituted for an actual evaluation, as no means for calculating storage use could be determined by the author.

3. Retrieval Performance Evaluation: To assess the indexing and retrieval performance of Spotlight, classic precision and recall measurements were calculated separately. Harmonic mean, a measurement unifying precision and recall levels into a single metric, is also provided.

Execution and outcome of these three evaluation modes are discussed in separate sections, below.
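For reference, the three retrieval measures named above can be computed as in the following sketch. It uses the standard definitions (precision as the fraction of retrieved documents that are relevant, recall as the fraction of relevant documents that are retrieved, and the harmonic mean 2PR/(P+R)); the function name and the example file names are hypothetical and are not results from this evaluation.

```python
def precision_recall_harmonic(retrieved, relevant):
    """Return (precision, recall, harmonic mean) for one query.

    `retrieved` is the set of documents returned by the system and
    `relevant` is the set of documents judged relevant to the query.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        harmonic = 0.0
    else:
        harmonic = 2 * precision * recall / (precision + recall)
    return precision, recall, harmonic


# Hypothetical query: four files retrieved, three files actually relevant,
# two files in common.
p, r, f = precision_recall_harmonic(
    retrieved={"001a.txt", "001b.txt", "002.txt", "003.txt"},
    relevant={"001a.txt", "001b.txt", "004.txt"},
)
```

The harmonic mean is high only when both precision and recall are high, which is why it is a useful single summary figure.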



1. Functional Analysis

Users were asked to first browse the data set and examine individual documents, in order to devise a broad range of retrieval tasks. These tasks were then executed freely using Spotlight, in an effort to reveal functional challenges. Performance errors became apparent at a very early stage:

Task #1 (User 1): Locate all documents authored by Verna Williams.

Error: During the browsing session, User 1 pre-determined that at least 5 individual documents existed in the data set with author listed as either ‘Verna’ or ‘Verna Williams’. When entering ‘Verna’, ‘Author: Verna’ or ‘Author: Verna Williams’ in the Spotlight query window, however, zero results were retrieved. Similar retrieval failures were consistently experienced with other devised user tasks.

Problem Source: Several hours of exploration and experimentation revealed the reason for this performance failure. All 144 plain text files in the data set were saved without the .txt extension added to the document name. Such file extensions are necessary in a Windows environment, but are optional in Mac operating systems. In order for Spotlight to index a plain text file, however, the .txt file extension is apparently required. Although not necessarily a functional error, it is one that conflicts with long-established system behavior, and will inevitably cause confusion for Mac users.

Resolution: As soon as the extension was added to all text files in the data set, User 1 was able to successfully execute all remaining functional retrieval tasks.
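A fix of this kind can be applied in bulk rather than file by file. The following is a minimal sketch, assuming every extensionless file under the data set folder is one of the plain text postings; the function name is hypothetical, and the paper does not state how the extensions were actually added.

```python
from pathlib import Path


def add_txt_extension(folder):
    """Append ".txt" to every extensionless file below `folder`.

    Hidden files (names starting with ".") and files that already have an
    extension, such as the .jpg attachments, are left untouched.
    Returns the list of new paths.
    """
    renamed = []
    # sorted() collects all paths first, so renaming is safe while looping.
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix == "" and not path.name.startswith("."):
            target = path.with_name(path.name + ".txt")
            if target.exists():
                continue  # never overwrite an existing file
            path.rename(target)
            renamed.append(target)
    return renamed
```

Calling `add_txt_extension("Spotlight Data Set")` would rename a file such as 001a to 001a.txt wherever it sits in the folder hierarchy.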

Task #2 (User 2): Locate all documents mentioning the name “John”.

Error: During the browsing session, User 2 observed that John was a common name listed in the message board postings, and was curious about what percentage of documents contained this name. Even after the .txt extension had been added to all plain text files in the document set, however, User 2 was unable to execute the retrieval task, receiving a results set of zero documents.

Problem Source: Again, exploration and experimentation led to an understanding of the performance failure. Whereas User 1 was utilizing Spotlight to conduct a system-wide search of the computer’s entire hard drive, and then analyzing just those .txt files with the anticipated file name formatting (001, 002, etc.), User 2 chose to access Spotlight via the system’s Finder feature, specifying a focused search of just the Data Set folder. As previously described, this folder contains six subfolders, and documents were saved within those six subfolders. It was determined that when Spotlight is directed to examine the contents of a particular folder, it analyzes folder contents only one level deep; that is, Spotlight was seeking keyword matches on the folders themselves, but not on their contents.

Resolution: Although this may not be a functional error, it is regarded as a serious limitation in Spotlight’s performance, and one that is likely to confuse many users, whose prior computing experience will lead them to expect all nested contents of a folder to be considered during a targeted search query. To overcome the immediate problem, however, the situation was resolved by removing all nested folder structures within the Data Set Folder. Once accomplished, User 2 was able to conduct all remaining functional retrieval tasks (see Appendix A) without error.

After this, no further functional errors were identified, and the functional evaluation was concluded.
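The folder restructuring used to work around the second problem can also be scripted. The sketch below moves every nested file up into the top-level folder and then deletes the emptied subfolders; it is an illustration under the assumption that file names are unique across subfolders (it stops with an error otherwise), and the function name is hypothetical.

```python
import shutil
from pathlib import Path


def flatten_folder(top):
    """Move all files in nested subfolders of `top` directly into `top`."""
    top = Path(top)
    moved = []
    nested = [p for p in top.rglob("*") if p.is_file() and p.parent != top]
    for path in sorted(nested):
        target = top / path.name
        if target.exists():
            # Two boards could reuse a file name; refuse to overwrite.
            raise FileExistsError(f"{target} already exists")
        shutil.move(str(path), str(target))
        moved.append(target)

    # Remove the now-empty subfolders, deepest first.
    folders = sorted((p for p in top.rglob("*") if p.is_dir()),
                     key=lambda p: len(p.parts), reverse=True)
    for folder in folders:
        if not any(folder.iterdir()):
            folder.rmdir()
    return moved
```

After `flatten_folder("Spotlight Data Set")`, a Finder-scoped search of that folder sees every document at the first level.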

2. System Performance Evaluation

Performance evaluations assess the efficiency of a retrieval system’s architecture, in terms of its use of storage space, system interactions and response time. Unfortunately, my technical abilities are limited, and Apple provides no built-in utilities for assessing Spotlight performance; after some research I was unable to identify any feasible methods for generating precise performance metrics. In lieu of this, I elected instead to make rough response time observations (using a manually controlled stopwatch), and to discuss Spotlight’s indexing and retrieval structures in general terms.

2a. Response time: Because Spotlight is tightly integrated with the Mac operating system, indexing of an entire file system is conducted as soon as a drive has been introduced, and automatically updated each time a new document is created or modified. This up-to-date system level index is readily available at any time for searching by the user, via an icon in the upper right-hand corner of the interface. As one begins typing a word, matching documents immediately begin filling the results window, arranged by media type. As the word continues to be typed, non-matching results are rapidly eliminated from the results window. Once the user has finished typing the search query, the final results screen takes anywhere from 3 to 5 seconds to stabilize. From a user perspective, then, initial response times appear nearly immediate, with final results usually available within a 5-second time frame.

With ad hoc experimentation, two general response time challenges were observed (neither specifically associated with the defined data set):

If a user executes a search query, and then quickly selects a particular file to be opened while the results set is still ‘stabilizing’, system response time to locate and open the chosen file slows slightly, from roughly 2-4 seconds to 4-6 seconds elapsed time. It can also be difficult to select a file accurately while the results set stabilizes, because file locations are continually shifting within the results list. This, however, can be considered a user interface issue, rather than a system performance issue.

When first connected to an external drive, Spotlight instantly begins indexing the drive’s contents. A user can conduct search queries while indexing is taking place, but response times are slowed significantly, to between 5 and 8 seconds. Additionally, one cannot consider the results set to be complete until indexing of the underlying file system is finished. As an experiment, a 250GB external drive containing 38GB of data was connected to the primary evaluation computer. Spotlight began indexing the drive at 11:35:40AM, and finished at 12:33:11PM, an elapsed time of 57 minutes, 31 seconds (an average indexing speed of 1 minute, 31 seconds per GB). During this time, all executed Spotlight queries, even those executed on the primary computer’s hard drive, were visibly slowed by an additional 0-5 seconds per query.

Spotlight results set – response time slows markedly during the indexing of newly introduced drives.



2b. Indexing architecture: The Spotlight indexing process is nicely illustrated by an Apple graphic (Apple Computer, 2005c, p.11):

Whenever a document is created or modified, or whenever a new drive is introduced, Spotlight’s search engine initiates a query of the underlying file system to determine the type of file(s) involved. Once the type is ascertained, Spotlight calls upon the appropriate plug-in to import content and metadata information. Every type of file has an associated plug-in; many plug-ins come built in with the Tiger operating system, and others can be created by developers to allow Spotlight indexing of less common file types. After a file’s contents are parsed, the information is written to the metadata index and the content index, as appropriate. Collectively, these two indices are referred to as the ‘Apple Store’, and each drive maintains separate stores. Users can then use the Spotlight search interface (or an independently developed application built on the Spotlight API) to search the Apple Store, and connect to the appropriate application when a file of interest is located.
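The flow described above (determine the file type, call the matching plug-in, populate a metadata index and a content index held in a per-drive store) can be sketched in Python. This is an illustration of the general pattern only, not Apple’s importer API: the names `IMPORTERS`, `importer`, `import_plain_text` and `Store` are hypothetical, and file type is guessed from the extension purely for simplicity.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical plug-in registry: file extension -> importer function.
IMPORTERS = {}

def importer(ext):
    """Decorator that registers an importer for one file extension."""
    def register(fn):
        IMPORTERS[ext] = fn
        return fn
    return register

@importer(".txt")
def import_plain_text(path):
    """Return (metadata, text content) for a plain text file."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return {"kind": "plain text", "name": Path(path).name}, text

class Store:
    """One store per drive: a metadata index plus a content index."""

    def __init__(self):
        self.metadata = {}               # path -> metadata dict
        self.content = defaultdict(set)  # term -> set of paths containing it

    def index_file(self, path):
        fn = IMPORTERS.get(Path(path).suffix.lower())
        if fn is None:
            return False  # no plug-in registered for this type: nothing indexed
        meta, text = fn(path)
        self.metadata[str(path)] = meta
        for term in text.lower().split():
            self.content[term].add(str(path))
        return True
```

In this sketch a new file type is supported by registering one more importer, which mirrors the role the paper attributes to developer-written plug-ins.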

Spotlight’s indexing and information retrieval functions are tightly integrated with the Mac operating system. Both Apple literature and third-party reviews tout this as a significant advantage over add-on search tools, such as X1 for Windows. Certainly, Spotlight’s automated indexing and always-available search field are convenient features for users. Without a direct performance comparison against third-party products, however, it’s difficult to assess the merit of such praise.

Spotlight’s content index is generated using Apple’s proprietary Search Kit technology. The following statements in Apple’s developer literature (2004) describe the use of an inverted file mechanism in Search Kit, with addressing granularity at the document level only:

Inverted file mechanism: “A Search Kit inverted index lists each constituent term exactly once, no matter how many of its contained documents include the term and no matter how frequently the term appears in any of the documents. In other words, the index tracks which documents use the term, and how often, but the term appears in the index just once.”

Addressing granularity at the document level: “To Search Kit, a document is atomic in that it defines the granularity of a search. Using Search Kit, your application can find documents—as your application understands them—but cannot locate the position of a term within a document.”
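The two quoted properties can be illustrated with a small Python sketch of an inverted index with document-level granularity. It is not Search Kit’s implementation; the function name `build_inverted_index` and the two sample documents (their text is invented for illustration) are assumptions.

```python
from collections import Counter, defaultdict

def build_inverted_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Map each term to {document id: occurrence count}.

    Each term appears once as a key. Only per-document counts are kept,
    so the position of a term inside a document cannot be recovered.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return dict(index)

# Invented sample text; the ids merely imitate the data set's naming scheme.
docs = {
    "005": "born in virginia",
    "020a": "virginia smith of virginia beach",
}
index = build_inverted_index(docs)
print(index["virginia"])  # {'005': 1, '020a': 2}
```

Because positions are discarded, a structure like this can answer “which documents contain the term, and how often” but not “where in the document”, which matches the behavior observed in Spotlight.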

Although two other indexing methods are available to Search Kit developers (a “vector index” and an “inverted vector index”), Spotlight is most likely using the inverted index option, for the following reasons:

Apple (2004) characterizes the inverted index structure as being “faster and smaller” than the two other Search Kit indexing methods, features essential to Spotlight.

Spotlight can identify matching files, but cannot identify the location within a file where specified text appears.

Spotlight provides keyword-based searching (as opposed to similarity searching). Baeza-Yates and Ribeiro-Neto cite inverted file mechanisms as “currently the best choice for most [keyword-based search] applications” (1999, p.191), and indeed, Apple (2004) recommends the Search Kit inverted index option as the best option for keyword-based systems.

3. Retrieval Performance<br />

Precision and recall are the two classic measurements of a system’s information retrieval performance. Recall measures a system’s ability to retrieve all (known) relevant documents, while precision measures the percentage of relevant documents in a particular results set. A third measure, the harmonic mean, combines precision and recall measurements into a single performance metric.
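As a minimal sketch of how the three measures are computed, the Python function below takes the counts available for each query in this evaluation. The function and parameter names are my own; the sample counts are those reported for Query 2 later in this section (37 documents retrieved, 25 of them relevant, 32 relevant documents in the collection).

```python
def precision_recall_f(retrieved: int, relevant_retrieved: int, relevant_total: int):
    """Return (precision, recall, harmonic mean) from raw document counts."""
    precision = relevant_retrieved / retrieved if retrieved else 0.0
    recall = relevant_retrieved / relevant_total if relevant_total else 0.0
    if precision == 0 or recall == 0:
        harmonic = 0.0  # the harmonic mean is taken as zero when either measure is zero
    else:
        harmonic = 2 / (1 / recall + 1 / precision)
    return precision, recall, harmonic

p, r, f = precision_recall_f(retrieved=37, relevant_retrieved=25, relevant_total=32)
print(f"precision={p:.4f} recall={r:.4f} harmonic mean={f:.3f}")
```

Note that both precision and recall share the same numerator, the number of relevant documents actually retrieved; only the denominator differs.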

Although relevance assessment is inevitably subjective, neither precision nor recall can be tested without first identifying the subset of relevant documents for a particular information need. For this reason, a retrieval task was designed in advance, and all pertinent documents within the 144-document collection were identified. User 1 assessed document relevancy, as she was familiar with the data set, and the task was then presented to User 2, who had no prior contact with the data set. User 2 conducted three search queries, with the following results:

Task 1: Locate all documents mentioning the state of Virginia

Relevant documents: Of the 144 text documents in the full data set, 32 are deemed pertinent to the information need:

005; 006; 007; 008; 009; 010; 011; 012; 013; 014; 015; 016; 017; 018; 020a; 020b; 020c; 022; 028a; 029; 030; 031; 032; 050; 051; 052; 063a; 065; 072d; 078; 080k; 081



Query #1

Search string: Virginia

Results set: This query returned 17 documents, 13 of which were relevant.

Precision (note 1): 13/17, or 76.47% of all retrieved documents are relevant to the query.

Recall: 13/32, or 40.63% of all relevant documents were retrieved by the query.

Harmonic mean: $\frac{2}{\frac{1}{0.4063} + \frac{1}{0.7647}} = \frac{2}{3.77} = 0.531$

Observations: All 4 non-relevant documents were captured because the author’s first name was ‘Virginia’, a polysemy issue. Precision is high, but more than half of all relevant documents were not returned; these all referred to the state of Virginia as ‘VA’ (or ‘Va’, in one case).

Query #2

Search string: VA

Results set: This search returned 37 documents, 25 of which were relevant.

Precision (note 1): 25/37, or 67.57% of all retrieved documents are relevant to the query.

Recall: 25/32, or 78.13% of all relevant documents were retrieved by the query.

Harmonic mean: $\frac{2}{\frac{1}{0.7813} + \frac{1}{0.6757}} = \frac{2}{2.76} = 0.725$

Observations: Recall was quite high, primarily because the majority of relevant documents referred to the state as ‘VA’ in the subject line. A single author posted the majority of the messages. A larger data set, reflecting the posting styles of many different people, may not have yielded results as favorable.



Query #3

Search string: Virginia or VA

Results set: This search returned 5 documents, 4 of which were relevant.

Precision (note 1): 4/5, or 80% of all retrieved documents are relevant to the query.

Recall: 4/32, or 12.5% of all relevant documents were retrieved by the query.

Harmonic mean: $\frac{2}{\frac{1}{0.125} + \frac{1}{0.8}} = \frac{2}{9.25} = 0.216$

Observations: Recall was extremely low, because the search string was not interpreted by the system as the user anticipated. Spotlight does not recognize “or” as a valid Boolean operator. Those documents that were retrieved happened to refer to the state of Virginia as both ‘Virginia’ and ‘VA’, and included at least one word containing ‘or’ as a letter sequence (e.g., “memorial”; “born”; “Dora”). Precision was high, but as the search string was not interpreted as expected by the user, it cannot be attributed to a well-formulated query.

Note 1: Spotlight does not employ a ranking algorithm, and returns documents in lexicographic order by document name. For this reason, precision cannot be presented at intermediate levels of recall.

Additional observations:

As can be seen in the histograms on the following page, precision and recall display an inverse relationship for all three queries.

Query 3 has the highest precision, but also the lowest recall, whereas query 2 has the lowest precision, but the highest recall of the three queries.

Harmonic mean is poorest for query 3, reflecting the large difference between precision and recall measures.

The data set is fairly small (144 documents), and this analysis may not properly reflect the poor recall performance associated with searching large document collections (Blair & Maron, 1985).



SYSTEM REVIEW<br />

Apple’s Spotlight application is tightly integrated with the Mac operating system. As such, its indexing and retrieval operations are closely guarded proprietary processes, discussed only in the broadest terms in corporate literature. Despite these restrictions, the above performance evaluation provides a good understanding of Spotlight’s underlying information retrieval model. What follows is a broad review of Spotlight’s information retrieval features, as deduced from the evaluation process. Search and retrieval issues that became apparent during the course of the evaluation are also noted.

IR Model<br />

Spotlight’s underlying information retrieval model appears to be a classic Boolean model with binary weighting; that is, a document is either relevant (included in results set) or non-relevant (excluded from results set). This is evidenced by the fact that results are presented in lexicographic order by file name, with no ranking algorithm used.

Text operati<strong>on</strong>s<br />

Spotlight’s logical view of documents is full-text, and includes information relating to syntactic structure. Experimentation does not suggest the existence of any text normalization procedures. Specifically, there appears to be:

No lexical analysis. For example, a Spotlight search on “full-text” will locate this paper, but the paper is excluded from the results list if the hyphen is omitted from the search query.

No stopword removal. A search on the word ‘the’, for example, yields 12,040 matches on the tested hard drive. This figure includes 1,1155 rich and plain text documents, and 443 PDF documents.

No stemming. A search on the word ‘enter’ will return this paper in the results set, but fails to do so if the same word is searched with an added suffix of ‘-s’ or ‘-ing’.
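For contrast, the sketch below shows what the three missing normalization steps might look like in a system that did apply them. It is purely illustrative and says nothing about Spotlight’s internals: the stopword list is a tiny sample, and the suffix stripping is a deliberately crude stand-in for a real stemmer.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "and", "or"}  # tiny illustrative list
SUFFIXES = ("ing", "s")  # crude suffix stripping, not a real stemming algorithm

def normalize(text: str) -> list[str]:
    """Apply lexical analysis, stopword removal and naive stemming to a string."""
    # Lexical analysis: lowercase and split on hyphens, spaces and punctuation.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Stopword removal.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Naive stemming: strip one known suffix if at least three letters remain.
    stems = []
    for t in tokens:
        for suffix in SUFFIXES:
            if t.endswith(suffix) and len(t) - len(suffix) >= 3:
                t = t[: -len(suffix)]
                break
        stems.append(t)
    return stems

print(normalize("Entering the full-text"))  # ['enter', 'full', 'text']
```

With normalization of this kind applied to both documents and queries, “full-text” and “full text” would reduce to the same terms, and ‘enters’ or ‘entering’ would match a document containing ‘enter’.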

Text languages<br />

Spotlight indexes a broad range of text languages, including both plain and rich text formats, PDF documents, markup languages, and metadata for many common file formats. Additionally, Spotlight can index multimedia files, system fonts and scripts, and application-specific text such as e-mail messages, address book entries, etc. Finally, Spotlight plug-ins permit developers to expand Spotlight’s indexing coverage to handle less common or newly developed text languages and file formats.



Query language and operati<strong>on</strong>s<br />

Spotlight’s query functions are responsible for many of the system’s limitations. It permits basic single-word or multiple-word queries on both text content and syntax (metadata), and its internal division of words into letters allows matching on partial words.

Without an understanding of Unix command-line operations, however, users are unable to execute even the simplest of Boolean queries, and cannot specify context queries such as phrase or proximity searches. Apple either considers such querying to be beyond the understanding of most users, or plans to release expanded search functionality in subsequent operating system releases. Regardless of the reason, sophisticated querying is currently only available to ‘power users’.
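The behavior observed in Query 3 is consistent with a simple matching rule: every word in the search string, including “or”, must occur somewhere in the document as a letter sequence. The Python sketch below models that rule. It is an inference from the evaluation results rather than documented Spotlight behavior; the function name `matches`, the case-insensitive comparison and the sample strings are all assumptions.

```python
def matches(query: str, text: str) -> bool:
    """Return True if every query word occurs in the text as a substring.

    Assumed model: words are implicitly ANDed, matched case-insensitively,
    and may match inside longer words. No word is treated as an operator.
    """
    haystack = text.lower()
    return all(term in haystack for term in query.lower().split())

# Invented sample text, not taken from the data set.
print(matches("Virginia or VA", "Born in Richmond, Virginia (VA)"))  # True
print(matches("Virginia or VA", "Cemetery records, Richmond, VA"))   # False
```

Under this model the first string matches only because “born” happens to contain “or”, while the second fails because it lacks the full word “Virginia”, which is the pattern that produced the very low recall of Query 3.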

These query limitations are particularly troublesome because:

1. They are not transparent to the user. Internet surfers who use basic Boolean operations such as AND, OR, and “phrase queries” may attempt to execute Boolean operations, and fail to correctly evaluate the results set, as demonstrated by User 2 during the retrieval evaluation (see Query 3). The Spotlight interface provides the user with no visual affordances as to proper (or improper) query formulation.

2. Language is a rich communication medium, offering a seemingly boundless diversity of ways in which concepts can be expressed. As a full-text indexing system with no apparent text normalization, Spotlight’s information retrieval performance is particularly susceptible to the problems of synonymy and polysemy, and thus particularly in need of sophisticated query operations. The “Virginia” retrieval task (Query 1) used in this paper’s retrieval performance evaluation demonstrates this aptly.

User interface, retrieval issues<br />

Spotlight offers a ‘Smart Folder’ feature, permitting structural queries via a series of dropdown menu choices that can be saved for future use. This interface was found to be stilted and limited in its options, and it fails to support essential query operations such as phrase searching, Boolean OR statements, etc.

Smart Folders are envisioned as a means for users to access their files without regard to their physical storage location. Once a Smart Folder is defined and saved, it updates itself automatically, providing the user with an up-to-date list of all files meeting their specified criteria. Instead of browsing multiple times through layers of nested folders, users can potentially access all relevant information via a single, ‘virtual folder’. The Wall Street Journal rightly points out the potential for this model to change users’ primary mode of information retrieval:

“This is a big deal…Spotlight could spark a major change in the way people use computers. Instead of hunting for documents or clicking on programs, people may now start activities by searching for relevant files and then opening them as needed” (Mossberg, 2005).



In its current form, however, the Smart Folder feature has one significant drawback, as revealed during the system performance evaluation. Users will still sometimes want to focus their query on a single folder. The Smart Folder feature permits this. Unfortunately, the search is executed only one layer deep within the specified folder; the contents of any nested folders are not considered by the query.

Spotlight’s Smart Folder feature.

Again, only experience and experimentation will reveal this limitation to the user; the interface provides no guidance, no manual is provided, and Apple’s Help application is silent on the issue.
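The difference between a one-layer search and the search a user would probably expect can be sketched with Python’s standard `pathlib` module. The function `search_folder` is a hypothetical illustration that matches on file names only; it does not use or imitate the Smart Folder implementation.

```python
from pathlib import Path

def search_folder(root: str, term: str, recursive: bool) -> list[str]:
    """Return relative paths of files under root whose names contain term.

    recursive=False looks only at the folder's immediate contents, the
    behavior observed for folder-scoped Smart Folders; recursive=True
    also descends into nested folders.
    """
    pattern = "**/*" if recursive else "*"
    hits = []
    for path in Path(root).glob(pattern):
        if path.is_file() and term.lower() in path.name.lower():
            hits.append(str(path.relative_to(root)))
    return sorted(hits)
```

Given a folder holding `a_report.txt` and a subfolder holding `b_report.txt`, the non-recursive call returns only the first file, which is the kind of silent omission described above.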

CONCLUSION<br />

Multiple published reviews of the new Mac OS X 10.4 operating system (‘Tiger’) were consulted for this system evaluation. These reviews are uniformly positive in their assessment of Spotlight, and so the project was approached with considerable optimism. Having encountered many serious issues with Spotlight’s retrieval model, I am now inclined to believe these reviews were primarily derived from promotional materials supplied by Apple, and involved scant independent analysis.

Put succinctly, the Spotlight indexing architecture is impressive; the indexing process is transparent, automatic, and tightly integrated with the operating system. Without an effective information retrieval model and search interface to accompany it, however, the full power and utility of the index remains inaccessible to the average user.



BIBLIOGRAPHY<br />

Apple Computer, Inc. (2005a). Spotlight. Find anything, anywhere, fast. Retrieved August 2, 2005 from http://www.apple.com/macosx/features/spotlight/.

Apple Computer, Inc. (2005b). Tiger developer overview series: Working with Spotlight. Retrieved August 2, 2005 from http://developer.apple.com/macosx/spotlight.html.

Apple Computer, Inc. (2005c). Technology brief: Mac OS X Spotlight. Find anything on your Mac instantly. Retrieved August 2, 2005 from http://images.apple.com/macosx/pdf/MacOSX_Spotlight_TB.pdf.

Apple Computer, Inc. (2004). Developer Connection. How Search Kit Works. Retrieved August 8, 2005 from http://developer.apple.com/documentation/UserExperience/Conceptual/SearchKitConcepts/searchKit_concepts/chapter_3_section_5.html

Baeza-Yates, R., and Ribeiro-Neto, B. (1999). Modern information retrieval. New York: ACM Press.

Beagrie, N. (June, 2005). Plenty of room at the bottom? Personal digital libraries and collections. D-Lib Magazine, 11(6). Viewed August 5, 2006 at http://www.dlib.org/dlib/june05/beagrie/06beagrie.html.

Blair, D.C., and Maron, M.E. (March, 1985). An evaluation of retrieval effectiveness for a full-text document-retrieval system. Communications of the ACM, 28(3): 289-299.

Coffee, P. (May 30, 2005). ‘Tiger’ invites developers in. eWeek, 22(22): 46.

Lewis, P. (May 16, 2005). Tiger tale: Look before you leap. Fortune, 151(10): 200, 202.

McElhearn, K. (August, 2005). Command Spotlight. Macworld, 22(8): 88-89.

Michaels, M. (September, 2004). 10 things to know about Tiger. Macworld, 21(9): 50-55.

Mossberg, W.S. (April 28, 2005). Tiger leaps out in front; Apple operating system offers new approach to searching, Smart Folders, better browser. Wall Street Journal (Eastern Edition), p. B1. Retrieved August 2, 2005 from ProQuest database.

Pogue, D. (April 28, 2005). Apple’s Tiger may even have PC owners longing for a Mac to put it in. The New York Times, pp. C1, C10. Retrieved August 2, 2005 from Lexis Nexis Academic database.

Wildstrom, S.H. (May 9, 2005). Tiger makes Mac’s edge even sharper. Business Week, 3932: 28.
