01.01.2015

Spotlight on Spotlight - Carol Smith Home Page



Query #3

Search string: Virginia or VA

Result set: This search returned 5 documents, 4 of which were relevant.

Precision¹: 4/5, or 80%, of all retrieved documents are relevant to the query.

Recall: 4/32, or 12.5%, of all relevant documents were retrieved by the query.
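The Query #3 figures can be checked from three counts given in the text: 5 documents retrieved, 4 of them relevant, and 32 relevant documents in the collection. The sketch below is a minimal illustration; the helper name `evaluate` is hypothetical, and the harmonic mean is computed in the same form as the equation that follows (equivalently, the F1 score).

```python
def evaluate(retrieved: int, relevant_retrieved: int, relevant_total: int) -> tuple[float, float, float]:
    """Return (precision, recall, harmonic mean) from raw counts (hypothetical helper)."""
    precision = relevant_retrieved / retrieved      # share of retrieved documents that are relevant
    recall = relevant_retrieved / relevant_total    # share of relevant documents that were retrieved
    if precision == 0 or recall == 0:
        return precision, recall, 0.0               # harmonic mean is 0 if either measure is 0
    harmonic = 2 / (1 / recall + 1 / precision)     # 2 / (1/R + 1/P)
    return precision, recall, harmonic


# Query #3: "Virginia or VA"
p, r, f = evaluate(retrieved=5, relevant_retrieved=4, relevant_total=32)
print(f"precision={p:.3f} recall={r:.3f} harmonic mean={f:.3f}")
# precision=0.800 recall=0.125 harmonic mean=0.216
```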

Harmonic mean:

$$\frac{2}{\frac{1}{0.125} + \frac{1}{0.8}} = \frac{2}{9.25} \approx 0.216$$

Observations:

Recall was extremely low because the system did not interpret the search string as the user anticipated. Spotlight does not recognize “or” as a valid Boolean operator; instead, it appears to have treated “or” as a third search term that each result also had to contain. The documents that were retrieved happened to refer to the state of Virginia as both ‘Virginia’ and ‘VA’, and included at least one word containing ‘or’ as a letter sequence (e.g., “memorial”, “born”, “Dora”). Precision was high, but because the search string was not interpreted as the user expected, that precision cannot be attributed to a well-formulated query.
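The effect described above can be illustrated with a toy matcher. This is a simplified model, not Spotlight's actual implementation: it assumes each whitespace-separated query word is matched as a case-insensitive substring, and compares a literal reading (every word, including “or”, is required) with the intended Boolean OR. The three documents are hypothetical and not drawn from the study's data set.

```python
# Hypothetical three-document collection (not from the study's data set).
DOCS = {
    "a.txt": "Born in Richmond, VA, she lived in Virginia all her life.",
    "b.txt": "The Virginia state archives hold the letters.",
    "c.txt": "A memorial service was held in Norfolk, VA.",
}


def matches(text: str, term: str) -> bool:
    # Assumed rule: case-insensitive substring match anywhere in the text.
    return term.lower() in text.lower()


def search_all_terms(query: str) -> list[str]:
    """Literal reading: every word, including 'or', is a required term."""
    terms = query.split()
    return sorted(name for name, text in DOCS.items()
                  if all(matches(text, t) for t in terms))


def search_boolean_or(query: str) -> list[str]:
    """Intended reading: 'or' separates alternatives, so any remaining term suffices."""
    terms = [t for t in query.split() if t.lower() != "or"]
    return sorted(name for name, text in DOCS.items()
                  if any(matches(text, t) for t in terms))


query = "Virginia or VA"
print(search_all_terms(query))   # ['a.txt']
print(search_boolean_or(query))  # ['a.txt', 'b.txt', 'c.txt']
```

Under the literal reading, only the document containing “Virginia”, “VA”, and the letters “or” (here, in “born”) is returned, which mirrors the high-precision, low-recall pattern reported for Query #3.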

¹ Spotlight does not employ a ranking algorithm and returns documents in lexicographic order by document name. For this reason, precision cannot be presented at intermediate levels of recall: those values are meaningful only when results are ordered by estimated relevance.
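For context, precision at intermediate recall levels is usually obtained by walking down a ranked result list and, each time a relevant document appears, recording precision and recall at that cut-off. The sketch below assumes a hypothetical ranked list of relevance judgments; the function name is illustrative. The same walk could be run over a name-ordered list, but the resulting values would describe alphabetical order rather than ranking quality.

```python
def precision_at_recall_points(ranked_relevance: list[bool], relevant_total: int) -> list[tuple[float, float]]:
    """Return (recall, precision) at each rank where a relevant document appears."""
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Recall so far and precision among the top `rank` results.
            points.append((hits / relevant_total, hits / rank))
    return points


# Hypothetical ranking: True marks a relevant document at that rank.
ranking = [True, False, True, True, False]
for recall, precision in precision_at_recall_points(ranking, relevant_total=4):
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

Reordering the same five documents would change these points, which is why they carry information only when the order itself reflects relevance.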

Additional observations:


• As the histograms on the following page show, precision and recall display an inverse relationship across all three queries.
• Query 3 has the highest precision but also the lowest recall, whereas query 2 has the lowest precision but the highest recall of the three queries.
• The harmonic mean is lowest for query 3, reflecting the large gap between its precision and recall; the harmonic mean is pulled toward the smaller of the two values.
• The data set is fairly small (144 documents), so this analysis may not reflect the poor recall performance associated with searching large document collections (Blair & Maron, 1985).
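The third observation follows from a general property of the harmonic mean: for a fixed arithmetic mean of precision and recall, it is largest when the two are equal and falls as they diverge. The sketch below holds the arithmetic mean at Query #3's value, (0.8 + 0.125) / 2 = 0.4625, and varies the split. Only the (0.8, 0.125) pair comes from the study; the other two pairs are illustrative.

```python
def harmonic_mean(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r (0 if either is 0)."""
    return 0.0 if p == 0 or r == 0 else 2 * p * r / (p + r)


# Every pair has the same arithmetic mean (0.4625); only (0.8, 0.125) is Query #3.
pairs = [(0.4625, 0.4625), (0.6, 0.325), (0.8, 0.125)]
for p, r in pairs:
    arithmetic = (p + r) / 2
    print(f"P={p:.4f} R={r:.4f} arithmetic={arithmetic:.4f} harmonic={harmonic_mean(p, r):.4f}")
```

As the gap between P and R widens, the harmonic mean drops from 0.4625 to about 0.216 even though the arithmetic mean stays fixed, which is why a high-precision, low-recall query scores poorly on this measure.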
