Preliminary draft (c) 2007 Cambridge UP - Villanova University
Preliminary draft (c) 2007 Cambridge UP - Villanova University
Preliminary draft (c) 2007 Cambridge UP - Villanova University
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
1.4 Boolean querying, extend Boolean querying, and ranked retrieval 11<br />
MERGE(〈t i 〉)<br />
1 terms ← SORTBYFREQ(〈t i 〉)<br />
2 result ← postings[ f irst[terms]]<br />
3 terms ← rest[terms]<br />
4 while terms ̸= NIL and result ̸= NIL<br />
5 do list ← postings[ f irst[terms]]<br />
6 MERGEINPLACE(result, list)<br />
7 return result<br />
◮ Figure 1.8<br />
Algorithm for merging (or intersection) of a set of postings lists.<br />
algorithm is shown in Figure 1.8. The merge operation is then asymmetric:<br />
the intermediate result list is always shorter than the list it is merged with,<br />
and in many cases it might be orders of magnitude shorter. The postings<br />
merge can still be done by an adaptation of the algorithm in Figure 1.7, so<br />
that it calculates the result in place by updating the first list. When the difference<br />
between the list lengths is very large, opportunities to use alternative<br />
techniques open up. One alternative is to do the intersection by performing<br />
a sequence of binary searches in the long postings list for each term in the<br />
intermediate result list. Another possibility is to store the long postings list<br />
as a hashtable, so that membership of an intermediate result item can be calculated<br />
in constant rather than linear or log time. However, such alternative<br />
techniques are difficult to combine with postings list compression of the sort<br />
discussed in Chapter 5. Moreover, standard postings list merge operations<br />
remain necessary when both terms of a query are very common.<br />
1.4 Boolean querying, extend Boolean querying, and ranked retrieval<br />
RANKED RETRIEVAL<br />
MODELS<br />
The Boolean retrieval model contrasts with ranked retrieval models such as the<br />
vector space model (Chapter 7), in which users largely use free-text queries<br />
rather than a precise language for building up query expressions, and the<br />
system decides which documents best satisfy the query. Despite decades<br />
of academic research on the advantages of ranked retrieval, systems implementing<br />
the Boolean retrieval model were the main or only search option<br />
provided by large commercial information providers for three decades until<br />
approximately the arrival of the World Wide Web (in the early 1990s).<br />
However, these systems did not have just the basic Boolean operations<br />
(AND, OR, and NOT) which we have presented so far. A strict Boolean expression<br />
over terms with an unordered results set is too limited for many of<br />
<strong>Preliminary</strong> <strong>draft</strong> (c)<strong>2007</strong> <strong>Cambridge</strong> <strong>UP</strong>