09.11.2014 Views

Preliminary draft (c) 2007 Cambridge UP - Villanova University

Preliminary draft (c) 2007 Cambridge UP - Villanova University

Preliminary draft (c) 2007 Cambridge UP - Villanova University

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

1.4 Boolean querying, extend Boolean querying, and ranked retrieval 11<br />

MERGE(〈t i 〉)<br />

1 terms ← SORTBYFREQ(〈t i 〉)<br />

2 result ← postings[ f irst[terms]]<br />

3 terms ← rest[terms]<br />

4 while terms ̸= NIL and result ̸= NIL<br />

5 do list ← postings[ f irst[terms]]<br />

6 MERGEINPLACE(result, list)<br />

7 return result<br />

◮ Figure 1.8<br />

Algorithm for merging (or intersection) of a set of postings lists.<br />

algorithm is shown in Figure 1.8. The merge operation is then asymmetric:<br />

the intermediate result list is always shorter than the list it is merged with,<br />

and in many cases it might be orders of magnitude shorter. The postings<br />

merge can still be done by an adaptation of the algorithm in Figure 1.7, so<br />

that it calculates the result in place by updating the first list. When the difference<br />

between the list lengths is very large, opportunities to use alternative<br />

techniques open up. One alternative is to do the intersection by performing<br />

a sequence of binary searches in the long postings list for each term in the<br />

intermediate result list. Another possibility is to store the long postings list<br />

as a hashtable, so that membership of an intermediate result item can be calculated<br />

in constant rather than linear or log time. However, such alternative<br />

techniques are difficult to combine with postings list compression of the sort<br />

discussed in Chapter 5. Moreover, standard postings list merge operations<br />

remain necessary when both terms of a query are very common.<br />

1.4 Boolean querying, extend Boolean querying, and ranked retrieval<br />

RANKED RETRIEVAL<br />

MODELS<br />

The Boolean retrieval model contrasts with ranked retrieval models such as the<br />

vector space model (Chapter 7), in which users largely use free-text queries<br />

rather than a precise language for building up query expressions, and the<br />

system decides which documents best satisfy the query. Despite decades<br />

of academic research on the advantages of ranked retrieval, systems implementing<br />

the Boolean retrieval model were the main or only search option<br />

provided by large commercial information providers for three decades until<br />

approximately the arrival of the World Wide Web (in the early 1990s).<br />

However, these systems did not have just the basic Boolean operations<br />

(AND, OR, and NOT) which we have presented so far. A strict Boolean expression<br />

over terms with an unordered results set is too limited for many of<br />

<strong>Preliminary</strong> <strong>draft</strong> (c)<strong>2007</strong> <strong>Cambridge</strong> <strong>UP</strong>

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!