11.07.2015 Views

Data Structures and Algorithm Analysis - Computer Science at ...


Sec. 9.2 Self-Organizing Lists

Example 9.2 Calculate the expected cost for searching a list ordered by frequency when the probabilities are defined as

$$p_i = \begin{cases} 1/2^{i+1} & \text{if } 0 \le i \le n-2 \\ 1/2^{n-1} & \text{if } i = n-1. \end{cases}$$

Then,

$$C_n \approx \sum_{i=0}^{n-1} (i+1)/2^{i+1} = \sum_{i=1}^{n} (i/2^i) \approx 2.$$

For this example, the expected number of accesses is a constant. This is because the probability for accessing the first record is high (one half), the second is much lower (one quarter) but still much higher than for the third record, and so on. This shows that for some probability distributions, ordering the list by frequency can yield an efficient search technique.

In many search applications, real access patterns follow a rule of thumb called the 80/20 rule. The 80/20 rule says that 80% of the record accesses are to 20% of the records. The values of 80 and 20 are only estimates; every data access pattern has its own values. However, behavior of this nature occurs surprisingly often in practice. This explains the success of caching techniques, which are widely used by web browsers for speeding access to web pages, and by disk drive and CPU manufacturers for speeding access to data stored in slower memory (see the discussion on buffer pools in Section 8.3). When the 80/20 rule applies, we can expect considerable improvements to search performance from a list ordered by frequency of access over standard sequential search in an unordered list.

Example 9.3 The 80/20 rule is an example of a Zipf distribution. Naturally occurring distributions often follow a Zipf distribution. Examples include the observed frequency for the use of words in a natural language such as English, and the size of the population for cities (i.e., view the relative proportions for the populations as equivalent to the "frequency of use"). Zipf distributions are related to the Harmonic Series defined in Equation 2.10. Define the Zipf frequency for item $i$ in the distribution for $n$ records as $1/(i H_n)$ (see Exercise 9.4). The expected cost for the series whose members follow this Zipf distribution will be

$$C_n = \sum_{i=1}^{n} i/(i H_n) = n/H_n \approx n/\log_e n.$$

When a frequency distribution follows the 80/20 rule, the average search looks at about 10-15% of the records in a list ordered by frequency.
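
The two expected-cost calculations above can be checked numerically. The following is a minimal sketch, not code from the text: all function names are hypothetical, and it assumes the probabilities in Example 9.2 are 1/2^(i+1) for the first n-1 records with the remaining probability on the last record, which is the reading consistent with the first record being accessed half the time.

```python
def expected_cost(probs):
    """Expected number of comparisons for sequential search when the
    record at 0-based position i is requested with probability probs[i].
    Finding the record at position i costs i + 1 comparisons."""
    return sum((i + 1) * p for i, p in enumerate(probs))


def halving_probs(n):
    """Assumed distribution for Example 9.2: p_i = 1/2^(i+1) for
    0 <= i <= n-2, and p_(n-1) = 1/2^(n-1) so the total is 1."""
    probs = [1.0 / 2 ** (i + 1) for i in range(n - 1)]
    probs.append(1.0 / 2 ** (n - 1))
    return probs


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / i for i in range(1, n + 1))


def zipf_probs(n):
    """Zipf frequencies from Example 9.3: item i (1-based) has
    frequency 1/(i * H_n)."""
    h_n = harmonic(n)
    return [1.0 / (i * h_n) for i in range(1, n + 1)]


if __name__ == "__main__":
    n = 1000
    print("halving distribution:", expected_cost(halving_probs(n)))
    print("Zipf distribution:   ", expected_cost(zipf_probs(n)))
    print("n / H_n:             ", n / harmonic(n))
```

Under these assumptions the first cost stays close to 2 regardless of n, while the Zipf cost matches n/H_n and so grows with the list size.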
