12.07.2015 Views

A Practical Introduction to Data Structures and Algorithm Analysis

Sec. 9.2 Self-Organizing Lists

A geometric probability distribution can yield quite different results.

Example 9.2 Calculate the expected cost for searching a list ordered by frequency when the probabilities are defined as

\[
p_i = \begin{cases} 1/2^i & \text{if } 1 \le i \le n-1 \\ 1/2^{n-1} & \text{if } i = n. \end{cases}
\]

Then,

\[
C_n \approx \sum_{i=1}^{n} (i/2^i) \approx 2.
\]

For this example, the expected number of accesses is a constant. This is because the probability for accessing the first record is high, the second is much lower but still much higher than for record three, and so on. This shows that for some probability distributions, ordering the list by frequency can yield an efficient search technique.

In many search applications, real access patterns follow a rule of thumb called the 80/20 rule. The 80/20 rule says that 80% of the record accesses are to 20% of the records. The values of 80 and 20 are only estimates; every application has its own values. However, behavior of this nature occurs surprisingly often in practice (which explains the success of caching techniques widely used by disk drive and CPU manufacturers for speeding access to data stored in slower memory; see the discussion on buffer pools in Section 8.3). When the 80/20 rule applies, we can expect reasonable search performance from a list ordered by frequency of access.

Example 9.3 The 80/20 rule is an example of a Zipf distribution. Naturally occurring distributions often follow a Zipf distribution. Examples include the observed frequency for the use of words in a natural language such as English, and the size of the population for cities (i.e., view the relative proportions for the populations as equivalent to the "frequency of use"). Zipf distributions are related to the Harmonic Series defined in Equation 2.10. Define the Zipf frequency for item $i$ in the distribution for $n$ records as $1/(iH_n)$ (see Exercise 9.4). The expected cost for the series whose members follow this Zipf distribution will be

\[
C_n = \sum_{i=1}^{n} i/(iH_n) = n/H_n \approx n/\log_e n.
\]
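The expected costs in Examples 9.2 and 9.3 can be checked numerically. The following is a minimal sketch, not code from the text: the function names (`expected_cost`, `geometric_probs`, `zipf_probs`, `harmonic`) are hypothetical, and it assumes that finding the record in position $i$ of the frequency-ordered list costs $i$ comparisons. Exact rational arithmetic is used so the results can be compared with the closed forms without rounding error.

```python
from fractions import Fraction


def expected_cost(probs):
    """Expected comparisons for a sequential search of a list ordered by
    frequency, assuming the record in position i (1-based) costs i
    comparisons and is requested with probability probs[i - 1]."""
    return sum(i * p for i, p in enumerate(probs, start=1))


def geometric_probs(n):
    """Distribution of Example 9.2: p_i = 1/2^i for 1 <= i <= n-1,
    and p_n = 1/2^(n-1), so that the probabilities sum to 1."""
    probs = [Fraction(1, 2 ** i) for i in range(1, n)]
    probs.append(Fraction(1, 2 ** (n - 1)))
    return probs


def harmonic(n):
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n, as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, n + 1))


def zipf_probs(n):
    """Distribution of Example 9.3: item i has frequency 1/(i * H_n)."""
    h_n = harmonic(n)
    return [1 / (i * h_n) for i in range(1, n + 1)]


if __name__ == "__main__":
    for n in (4, 16, 64):
        geo = expected_cost(geometric_probs(n))
        zipf = expected_cost(zipf_probs(n))
        # float() is used only for display; the sums themselves are exact.
        print(n, float(geo), float(zipf), float(n / harmonic(n)))
```

Under these assumptions the geometric cost stays just below 2 for every n, while the Zipf cost equals n/H_n exactly and so keeps growing with n, which matches the contrast the two examples draw.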
