
A Practical Introduction to Data Structures and Algorithm Analysis



Sec. 9.4 Hashing

1. Natural distributions are geometric. For example, consider the populations of the 100 largest cities in the United States. If you plot these populations on a number line, most of them will be clustered toward the low side, with a few outliers on the high side. This is an example of a Zipf distribution (see Section 9.2). Viewed the other way, the home town for a given person is far more likely to be a particular large city than a particular small town.
2. Collected data are likely to be skewed in some way. Field samples might be rounded to, say, the nearest 5 (i.e., all numbers end in 5 or 0).
3. If the input is a collection of common English words, the beginning letter will be poorly distributed.

Note that in each of these examples, either high- or low-order bits of the key are poorly distributed.

When designing hash functions, we are generally faced with one of two situations.

1. We know nothing about the distribution of the incoming keys. In this case, we wish to select a hash function that evenly distributes the key range across the hash table, while avoiding obvious opportunities for clustering such as hash functions that are sensitive to the high- or low-order bits of the key value.
2. We know something about the distribution of the incoming keys. In this case, we should use a distribution-dependent hash function that avoids assigning clusters of related key values to the same hash table slot. For example, if hashing English words, we should not hash on the value of the first character because this is likely to be unevenly distributed.

Below are several examples of hash functions that illustrate these points.

Example 9.5 Consider the following hash function used to hash integers to a table of sixteen slots:

    int h(int x) {
      return(x % 16);
    }

The value returned by this hash function depends solely on the least significant four bits of the key. Because these bits are likely to be poorly distributed (as an example, a high percentage of the keys might be even numbers, which means that the low-order bit is zero), the result will also be poorly distributed. This example shows that the size of the table M can have a big effect on the performance of a hash system because this value is typically used as the modulus.
