11.07.2015 Views

Data Structures and Algorithm Analysis - Computer Science at ...


Sec. 9.4 Hashing

Note that in examples 2 and 3, either high- or low-order bits of the key are poorly distributed.

When designing hash functions, we are generally faced with one of two situations.

1. We know nothing about the distribution of the incoming keys. In this case, we wish to select a hash function that evenly distributes the key range across the hash table, while avoiding obvious opportunities for clustering such as hash functions that are sensitive to the high- or low-order bits of the key value.
2. We know something about the distribution of the incoming keys. In this case, we should use a distribution-dependent hash function that avoids assigning clusters of related key values to the same hash table slot. For example, if hashing English words, we should not hash on the value of the first character because this is likely to be unevenly distributed.

Below are several examples of hash functions that illustrate these points.

Example 9.5 Consider the following hash function used to hash integers to a table of sixteen slots:

    int h(int x) {
      return(x % 16);
    }

The value returned by this hash function depends solely on the least significant four bits of the key. Because these bits are likely to be poorly distributed (as an example, a high percentage of the keys might be even numbers, which means that the low order bit is zero), the result will also be poorly distributed. This example shows that the size of the table M can have a big effect on the performance of a hash system because this value is typically used as the modulus to ensure that the hash function produces a number in the range 0 to M − 1.

Example 9.6 A good hash function for numerical values comes from the mid-square method. The mid-square method squares the key value, and then takes the middle r bits of the result, giving a value in the range 0 to 2^r − 1. This works well because most or all bits of the key value contribute to the result. For example, consider records whose keys are 4-digit numbers in base 10. The goal is to hash these key values to a table of size 100 (i.e., a range of 0 to 99). This range is equivalent to two digits in base 10. That is, r = 2. If the input is the number 4567, squaring yields an 8-digit number, 20857489. The middle two digits of this result are 57. All digits
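The two examples above can be sketched in a few lines of C++. This is a minimal illustration, not the book's implementation: the names h_mod16 and mid_square_4digit are hypothetical, keys are assumed to be non-negative, and squares with fewer than eight digits are assumed to be zero-padded on the left before the middle two digits are taken.

```cpp
#include <cassert>
#include <cstdint>

// Example 9.5 (renamed here for clarity; the name is an assumption).
// For non-negative x, x % 16 keeps only the four least significant
// bits of x, so keys that agree in those bits always collide.
int h_mod16(int x) {
    return x % 16;
}

// Mid-square sketch for 4-digit base-10 keys and a table of size 100.
// Assumption: the square is treated as an 8-digit number (zero-padded),
// and the "middle two digits" are the 4th and 5th digits from the left.
int mid_square_4digit(int key) {
    assert(key >= 0 && key <= 9999);  // only 4-digit keys are handled
    std::int64_t sq = static_cast<std::int64_t>(key) * key;  // at most 99980001
    // Drop the three low-order digits, then keep the next two.
    return static_cast<int>((sq / 1000) % 100);
}
```

With these definitions, h_mod16(35) and h_mod16(19) both land in slot 3 because 35 and 19 share the same low four bits, while mid_square_4digit(4567) returns 57, matching the worked example of squaring 4567 to get 20857489.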
