23.11.2014 Views

Data Structures and Algorithms in Java[1].pdf - Fulvio Frisone



9.2.4 Compression Functions

The hash code for a key k will typically not be suitable for immediate use with a bucket array, because the range of possible hash codes for our keys will typically exceed the range of legal indices of our bucket array A. That is, incorrectly using a hash code as an index into our bucket array may result in an array out-of-bounds exception being thrown, either because the index is negative or it exceeds the capacity of A. Thus, once we have determined an integer hash code for a key object k, there is still the issue of mapping that integer into the range [0,N − 1]. This mapping is the second action that a hash function performs, and a good compression function is one that minimizes the possible number of collisions in a given set of hash codes.
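To make the out-of-bounds problem concrete, here is a minimal sketch in Java. The class name RawHashIndexDemo, the helper isLegalIndex, and the capacity of 10 are assumptions made for illustration; they are not taken from the text. The sketch relies only on the fact that Integer.hashCode() returns the int value itself.

```java
class RawHashIndexDemo {
    /** Returns true if hashCode could be used directly as an index into an array of the given capacity. */
    static boolean isLegalIndex(int hashCode, int capacity) {
        return hashCode >= 0 && hashCode < capacity;
    }

    public static void main(String[] args) {
        int capacity = 10;                          // N, chosen arbitrarily for this demo
        Object[] bucketArray = new Object[capacity];

        // Integer.hashCode() returns the wrapped int, so these hash codes are 7, -7, and 1000.
        int[] hashCodes = {
            Integer.valueOf(7).hashCode(),
            Integer.valueOf(-7).hashCode(),
            Integer.valueOf(1000).hashCode()
        };

        for (int h : hashCodes) {
            try {
                bucketArray[h] = "entry";           // using the raw hash code as an index
                System.out.println(h + ": stored");
            } catch (ArrayIndexOutOfBoundsException e) {
                // Reached for -7 (negative) and for 1000 (exceeds the capacity).
                System.out.println(h + ": out of bounds");
            }
        }
    }
}
```

Only the hash code 7 falls inside [0,N − 1] here; the other two must be compressed before they can be used as indices.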

The Division Method

One simple compression function is the division method, which maps an integer i to

|i| mod N,

where N, the size of the bucket array, is a fixed positive integer. Additionally, if we take N to be a prime number, then this compression function helps "spread out" the distribution of hashed values. Indeed, if N is not prime, then there is a higher likelihood that patterns in the distribution of hash codes will be repeated in the distribution of hash values, thereby causing collisions. For example, if we insert keys with hash codes {200,205,210,215,220,... ,600} into a bucket array of size 100, then each hash code will collide with at least three others. But if we use a bucket array of size 101, then there will be no collisions. If a hash function is chosen well, it should ensure that the probability of two different keys getting hashed to the same bucket is 1/N. Choosing N to be a prime number is not always enough,
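The division method and the 100 versus 101 example can be checked with a short Java sketch. The class name DivisionCompression and the helpers compress, largestBucket, and sampleCodes are hypothetical names introduced here. One further assumption: the absolute value is taken in long arithmetic, because Math.abs(Integer.MIN_VALUE) overflows and stays negative in int arithmetic.

```java
class DivisionCompression {
    /** Division method: maps hash code i to |i| mod n, a value in [0, n - 1]. Assumes n > 0. */
    static int compress(int i, int n) {
        // Widen to long first so that |Integer.MIN_VALUE| is representable.
        return (int) (Math.abs((long) i) % n);
    }

    /** Returns the size of the fullest bucket after compressing every hash code into n buckets. */
    static int largestBucket(int[] hashCodes, int n) {
        int[] counts = new int[n];
        int largest = 0;
        for (int code : hashCodes) {
            int bucket = compress(code, n);
            counts[bucket]++;
            largest = Math.max(largest, counts[bucket]);
        }
        return largest;
    }

    /** Builds the hash codes 200, 205, 210, ..., 600 used in the example. */
    static int[] sampleCodes() {
        int[] codes = new int[81];
        for (int k = 0; k < codes.length; k++) {
            codes[k] = 200 + 5 * k;
        }
        return codes;
    }

    public static void main(String[] args) {
        int[] codes = sampleCodes();
        System.out.println(largestBucket(codes, 100)); // prints 5: 200, 300, 400, 500, 600 share bucket 0
        System.out.println(largestBucket(codes, 101)); // prints 1: every code gets its own bucket
    }
}
```

With N = 100 the step of 5 in the hash codes lines up with the divisors of N, so codes that differ by a multiple of 100 pile into the same bucket. With the prime N = 101, two codes could only collide if they differed by a multiple of 505, which is larger than the spread of the whole set.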
