23.11.2014 Views

Data Structures and Algorithms in Java[1].pdf - Fulvio Frisone


key k will skip over cells containing the available marker and continue probing
until reaching the desired entry or an empty bucket (or returning to where
we started). Additionally, our algorithm for put(k,v) should remember an
available cell encountered during the search for k, since this is a valid place to put
a new entry (k,v). Thus, linear probing saves space, but it complicates removals.
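The interplay between the available marker, searches, and put(k,v) can be sketched as follows. This is a minimal illustration, not the book's implementation: the class name LinearProbeMap, the helper findSlot, the restriction to int keys and String values, and the hash function h(k) = k mod N are all assumptions made for the example.

```java
/** Sketch of an open-addressing map (int keys, String values) using linear probing. */
class LinearProbeMap {
    // Hypothetical sentinel playing the role of the "available" marker left by a removal.
    private static final Object AVAILABLE = new Object();

    private final Object[] keys;    // null = empty, AVAILABLE = removed, otherwise an Integer key
    private final String[] values;
    private int size = 0;

    LinearProbeMap(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        keys = new Object[capacity];
        values = new String[capacity];
    }

    // Assumed hash function: h(k) = k mod N.
    private int hash(int k) { return Math.floorMod(k, keys.length); }

    /**
     * Returns the index holding k if present. Otherwise returns -(free + 1), where free is
     * the first reusable cell seen during the search (or N if the table has none).
     */
    private int findSlot(int k) {
        int n = keys.length;
        int free = n;                        // n means "no reusable cell seen yet"
        int i = hash(k);
        for (int j = 0; j < n; j++) {        // at most N probes, i.e. until back at the start
            int idx = (i + j) % n;
            Object cell = keys[idx];
            if (cell == null) {              // empty bucket: k is not in the table
                if (free == n) free = idx;
                break;
            }
            if (cell == AVAILABLE) {         // skip the marker, but remember the first one
                if (free == n) free = idx;
            } else if (cell.equals(k)) {
                return idx;
            }
        }
        return -(free + 1);
    }

    String get(int k) {
        int idx = findSlot(k);
        return idx >= 0 ? values[idx] : null;
    }

    String put(int k, String v) {
        int idx = findSlot(k);
        if (idx >= 0) {                      // key already present: overwrite its value
            String old = values[idx];
            values[idx] = v;
            return old;
        }
        int free = -(idx + 1);
        if (free == keys.length) throw new IllegalStateException("table is full");
        keys[free] = k;                      // reuses an available cell if one was seen
        values[free] = v;
        size++;
        return null;
    }

    String remove(int k) {
        int idx = findSlot(k);
        if (idx < 0) return null;
        String old = values[idx];
        keys[idx] = AVAILABLE;               // mark rather than empty, so later searches continue
        values[idx] = null;
        size--;
        return old;
    }

    int size() { return size; }
}
```

With capacity 7, keys 3, 10, and 17 all hash to bucket 3 and occupy cells 3, 4, and 5. Removing 10 leaves a marker in cell 4; a later get(17) probes past it, and put(24, ...) reuses cell 4 after confirming that 24 is not stored further along the run.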

Even with the use of the available marker object, linear probing suffers from an
additional disadvantage. It tends to cluster the entries of the map into contiguous
runs, which may even overlap (particularly if more than half of the cells in the
hash table are occupied). Such contiguous runs of occupied hash cells cause
searches to slow down considerably.
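A small experiment can make the cost of such runs concrete. The sketch below is only an illustration under assumed choices: the class name ClusterDemo, the method probesPerInsert, and the hash function h(k) = k mod N are hypothetical. It counts how many cells each insertion inspects under linear probing.

```java
/** Counts the probes needed by successive linear-probing insertions. */
class ClusterDemo {
    /**
     * Inserts the keys in order into a table of n cells, assuming h(k) = k mod n,
     * and returns the number of cells inspected by each insertion.
     */
    static int[] probesPerInsert(int n, int[] keys) {
        if (keys.length > n) throw new IllegalArgumentException("more keys than cells");
        boolean[] occupied = new boolean[n];
        int[] probes = new int[keys.length];
        for (int t = 0; t < keys.length; t++) {
            int i = Math.floorMod(keys[t], n);
            int j = 0;
            while (occupied[(i + j) % n]) j++;   // walk along the run of occupied cells
            occupied[(i + j) % n] = true;
            probes[t] = j + 1;
        }
        return probes;
    }
}
```

For n = 11 and keys 0, 11, 22, 33, 1, the probe counts are 1, 2, 3, 4, 4. Key 1 has a different hash value from the other four keys, yet it still has to walk through the run they built up, which is the slowdown described above.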

Quadratic Probing

Another open addressing strategy, known as quadratic probing, involves
iteratively trying the buckets A[(i + f(j)) mod N], for j = 0,1,2,…, where f(j) = j^2,
until finding an empty bucket. As with linear probing, the quadratic probing
strategy complicates the removal operation, but it does avoid the kinds of
clustering patterns that occur with linear probing. Nevertheless, it creates its own
kind of clustering, called secondary clustering, where the set of filled array cells
"bounces" around the array in a fixed pattern. If N is not chosen as a prime, then
the quadratic probing strategy may not find an empty bucket in A even if one
exists. In fact, even if N is prime, this strategy may not find an empty slot, if the
bucket array is at least half full; we explore the cause of this type of clustering in
an exercise (C-9.9).
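To see why quadratic probing can miss an empty bucket, it may help to list which buckets the probe sequence can reach at all. The following sketch assumes a hypothetical helper reachable(i, N) in a class named QuadraticProbeDemo; neither name comes from the text. Since (j + N)^2 mod N equals j^2 mod N, trying j = 0,…,N-1 is enough to enumerate every reachable bucket.

```java
import java.util.TreeSet;

/** Enumerates the buckets that quadratic probing can visit from a given start. */
class QuadraticProbeDemo {
    /** Distinct buckets (i + j^2) mod n for j = 0..n-1, assuming 0 <= i < n and small n. */
    static TreeSet<Integer> reachable(int i, int n) {
        TreeSet<Integer> seen = new TreeSet<>();
        for (int j = 0; j < n; j++) {
            seen.add((i + j * j) % n);
        }
        return seen;
    }
}
```

Starting from i = 0, a prime size N = 7 reaches only buckets 0, 1, 2, and 4, that is, 4 of the 7 cells, so an insertion can fail once those four are taken. With the non-prime size N = 8 the reachable set shrinks to 0, 1, and 4, so failure is possible even when the table is well under half full.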

Double Hashing

Another open addressing strategy that does not cause clustering of the kind
produced by linear probing or the kind produced by quadratic probing is the double
hashing strategy. In this approach, we choose a secondary hash function, h′, and
if h maps some key k to a bucket A[i], with i = h(k), that is already occupied, then
we iteratively try the buckets A[(i + f(j)) mod N] next, for j = 1,2,3,…, where
f(j) = j · h′(k). In this scheme, the secondary hash function is not allowed to evaluate
to zero; a common choice is h′(k) = q - (k mod q), for some prime number q < N.
Also, N should be a prime. Moreover, we should choose a secondary hash
function that will attempt to minimize clustering as much as possible.
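The probe sequence described above can be sketched as follows. The class name DoubleHashDemo, the method probeSequence, and the primary hash h(k) = k mod N are assumptions made for illustration; the secondary hash is the h′(k) = q - (k mod q) choice mentioned in the text. The sequence here starts at j = 0, so its first element is the home bucket A[i] itself.

```java
/** Generates the double-hashing probe sequence for an integer key. */
class DoubleHashDemo {
    /**
     * Returns the buckets (h(k) + j * h'(k)) mod n for j = 0..n-1,
     * assuming h(k) = k mod n and h'(k) = q - (k mod q) with q < n.
     */
    static int[] probeSequence(int k, int n, int q) {
        int i = Math.floorMod(k, n);
        int step = q - Math.floorMod(k, q);          // lies in 1..q, so it is never zero
        int[] seq = new int[n];
        for (int j = 0; j < n; j++) {
            seq[j] = (int) ((i + (long) j * step) % n);  // long arithmetic avoids int overflow
        }
        return seq;
    }
}
```

With N = 13 and q = 7, keys 18 and 44 share the home bucket 5 but have steps 3 and 5, so their sequences begin 5, 8, 11, 1, … and 5, 10, 2, 7, … respectively. Because N is prime and the step is between 1 and q < N, each sequence visits all 13 buckets before repeating.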

These open addressing schemes save some space over the separate chaining
method, but they are not necessarily faster. In experimental and theoretical
analyses, the chaining method is either competitive with or faster than the other
methods, depending on the load factor of the bucket array. So, if memory space is not a
major issue, the collision-handling method of choice seems to be separate
chaining. Still, if memory space is in short supply, then one of these open addressing
methods might be worth implementing, provided our probing strategy minimizes
the clustering that can occur from open addressing.
