23.11.2014 Views

Data Structures and Algorithms in Java[1].pdf - Fulvio Frisone


of the word lists provided in two variants of Unix, we found that taking a to be 33, 37, 39, or 41 produced fewer than 7 collisions in each case! It should come as no surprise, then, to learn that many Java implementations choose the polynomial hash function, using one of these constants for a, as a default hash code for strings. For the sake of speed, however, some Java implementations only apply the polynomial hash function to a fraction of the characters in long strings.
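To make the role of the constant a concrete, here is a minimal sketch of a polynomial hash code evaluated with Horner's rule. It assumes the usual form x_0·a^(k-1) + x_1·a^(k-2) + … + x_(k-1) over the characters of the string. The class name PolynomialHashSketch and the method polynomialHash are illustrative names chosen here, not part of any Java library.

```java
final class PolynomialHashSketch {
    // Evaluates x_0*a^(k-1) + x_1*a^(k-2) + ... + x_(k-1) with Horner's rule.
    // int overflow is intentional: the value simply wraps around.
    static int polynomialHash(String s, int a) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = a * h + s.charAt(i);   // multiply the running value by a, add next character
        }
        return h;
    }

    public static void main(String[] args) {
        // Try the constants mentioned in the text on two anagrams.
        for (int a : new int[] {33, 37, 39, 41}) {
            System.out.println("a=" + a
                    + " stop=" + polynomialHash("stop", a)
                    + " pots=" + polynomialHash("pots", a));
        }
    }
}
```

Because every character is multiplied by a different power of a, strings with the same characters in a different order generally receive different codes.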

Cyclic Shift Hash Codes

A variant of the polynomial hash code replaces multiplication by a with a cyclic shift of a partial sum by a certain number of bits. Such a function, applied to character strings in Java, could, for example, look like the following:

static int hashCode(String s) {
    int h = 0;
    for (int i = 0; i < s.length(); i++) {
        h = (h << 5) | (h >>> 27); // 5-bit cyclic shift of the running sum
        h += (int) s.charAt(i);    // add in next character
    }
    return h;
}
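As a usage sketch, the same 5-bit cyclic shift can be written with the standard library method Integer.rotateLeft, which rotates the bits of an int to the left by a given distance. The class name CyclicShiftDemo is a hypothetical name used only for this illustration.

```java
final class CyclicShiftDemo {
    // Cyclic-shift hash code with the rotation expressed through Integer.rotateLeft.
    static int hashCode(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = Integer.rotateLeft(h, 5);   // same bits as (h << 5) | (h >>> 27)
            h += s.charAt(i);               // add in next character
        }
        return h;
    }

    public static void main(String[] args) {
        // Anagrams receive different codes, because each character has been
        // rotated a different number of times by the end of the loop.
        System.out.println("stop -> " + hashCode("stop"));
        System.out.println("pots -> " + hashCode("pots"));
    }
}
```

The unsigned shift >>> (or rotateLeft) matters here: a signed shift >> would copy the sign bit into the high positions and corrupt the rotated value for negative running sums.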

As with the traditional polynomial hash code, using the cyclic-shift hash code requires some fine-tuning. In this case, we must wisely choose the amount to shift by for each new character. We show in Table 9.2 the results of some experiments run on a list of just over 25,000 English words, which compare the number of collisions for various shift amounts. These and our previous experiments show that if we choose our constant a or our shift value wisely, then either the polynomial hash code or its cyclic-shift variant is suitable for any object that can be written as a tuple (x_0, x_1, …, x_{k-1}), where the order in tuples matters.
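An experiment of this kind can be sketched as follows. This is only an illustration under stated assumptions: the six-word list is a tiny stand-in for the word list used in the text, "total" is taken here to mean the number of words minus the number of distinct hash codes, and "max" is taken to mean the largest number of words sharing one hash code. The names ShiftCollisionSketch, cyclicShiftHash, and collisionStats are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ShiftCollisionSketch {
    // Cyclic-shift hash code with a configurable shift amount (0 to 31).
    static int cyclicShiftHash(String s, int shift) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = Integer.rotateLeft(h, shift) + s.charAt(i);
        }
        return h;
    }

    // Returns {total, max} under the assumed definitions:
    // total = words minus distinct hash codes, max = largest group sharing one code.
    static int[] collisionStats(List<String> words, int shift) {
        Map<Integer, Integer> counts = new HashMap<>();
        int max = 0;
        for (String w : words) {
            int c = counts.merge(cyclicShiftHash(w, shift), 1, Integer::sum);
            max = Math.max(max, c);
        }
        return new int[] {words.size() - counts.size(), max};
    }

    public static void main(String[] args) {
        // Stand-in data only; a real run would load a full word list.
        List<String> words = Arrays.asList("stop", "pots", "tops", "spot", "temp", "hash");
        for (int shift : new int[] {0, 1, 5, 13}) {
            int[] stats = collisionStats(words, shift);
            System.out.println("shift=" + shift + " total=" + stats[0] + " max=" + stats[1]);
        }
    }
}
```

A shift of 0 reduces the function to a plain sum of character codes, so all anagrams collide. That is the kind of poor choice the fine-tuning is meant to avoid.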

Table 9.2: Comparison of collision behavior for the cyclic shift variant of the polynomial hash code as applied to a list of just over 25,000 English words. The "Total" column records the total number of collisions and the "Max" column records the maximum number
