23.11.2014 Views

Data Structures and Algorithms in Java[1].pdf - Fulvio Frisone


Hash Codes in Java

The generic Object class defined in Java comes with a default hashCode() method for mapping each object instance to an integer that is a "representation" of that object. Specifically, the hashCode() method returns a 32-bit integer of type int. Unless specifically overridden, this method is inherited by every object used in a Java program. We should be careful in using the default Object version of hashCode(), however, as this could just be an integer interpretation of the object's location in memory (as is the case in many Java implementations). This type of hash code works poorly with character strings, for example, because two different string objects in memory might actually be equal, in which case we would like them to have the same hash code. Indeed, the Java String class overrides the hashCode method of the Object class to be something more appropriate for character strings. Likewise, if we intend to use certain objects as keys in a map, then we should override the built-in hashCode() method for these objects, replacing it with a mapping that assigns well-spread, consistent integers to these types of objects.
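To make the point about equal objects concrete, here is a minimal sketch. The class names HashCodeDemo and Point, and the helper methods, are hypothetical and introduced only for illustration. It checks that two distinct but equal String objects share a hash code, and shows a key type that overrides hashCode() together with equals() so that an equal key finds the stored entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class HashCodeDemo {
    // Hypothetical key type, used only for illustration.
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            // Equal points must produce equal hash codes.
            return Objects.hash(x, y);
        }
    }

    // Two distinct String objects with equal contents share a hash code.
    static boolean equalStringsShareHash() {
        String a = new String("hash");
        String b = new String("hash");
        return a != b && a.equals(b) && a.hashCode() == b.hashCode();
    }

    // Looks up an entry using a second, equal Point instance as the key.
    static String lookup() {
        Map<Point, String> labels = new HashMap<>();
        labels.put(new Point(1, 2), "corner");
        return labels.get(new Point(1, 2));
    }

    public static void main(String[] args) {
        System.out.println(equalStringsShareHash());
        System.out.println(lookup());
    }
}
```

If Point inherited the default hashCode() from Object, the second Point would most likely hash to a different bucket and the lookup would return null.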

Let us consider, then, several common data types and some example methods for assigning hash codes to objects of these types.

Casting to an Integer

To begin, we note that, for any data type X that is represented using at most as many bits as our integer hash codes, we can simply take as a hash code for X an integer interpretation of its bits. Thus, for Java base types byte, short, int, and char, we can achieve a good hash code simply by casting this type to int. Likewise, for a variable x of base type float, we can convert x to an integer using a call to Float.floatToIntBits(x), and then use this integer as x's hash code.
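The casting scheme can be sketched as follows. The class name CastHash and the overloaded hash methods are hypothetical names chosen for this example; only Float.floatToIntBits comes from the Java library.

```java
class CastHash {
    // Each base type of at most 32 bits is simply widened to int.
    static int hash(byte b)  { return (int) b; }
    static int hash(short s) { return (int) s; }
    static int hash(char c)  { return (int) c; }
    static int hash(int i)   { return i; }

    // A float is reinterpreted through its 32-bit IEEE 754 representation.
    static int hash(float f) { return Float.floatToIntBits(f); }

    public static void main(String[] args) {
        System.out.println(hash('A'));        // 65
        System.out.println(hash((byte) -1));  // -1
        System.out.println(hash(1.0f));       // 1065353216 (0x3f800000)
    }
}
```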

Summing Components

For base types, such as long and double, whose bit representation is double that of a hash code, the above scheme is not immediately applicable. Still, one possible hash code, and indeed one that is used by many Java implementations, is to simply cast a (long) integer representation of the type down to an integer the size of a hash code. This hash code, of course, ignores half of the information present in the original value, and if many of the keys in our map only differ in these bits, then they will collide using this simple hash code. An alternative hash code, then, which takes all the original bits into consideration, is to sum an integer representation of the high-order bits with an integer representation of the low-order bits. Such a hash code can be written in Java as follows:
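A minimal sketch of the summing scheme, with plain truncation alongside for comparison, might look like this. The class name LongHash and all method names are hypothetical; the double variant assumes Double.doubleToLongBits as the way to obtain the 64-bit representation.

```java
class LongHash {
    // Truncation: keeps only the low-order 32 bits of the value.
    static int truncatingHash(long i) {
        return (int) i;
    }

    // Sums the high-order and low-order 32-bit halves.
    // The final cast to int keeps the low 32 bits of the sum.
    static int summingHash(long i) {
        return (int) ((i >> 32) + (int) i);
    }

    // A double is first reinterpreted as its 64-bit pattern.
    static int summingHashOfDouble(double d) {
        return summingHash(Double.doubleToLongBits(d));
    }

    public static void main(String[] args) {
        long a = 1L;
        long b = (1L << 32) | 1L; // differs from a only in the high-order half

        System.out.println(truncatingHash(a) == truncatingHash(b)); // true: collision
        System.out.println(summingHash(a) == summingHash(b));       // false: 1 vs 2
    }
}
```

For reference, the standard library's Long.hashCode() combines the two halves with exclusive-or rather than addition, which likewise uses all 64 bits.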
