25.11.2014 Views

Algorithms and Data Structures


N. Wirth. Algorithms and Data Structures. Oberon version

H(k) = ORD(k) MOD N

It has the property that the key values are spread evenly over the index range, and it is therefore the basis of most key transformations. It is also very efficiently computable if N is a power of 2. But it is exactly this case that must be avoided if the keys are sequences of letters: the assumption that all keys are equally likely is then mistaken. In fact, words that differ by only a few characters then most likely map onto identical indices, effectively causing a most uneven distribution. It is therefore particularly recommended to let N be a prime number [5-2]. This has the consequence that a full division operation is needed that cannot be replaced by a mere masking of binary digits, but this is no serious drawback on most modern computers, which feature a built-in division instruction.
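The effect of the choice of N can be seen in a minimal Python sketch. It assumes, purely for illustration, that ORD(k) reads the characters of a word as the digits of a base-256 number; the helper names `ord_key` and `H` and the sample words are not from the text.

```python
def ord_key(key: str) -> int:
    """Assumed ORD: read the ASCII characters of key as base-256 digits."""
    value = 0
    for code in key.encode("ascii"):
        value = value * 256 + code
    return value


def H(key: str, n: int) -> int:
    """H(k) = ORD(k) MOD N for a table of size n."""
    return ord_key(key) % n


words = ["cat", "bat", "hat", "rat"]

# N = 256 (a power of 2): MOD keeps only the low 8 bits, i.e. the last letter.
power_of_two = [H(w, 256) for w in words]

# N = 251 (a prime): every character influences the remainder.
prime = [H(w, 251) for w in words]

print(power_of_two)
print(prime)
```

Under this assumed ORD, all four words share one index when N = 256, because only the final letter survives the masking, whereas N = 251 gives four distinct indices.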

Often, hash functions are used which consist of applying logical operations such as the exclusive or to some parts of the key represented as a sequence of binary digits. These operations may be faster than division on some computers, but they sometimes fail spectacularly to distribute the keys evenly over the range of indices. We therefore refrain from discussing such methods in further detail.
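One way such a failure can arise is sketched below in Python. The function `xor_fold` is a hypothetical example chosen for illustration, not a method taken from the text: it combines all character codes of a key with exclusive or, and since that operation is commutative, every rearrangement of the same letters yields the same index.

```python
from functools import reduce
from operator import xor


def xor_fold(key: str) -> int:
    """Hypothetical XOR-based hash: exclusive or of all character codes."""
    return reduce(xor, key.encode("ascii"), 0)


anagrams = ["abc", "acb", "bca", "cab"]

# The order of the letters has no influence on the result,
# so all four keys collide on a single index.
indices = {xor_fold(w) for w in anagrams}
print(indices)
```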

5.3 Collision Handling

If an entry in the table corresponding to a given key turns out not to be the desired item, then a collision is present, i.e., two items have keys mapping onto the same index. A second probe is necessary, one based on an index obtained in a deterministic manner from the given key. There exist several methods of generating secondary indices. An obvious one is to link all entries with identical primary index H(k) together in a linked list. This is called direct chaining. The elements of this list may be in the primary table or not; in the latter case, the storage in which they are allocated is usually called an overflow area. This method has the disadvantage that secondary lists must be maintained, and that each entry must provide space for a pointer (or index) to its list of collided items.
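Direct chaining can be sketched in Python as follows. This is a minimal illustration under stated assumptions: keys are non-negative integers, the table size 11 is an arbitrary prime, and Python lists stand in for the pointer-linked lists described above. The class name `ChainedTable` and its methods are hypothetical.

```python
class ChainedTable:
    """Hash table with direct chaining: one chain of entries per primary index."""

    def __init__(self, n: int = 11):  # n: assumed prime table size
        self.n = n
        self.slots = [[] for _ in range(n)]  # an empty chain for every index

    def _h(self, key: int) -> int:
        return key % self.n  # H(k) = ORD(k) MOD N with integer keys

    def insert(self, key: int, value) -> None:
        chain = self.slots[self._h(key)]
        for pos, (k, _) in enumerate(chain):
            if k == key:  # key already present: overwrite its value
                chain[pos] = (key, value)
                return
        chain.append((key, value))  # new key: extend the chain

    def lookup(self, key: int):
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None  # chain exhausted: key is not in the table


table = ChainedTable()
table.insert(3, "a")
table.insert(14, "b")  # 14 MOD 11 = 3, collides with key 3
table.insert(25, "c")  # 25 MOD 11 = 3, collides again
print(table.lookup(14), len(table.slots[3]))
```

All three keys end up in the chain of index 3, and a lookup has to walk that chain, which is the maintenance cost mentioned above.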

An alternative solution for resolving collisions is to dispense with links entirely and instead simply look at other entries in the same table until the item is found or an open position is encountered, in which case one may assume that the specified key is not present in the table. This method is called open addressing [5-3]. Naturally, the sequence of indices of secondary probes must always be the same for a given key. The algorithm for a table lookup can then be sketched as follows:

    h := H(k); i := 0;
    REPEAT
      IF T[h].key = k THEN item found
      ELSIF T[h].key = free THEN item is not in table
      ELSE (*collision*)
        i := i+1; h := (H(k) + G(i)) MOD N
      END
    UNTIL found or not in table (or table full)
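The lookup sketch can be turned into a small runnable Python version. It rests on several assumptions: keys are non-negative integers stored directly in the table, `None` marks a free slot, the probe index is reduced MOD N so that it stays inside the table, and the default G(i) = i is just one possible choice. The names `probe`, `insert` and `contains` are hypothetical.

```python
FREE = None  # assumed marker for an unoccupied slot


def probe(table, key, G=lambda i: i):
    """Return the index holding key, or the free index where the search ends.

    Returns -1 if n probes found neither the key nor a free slot.
    """
    n = len(table)
    h0 = key % n  # primary index H(k)
    for i in range(n):
        h = (h0 + G(i)) % n  # G(0) = 0 yields the primary index itself
        if table[h] is FREE or table[h] == key:
            return h
    return -1


def insert(table, key):
    h = probe(table, key)
    if h < 0:
        raise OverflowError("table full")
    table[h] = key
    return h


def contains(table, key):
    h = probe(table, key)
    return h >= 0 and table[h] == key


T = [FREE] * 7
positions = [insert(T, k) for k in (10, 17, 24)]  # all have H(k) = 3
print(positions, contains(T, 17), contains(T, 31))
```

Because insertion and lookup share the same `probe` function, the sequence of secondary indices is necessarily identical for a given key, as the text requires.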

Various functions for resolving collisions have been proposed in the literature. A survey of the topic by Morris in 1968 [4-8] stimulated considerable activity in this field. The simplest method is to try the next location, considering the table to be circular, until either the item with the specified key is found or an empty location is encountered. Hence, G(i) = i; the indices h_i used for probing in this case are

    h_0 = H(k)
    h_i = (h_0 + i) MOD N,   i = 1 ... N-1
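Taking G(i) = i, so that the i-th probe is (H(k) + i) MOD N, the probe sequence can be enumerated with a short Python sketch. The function name `linear_probe_sequence` and the sample values H(k) = 5, N = 7 are chosen only for illustration.

```python
def linear_probe_sequence(h0: int, n: int):
    """Yield h_0, h_1, ..., h_(n-1) with h_i = (h_0 + i) MOD n."""
    for i in range(n):
        yield (h0 + i) % n


seq = list(linear_probe_sequence(5, 7))
print(seq)  # [5, 6, 0, 1, 2, 3, 4]
```

The sequence wraps around the end of the table and visits every location exactly once, so a free slot is always found if one exists.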

This method is called linear probing and has the disadvantage that entries have a tendency to cluster around the primary keys (keys that had not collided upon insertion). Ideally, of course, a function G should be chosen that again spreads the keys uniformly over the remaining set of locations. In practice, however,
