25.11.2014 Views

Algorithms and Data Structures


N. Wirth. Algorithms and Data Structures. Oberon version

5 Key Transformations (Hashing)

5.1 Introduction

The principal question discussed in Chap. 4 at length is the following: Given a set of items characterized by a key (upon which an ordering relation is defined), how is the set to be organized so that retrieval of an item with a given key involves as little effort as possible? Clearly, in a computer store each item is ultimately accessed by specifying a storage address. Hence, the stated problem is essentially one of finding an appropriate mapping H of keys (K) into addresses (A):

H: K → A

In Chap. 4 this mapping was implemented in the form of various list and tree search algorithms based on different underlying data organizations. Here we present yet another approach that is basically simple and very efficient in many cases. The fact that it also has some disadvantages is discussed subsequently.

The data organization used in this technique is the array structure. H is therefore a mapping transforming keys into array indices, which is the reason for the term key transformation that is generally used for this technique. It should be noted that we shall not need to rely on any dynamic allocation procedures; the array is one of the fundamental, static structures. The method of key transformations is often used in problem areas where tree structures are comparable competitors.

The fundamental difficulty in using a key transformation is that the set of possible key values is much larger than the set of available store addresses (array indices). Take for example names consisting of up to 16 letters as keys identifying individuals in a set of a thousand persons. Hence, there are on the order of 26^16 possible keys, which are to be mapped onto 10^3 possible indices. The function H is therefore obviously a many-to-one function. Given a key k, the first step in a retrieval (search) operation is to compute its associated index h = H(k), and the second, evidently necessary, step is to verify whether or not the item with the key k is indeed identified by h in the array (table) T, i.e., to check whether T[H(k)].key = k. We are immediately confronted with two questions:

1. What kind of function H should be used?

2. How do we cope with the situation that H does not yield the location of the desired item?
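
The two-step retrieval described above (compute h = H(k), then verify the key stored at T[h]) can be sketched in Python as follows. This is only an illustration: the table size N, the toy hash function H (sum of character codes modulo N), and the names Item and lookup are assumptions made for this example and do not come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

N = 1000  # assumed table size: 10^3 indices, as in the example above


@dataclass
class Item:
    key: str
    data: str


def H(k: str) -> int:
    """Toy hash function (an assumption): sum of character codes modulo N."""
    return sum(ord(c) for c in k) % N


def lookup(T: List[Optional[Item]], k: str) -> Optional[Item]:
    """Step 1: compute the index. Step 2: verify that the stored key matches."""
    h = H(k)
    item = T[h]
    if item is not None and item.key == k:
        return item
    return None  # slot is empty, or it holds an item with a different key


T: List[Optional[Item]] = [None] * N
T[H("wirth")] = Item("wirth", "author")

# Far more possible keys than indices, so H is necessarily many-to-one.
possible_keys = sum(26 ** n for n in range(1, 17))  # names of 1 to 16 letters
```

With this toy H, the anagram "htriw" yields the same index as "wirth", so lookup(T, "htriw") finds an occupied slot but fails the key check. This is why the verification step cannot be omitted.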

The answer to the second question is that some method must be used to yield an alternative location, say index h', and, if this is still not the location of the wanted item, yet a third index h", and so on. The case in which a key other than the desired one is at the identified location is called a collision; the task of generating alternative indices is termed collision handling. In the following we shall discuss the choice of a transformation function and methods of collision handling.
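
To make the idea of a sequence of alternative indices h, h', h", ... concrete, here is a minimal Python sketch. The passage above does not yet fix a method for generating the alternatives; the sketch assumes the simplest possible one, namely trying the next slot and wrapping around at the end of the table. The names probe_sequence, insert, and search, the small table size N, and the toy hash function are all hypothetical.

```python
from typing import Iterator, List, Optional, Tuple

N = 7  # assumed small table size, for illustration only

Table = List[Optional[Tuple[str, str]]]  # each slot holds (key, data) or None


def H(k: str) -> int:
    """Toy hash function (an assumption): sum of character codes modulo N."""
    return sum(ord(c) for c in k) % N


def probe_sequence(k: str) -> Iterator[int]:
    """Yield h, h', h'', ...: here simply the following slots, wrapping around."""
    h = H(k)
    for i in range(N):
        yield (h + i) % N


def insert(T: Table, k: str, data: str) -> int:
    """Store (k, data) in the first free or matching slot; return its index."""
    for h in probe_sequence(k):
        if T[h] is None or T[h][0] == k:
            T[h] = (k, data)
            return h
    raise OverflowError("table is full")


def search(T: Table, k: str) -> Optional[str]:
    """Follow the same index sequence until k or an empty slot is found."""
    for h in probe_sequence(k):
        if T[h] is None:
            return None      # an empty slot ends the search: k is absent
        if T[h][0] == k:
            return T[h][1]   # key verified at this index
        # otherwise a different key occupies the slot: a collision, so
        # continue with the next alternative index
    return None


T: Table = [None] * N
first = insert(T, "ab", "one")
second = insert(T, "ba", "two")  # same hash as "ab", so a collision occurs
```

Because insertion and search follow the same index sequence, an item displaced by a collision is still found. Here "ba" ends up in the slot after "ab", and search(T, "ba") retrieves it on the second probe.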

5.2 Choice of a Hash Function

A prerequisite of a good transformation function is that it distributes the keys as evenly as possible over the range of index values. Apart from satisfying this requirement, the distribution is not bound to any pattern, and it is actually desirable that it give the impression of being entirely random. This property has given this method the somewhat unscientific name hashing, i.e., chopping the argument up, or making a mess. H is called the hash function. Clearly, it should be efficiently computable, i.e., be composed of very few basic arithmetic operations.
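
The requirement of an even distribution can be checked empirically. The following Python sketch compares two cheap candidate functions on a set of sample keys by counting how many keys land on the most heavily loaded index. Both functions, the multiplier 31, the table size N, and the sample keys are assumptions chosen for illustration; neither function is claimed to be the one recommended by the text.

```python
from collections import Counter
from typing import Callable, List

N = 101  # assumed table size for this illustration


def first_char_hash(k: str) -> int:
    """Cheap but poor: depends only on the first character of the key."""
    return ord(k[0]) % N


def weighted_hash(k: str) -> int:
    """Still cheap: one multiply, one add, and one reduction mod N per character."""
    h = 0
    for c in k:
        h = (h * 31 + ord(c)) % N
    return h


def max_load(hash_fn: Callable[[str], int], keys: List[str]) -> int:
    """Largest number of keys mapped to any single index."""
    counts = Counter(hash_fn(k) for k in keys)
    return max(counts.values())


keys = [f"key{i}" for i in range(1000)]  # hypothetical sample keys
poor = max_load(first_char_hash, keys)
better = max_load(weighted_hash, keys)
print(poor, better)
```

Since every sample key starts with the same letter, first_char_hash sends all 1000 keys to a single index, whereas weighted_hash spreads them over the table. A perfectly even distribution would place about 10 keys on each of the 101 indices.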

Assume that a transfer function ORD(k) is available and denotes the ordinal number of the key k in the set of all possible keys. Assume, furthermore, that the array indices i range over the integers 0 .. N-1, where N is the size of the array. Then an obvious choice is
