15.01.2013 Views

U. Glaeser

U. Glaeser

U. Glaeser

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

FIGURE 32.2 Extendible hashing with block size B = 3. The keys are indicated in italics. For convenience of<br />

exposition, the hash address of a key consists of its binary representation. For example, the hash address of key 4 is<br />

“…000100” and the hash address of key 44 is “…0101100”. (a) The hash table after insertion of the keys 4, 23, 18,<br />

10, 44, 32, 9. (b) Insertion of the key 76 into table location 100 causes the block with local depth 2 split into two<br />

blocks with local depth 3. (c) Insertion of the key 20 into table location 100 causes a block with local depth 3 to split<br />

into two blocks with local depth 4. The directory doubles in size and the global depth d is incremented from 3 to 4.<br />

range of N. The challenge is to develop dynamic EM structures that can adapt smoothly to widely varying<br />

values of N.<br />

EM hashing methods fall into one of two categories: directory methods and directoryless methods. Fagin<br />

et al. [79] proposed a directory scheme called extendible hashing, illustrated in Fig. 32.2: The directory,<br />

for a given d ≥ 0, consists of a table (array) of 2 d pointers. Each item is assigned to the table location<br />

corresponding to the d least significant bits of its hash address. The value of d, called the global depth,<br />

is set to the smallest value for which each table location has at most B items assigned to it. Each table<br />

location contains a pointer to a block where its items are stored. Thus, a lookup takes two I/Os: one to<br />

access the directory and one to access the block storing the item. If the directory fits in internal memory,<br />

only one I/O is needed. Several table locations may have many fewer than B assigned items, and for<br />

purposes of minimizing storage utilization, they can share the same disk block for storing their items by<br />

using a local depth smaller than the global depth. When new items are inserted and deleted, the blocks<br />

can overflow or underflow, and the local depths and global depth are changed accordingly.<br />

The expected number of disk blocks required to store the data items is asymptotically n/ln 2 ≈ n/0.69;<br />

that is, the blocks tend to be about 69% full [147]. At least Ω(n/B) blocks are needed to store the directory.<br />

Flajolet [86] showed on the average that the directory uses Θ(N 1/B n/B) = Θ(N 1+1/B /B 2 ) blocks, which<br />

can be superlinear in N asymptotically; however, for practical values of N and B, the N 1/B term is a small<br />

constant, typically less than 2, and directory size is within a constant factor of optimal.<br />

The resulting directory is equivalent to the leaves of a perfectly balanced trie [124], in which the search<br />

path for each item is determined by its hash address, except that hashing allows the leaves of the trie to<br />

© 2002 by CRC Press LLC<br />

2<br />

000 000<br />

001<br />

010<br />

011<br />

100<br />

101<br />

110<br />

111<br />

3<br />

1<br />

3<br />

local depth<br />

4 44 32<br />

001<br />

18 18<br />

23 9<br />

10<br />

010<br />

011<br />

100<br />

101<br />

110<br />

111<br />

global depth d = 3 global depth d = 3<br />

(a) (b)<br />

3<br />

3<br />

1<br />

3<br />

3<br />

32<br />

23 9<br />

4 44 76<br />

10<br />

0000<br />

0001<br />

0010<br />

0011<br />

0100<br />

0101<br />

0110<br />

0111<br />

1000<br />

1001<br />

1010<br />

1011<br />

1100<br />

1101<br />

1110<br />

1111<br />

3<br />

3<br />

1<br />

4<br />

3<br />

4<br />

global depth d<br />

= 4<br />

(c)<br />

32<br />

18<br />

23 9<br />

4 20<br />

10<br />

44 76

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!