11.07.2015

Data Structures and Algorithm Analysis - Computer Science at ...

Sec. 9.4 Hashing

[Figure 9.10 plot: expected accesses (vertical axis, 0 to 5) versus α (horizontal axis, 0 to 1.0); curves labeled Insert and Delete.]

Figure 9.10 Growth of expected record accesses with α. The horizontal axis is the value for α; the vertical axis is the expected number of accesses to the hash table. Solid lines show the cost for “random” probing (a theoretical lower bound on the cost), while dashed lines show the cost for linear probing (a relatively poor collision resolution strategy). The two leftmost lines show the cost for insertion (equivalently, unsuccessful search); the two rightmost lines show the cost for deletion (equivalently, successful search).

accessed, the chance of the next record access coming to the same disk block is no better than random chance in a well-designed hash system. This is because a good hashing implementation breaks up relationships between search keys. Instead of improving performance by taking advantage of locality of reference, hashing trades increased hash table space for an improved chance that the record will be in its home position. Thus, the more space available for the hash table, the more efficient hashing should be.

Depending on the pattern of record accesses, it might be possible to reduce the expected cost of access even in the face of collisions. Recall the 80/20 rule: 80% of the accesses will come to 20% of the data. In other words, some records are accessed more frequently. If two records hash to the same home position, which would be better placed in the home position, and which in a slot further down the probe sequence? The answer is that the record with higher frequency of access should be placed in the home position, because this will reduce the total number of record accesses.
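The claim about which record belongs in the home position can be checked with a little arithmetic. The Python sketch below is illustrative only: it assumes each record's access count is known in advance and that finding the record at probe position i costs i + 1 record accesses. The names `expected_accesses` and `order_by_frequency` are hypothetical helpers, not part of any hashing library. It compares the two possible placements of a colliding pair under the 80/20 rule, then confirms by brute force that, for one small example, ordering a probe sequence by decreasing frequency gives the lowest expected cost.

```python
from itertools import permutations


def expected_accesses(counts):
    """Average record accesses per search for one probe-sequence order.

    counts[i] is how often the record stored at probe position i is
    requested (position 0 is the home slot); finding that record is
    assumed to cost i + 1 record accesses.
    """
    total_cost = sum((pos + 1) * c for pos, c in enumerate(counts))
    return total_cost / sum(counts)


def order_by_frequency(counts):
    """Hypothetical helper: most-requested record first, least-requested last."""
    return sorted(counts, reverse=True)


# Two records share a home position; one gets 80% of the requests (80/20 rule).
print(expected_accesses([80, 20]))  # hot record in home slot: 1.2
print(expected_accesses([20, 80]))  # cold record in home slot: 1.8

# For a longer probe sequence, brute force confirms that no arrangement of
# the same records beats ordering them by decreasing frequency.
counts = [5, 40, 10, 25, 20]
best = min(expected_accesses(p) for p in permutations(counts))
assert expected_accesses(order_by_frequency(counts)) == best
print(expected_accesses(counts), "->", expected_accesses(order_by_frequency(counts)))
```

Putting the frequently accessed record at home costs 1.2 accesses per search on average instead of 1.8. The brute-force check only confirms the single five-record example above and is practical only for short probe sequences. The general result follows from the rearrangement inequality: pairing the smallest position costs with the largest frequencies minimizes the total.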
Ideally, records along a probe sequence will be ordered by their frequency of access.

One approach to approximating this goal is to modify the order of records along the probe sequence whenever a record is accessed. If a search is made to a record