Data Structures and Algorithm Analysis - Computer Science at ...
insertions and deletions. After the table is loaded with the initial collection of records, the first few deletions will lengthen the average probe sequence distance for records (they will add tombstones). Over time, the average distance will reach an equilibrium point, because insertions tend to decrease the average distance by filling in tombstone slots. For example, after initially loading records into the database, the average path distance might be 1.2 (i.e., an average of 0.2 accesses per search beyond the home position will be required). After a series of insertions and deletions, this average distance might increase to 1.6 due to tombstones. This seems like a small increase, but it is three times longer on average beyond the home position than before deletions.

Two possible solutions to this problem are:

1. Do a local reorganization upon deletion to try to shorten the average path length. For example, after deleting a key, continue to follow the probe sequence of that key and swap records further down the probe sequence into the slot of the recently deleted record (being careful not to remove any key from its probe sequence). This will not work for all collision resolution policies.

2. Periodically rehash the table by reinserting all records into a new hash table. Not only will this remove the tombstones, but it also provides an opportunity to place the most frequently accessed records into their home positions.

9.5 Further Reading

For a comparison of the efficiencies of various self-organizing techniques, see Bentley and McGeoch, "Amortized Analysis of Self-Organizing Sequential Search Heuristics" [BM85]. The text compression example of Section 9.2 comes from Bentley et al., "A Locally Adaptive Data Compression Scheme" [BSTW86]. For more on Ziv-Lempel coding, see Data Compression: Methods and Theory by James A. Storer [Sto88]. Knuth covers self-organizing lists and Zipf distributions in Volume 3 of The Art of Computer Programming [Knu98].

Introduction to Modern Information Retrieval by Salton and McGill [SM83] is an excellent source for more information about document retrieval techniques.

See the paper "Practical Minimal Perfect Hash Functions for Large Databases" by Fox et al. [FHCD92] for an introduction and a good algorithm for perfect hashing. For further details on the analysis of various collision resolution policies, see Knuth, Volume 3 [Knu98], and Concrete Mathematics: A Foundation for Computer Science by Graham, Knuth, and Patashnik [GKP94].

The model of hashing presented in this chapter has been of a fixed-size hash table. A problem not addressed is what to do when the hash table gets half full and more records must be inserted. This is the domain of dynamic hashing methods.
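The tombstone mechanism and the periodic rehash (solution 2 above) can be made concrete with a short sketch. The following is a minimal open-addressing table using linear probing; all class and method names here are illustrative, not from the text, and solution 1 (local reorganization) is omitted because, as noted above, it depends on the collision resolution policy in use.

```python
class HashTable:
    """Open-addressing hash table with linear probing and tombstones (sketch)."""

    EMPTY = object()      # slot never used: ends a probe sequence
    TOMBSTONE = object()  # slot deleted: probe sequences must continue past it

    def __init__(self, capacity=11):
        self.capacity = capacity
        self.slots = [self.EMPTY] * capacity

    def _probe(self, key):
        # Linear probing: home position, then successive slots mod capacity.
        home = hash(key) % self.capacity
        for i in range(self.capacity):
            yield (home + i) % self.capacity

    def insert(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            # Tombstone slots may be reused, which shortens later probe paths.
            if slot is self.EMPTY or slot is self.TOMBSTONE:
                self.slots[idx] = key
                return
        raise RuntimeError("table full")

    def search(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is self.EMPTY:   # true end of probe sequence: not found
                return False
            if slot == key:          # tombstones are stepped over, not stopped at
                return True
        return False

    def delete(self, key):
        # Replacing the record with a tombstone (rather than EMPTY) keeps
        # later records on this probe sequence reachable.
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is self.EMPTY:
                return False
            if slot == key:
                self.slots[idx] = self.TOMBSTONE
                return True
        return False

    def rehash(self):
        # Solution 2 from the text: reinsert every live record into a fresh
        # table, discarding all tombstones in the process.
        live = [s for s in self.slots
                if s is not self.EMPTY and s is not self.TOMBSTONE]
        self.slots = [self.EMPTY] * self.capacity
        for key in live:
            self.insert(key)
```

With capacity 11, the keys 1, 12, and 23 all share home slot 1, so deleting 12 leaves a tombstone that searches for 23 must probe past; calling `rehash()` afterward clears the tombstone and moves 23 one slot closer to its home position.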
