11.07.2015 Views

Data Structures and Algorithm Analysis - Computer Science at ...



332 Chap. 9 Searching

A successful search (or a deletion) has the same cost as originally inserting that record. However, the expected value for the insertion cost depends on the value of α not at the time of deletion, but rather at the time of the original insertion. We can derive an estimate of this cost (essentially an average over all the insertion costs) by integrating from 0 to the current value of α, yielding a result of

$$\frac{1}{\alpha}\int_0^{\alpha} \frac{1}{1-x}\,dx = \frac{1}{\alpha}\log_e \frac{1}{1-\alpha}.$$

It is important to realize that these equations represent the expected cost for operations using the unrealistic assumption that the probe sequence is based on a random permutation of the slots in the hash table (thus avoiding all expense resulting from clustering). Thus, these costs are lower-bound estimates in the average case. The true average cost under linear probing is $\frac{1}{2}\left(1 + 1/(1-\alpha)^2\right)$ for insertions or unsuccessful searches and $\frac{1}{2}\left(1 + 1/(1-\alpha)\right)$ for deletions or successful searches. Proofs for these results can be found in the references cited in Section 9.5.

Figure 9.10 shows the graphs of these four equations to help you visualize the expected performance of hashing based on the load factor. The two solid lines show the costs in the case of a “random” probe sequence for (1) insertion or unsuccessful search and (2) deletion or successful search. As expected, the cost for insertion or unsuccessful search grows faster, because these operations typically search further down the probe sequence. The two dashed lines show equivalent costs for linear probing.
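To make the four curves concrete, the following minimal Python sketch evaluates each expected-cost formula at a few load factors. The function names are hypothetical and chosen only for illustration. The random-probe insertion cost 1/(1 − α) is assumed here because it is the integrand of the integral above; the other three formulas are taken directly from the text.

```python
import math


def _check_load_factor(alpha):
    # All four formulas are only defined for a partially full table.
    if not 0.0 < alpha < 1.0:
        raise ValueError("load factor must satisfy 0 < alpha < 1")


def random_probe_insert(alpha):
    """Insertion or unsuccessful search, idealized random probe sequence.

    Assumed from the integrand 1/(1 - x) used in the text's integral.
    """
    _check_load_factor(alpha)
    return 1.0 / (1.0 - alpha)


def random_probe_search(alpha):
    """Deletion or successful search, idealized random probe sequence."""
    _check_load_factor(alpha)
    return (1.0 / alpha) * math.log(1.0 / (1.0 - alpha))


def linear_probe_insert(alpha):
    """Insertion or unsuccessful search under linear probing."""
    _check_load_factor(alpha)
    return 0.5 * (1.0 + 1.0 / (1.0 - alpha) ** 2)


def linear_probe_search(alpha):
    """Deletion or successful search under linear probing."""
    _check_load_factor(alpha)
    return 0.5 * (1.0 + 1.0 / (1.0 - alpha))


def cost_table(alphas):
    """Return one row per load factor: (alpha, four expected costs)."""
    return [
        (
            a,
            random_probe_insert(a),
            random_probe_search(a),
            linear_probe_insert(a),
            linear_probe_search(a),
        )
        for a in alphas
    ]


if __name__ == "__main__":
    print("alpha  rand-ins  rand-srch  lin-ins  lin-srch")
    for row in cost_table([0.1, 0.25, 0.5, 0.75, 0.9]):
        print("{:5.2f}  {:8.3f}  {:9.3f}  {:7.3f}  {:8.3f}".format(*row))
```

Each row gives the expected number of record accesses at one load factor, so the columns can be compared directly with the solid ("random") and dashed (linear probing) curves described for Figure 9.10.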
As expected, the cost of linear probing grows faster than the cost for “random” probing.

From Figure 9.10 we see that the cost for hashing when the table is not too full is typically close to one record access. This is extraordinarily efficient, much better than binary search, which requires log n record accesses. As α increases, so does the expected cost. For small values of α, the expected cost is low. It remains below two until the hash table is about half full. When the table is nearly empty, adding a new record to the table does not increase the cost of future search operations by much. However, the additional search cost caused by each additional insertion increases rapidly once the table becomes half full. Based on this analysis, the rule of thumb is to design a hashing system so that the hash table never gets above half full. Beyond that point performance will degrade rapidly. This requires that the implementor have some idea of how many records are likely to be in the table at maximum loading, and select the table size accordingly.

You might notice that a recommendation to never let a hash table become more than half full contradicts the disk-based space/time tradeoff principle, which strives to minimize disk space to increase information density. Hashing represents an unusual situation in that there is no benefit to be expected from locality of reference. In a sense, the hashing system implementor does everything possible to eliminate the effects of locality of reference! Given the disk block containing the last record
