23.11.2014 Views

Data Structures and Algorithms in Java[1].pdf - Fulvio Frisone


each would require to perform the standard dictionary search and update operations. We refer to this count as the I/O complexity of the algorithms involved.

Some Inefficient External-Memory Dictionaries

Let us first consider the simple dictionary implementations that use a list to store n entries. If the list is implemented as an unsorted, doubly linked list, then an insertion can be performed with O(1) transfers, but removals and searches require n transfers in the worst case, since each link hop we perform could access a different block. This search time can be improved to O(n/B) transfers (see Exercise C-14.1), where B denotes the number of nodes of the list that can fit into a block, but this is still poor performance. We could alternatively implement the sequence using a sorted array. In this case, a search performs O(log_2 n) transfers, via binary search, which is a nice improvement. But this solution requires Θ(n/B) transfers to implement an insert or remove operation in the worst case, for we may have to access all blocks to move elements up or down. Thus, list-based dictionary implementations are not efficient in external memory.
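To make these bounds concrete, the following Java sketch tabulates the worst-case transfer counts named above for sample values. The class name, the method names, and the values n = 1,000,000 and B = 1,000 are illustrative assumptions, not taken from the text, and the constant factors hidden by the O-notation are ignored.

```java
/** Rough worst-case block-transfer estimates for list-based dictionaries (illustrative only). */
final class ListTransferEstimates {

    /** Unsorted linked list with scattered nodes: every link hop may touch a new block. */
    static long linkedListSearch(long n) {
        return n;
    }

    /** Scan of n entries packed b per block: ceil(n / b) blocks are read. */
    static long blockedScan(long n, long b) {
        return (n + b - 1) / b;
    }

    /** Binary search over a sorted array: about ceil(log_2 n) probes, each possibly in its own block. */
    static long binarySearch(long n) {
        long probes = 0;
        // Count how many halvings shrink the search range from n entries to 1.
        for (long size = n; size > 1; size = (size + 1) / 2) {
            probes++;
        }
        return probes;
    }

    public static void main(String[] args) {
        long n = 1_000_000L; // assumed number of entries
        long b = 1_000L;     // assumed number of entries per block
        System.out.println("linked-list search:        " + linkedListSearch(n)); // 1000000
        System.out.println("blocked scan:              " + blockedScan(n, b));   // 1000
        System.out.println("sorted-array search:       " + binarySearch(n));     // 20
        System.out.println("sorted-array insert/remove: " + blockedScan(n, b));  // 1000
    }
}
```

Under these assumed values, the sorted array searches in about 20 transfers but may still need on the order of 1,000 transfers for a single insertion or removal.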

Since these simple implementations are I/O inefficient, we should consider the logarithmic-time internal-memory strategies that use balanced binary trees (for example, AVL trees or red-black trees) or other search structures with logarithmic average-case query and update times (for example, skip lists or splay trees). These methods store the dictionary items at the nodes of a binary tree or of a graph. Typically, each node accessed for a query or update in one of these structures will be in a different block. Thus, these methods all require O(log_2 n) transfers in the worst case to perform a query or update operation. This performance is good, but we can do better. In particular, we can perform dictionary queries and updates using only O(log_B n) = O(log n / log B) transfers.
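The gap between log_2 n and log_B n can be seen with a small calculation. The Java sketch below computes the ceiling of a logarithm with integer arithmetic only, so that no floating-point rounding affects the result. The class name, the method name, and the sample values n = 1,000,000,000 and B = 1,024 are assumptions chosen for illustration.

```java
/** Integer ceiling logarithm, used to compare log_2 n with log_B n (illustrative only). */
final class LogBaseComparison {

    /** Returns the smallest h such that base^h >= n, that is, ceil(log_base(n)). */
    static int ceilLog(long base, long n) {
        if (base < 2 || n < 1) {
            throw new IllegalArgumentException("need base >= 2 and n >= 1");
        }
        int h = 0;
        long reach = 1; // base^h
        while (reach < n) {
            // If the next multiplication would overflow, it certainly reaches n.
            if (reach > Long.MAX_VALUE / base) {
                return h + 1;
            }
            reach *= base;
            h++;
        }
        return h;
    }

    public static void main(String[] args) {
        long n = 1_000_000_000L; // assumed number of entries
        long b = 1_024L;         // assumed branching factor tied to the block size
        System.out.println("ceil(log_2 n) = " + ceilLog(2, n)); // 30
        System.out.println("ceil(log_B n) = " + ceilLog(b, n)); // 3
    }
}
```

With these assumed values, a binary structure may need about 30 transfers per operation, while a structure with fan-out near B needs about 3, a factor of log_2 B = 10.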

14.3.1 (a,b) Trees

To reduce the importance of the performance difference between internal-memory accesses and external-memory accesses for searching, we can represent our dictionary using a multi-way search tree (Section 10.4.1). This approach gives rise to a generalization of the (2,4) tree data structure known as the (a,b) tree.

An (a,b) tree is a multi-way search tree such that each node has between a and b children and stores between a − 1 and b − 1 entries. The algorithms for searching, inserting, and removing entries in an (a,b) tree are straightforward generalizations of the corresponding ones for (2,4) trees. The advantage of generalizing (2,4) trees to (a,b) trees is that a generalized class of trees provides a flexible search structure, where the size of the nodes and the running time of the various dictionary operations depend on the parameters a and b. By setting the parameters a and b appropriately with respect to the size of disk blocks, we can derive a data structure that achieves good external-memory performance.
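As a minimal sketch of the search side of this idea, the Java code below models a multi-way node with sorted integer keys and counts how many nodes a search visits. It assumes that each node occupies one disk block, so each visit stands for one transfer. All class, field, and method names are hypothetical, the size check applies only the simplified bounds stated above, and insertion and removal (with their splits and fusions) are not shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Search-only sketch of a multi-way search tree in the spirit of an (a,b) tree. */
final class AbTreeSketch {

    /** A node holding sorted keys; an internal node has keys.size() + 1 children. */
    static final class Node {
        final List<Integer> keys = new ArrayList<>();
        final List<Node> children = new ArrayList<>(); // empty for a leaf

        Node(int... sortedKeys) {
            for (int k : sortedKeys) {
                keys.add(k);
            }
        }

        boolean isLeaf() {
            return children.isEmpty();
        }

        /** Checks the simplified bounds: a-1..b-1 entries and, if internal, a..b children. */
        boolean withinBounds(int a, int b) {
            boolean entriesOk = keys.size() >= a - 1 && keys.size() <= b - 1;
            boolean childrenOk = isLeaf()
                    || (children.size() == keys.size() + 1
                        && children.size() >= a && children.size() <= b);
            return entriesOk && childrenOk;
        }
    }

    /** Outcome of a search: whether the key was found and how many nodes were read. */
    static final class Result {
        final boolean found;
        final int nodesVisited;

        Result(boolean found, int nodesVisited) {
            this.found = found;
            this.nodesVisited = nodesVisited;
        }
    }

    /** Descends from the root, reading one node (assumed: one block) per level. */
    static Result search(Node root, int key) {
        int visited = 0;
        Node node = root;
        while (node != null) {
            visited++;
            int i = 0;
            // Find the first key that is not smaller than the search key.
            while (i < node.keys.size() && key > node.keys.get(i)) {
                i++;
            }
            if (i < node.keys.size() && key == node.keys.get(i)) {
                return new Result(true, visited);
            }
            node = node.isLeaf() ? null : node.children.get(i);
        }
        return new Result(false, visited);
    }

    /** Builds a small two-level example that satisfies the bounds for a = 2, b = 4. */
    static Node sampleTree() {
        Node root = new Node(20, 40);
        root.children.add(new Node(5, 10));
        root.children.add(new Node(25, 30, 35));
        root.children.add(new Node(50));
        return root;
    }

    public static void main(String[] args) {
        Node root = sampleTree();
        Result hit = search(root, 30);
        Result miss = search(root, 45);
        System.out.println(hit.found + " after " + hit.nodesVisited + " node reads");   // true after 2 node reads
        System.out.println(miss.found + " after " + miss.nodesVisited + " node reads"); // false after 2 node reads
    }
}
```

The in-node scan runs in internal memory once the block has been read, so under the one-node-per-block assumption only the number of levels contributes to the transfer count.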
