
Algorithms and Data Structures for External Memory


Dynamic and Kinetic Data Structures

involved. In this section, we look at a technique that is based upon the properties of the problem itself rather than upon those of the data structure.

We call a problem decomposable if we can answer a query by querying individual subsets of the problem data and then computing the final result from the solutions to each subset. Dictionary search and range searching are obvious examples of decomposable problems. Bentley developed the logarithmic method [83, 278] to convert efficient static data structures for decomposable problems into general dynamic ones.
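To make the definition concrete, here is a minimal Python sketch of a decomposable query; the function names are illustrative, not from the text. Range counting over several disjoint subsets is answered by querying each subset independently and summing the partial answers.

```python
import bisect

def range_count(sorted_subset, lo, hi):
    """Count items of one sorted subset that fall in [lo, hi]."""
    return (bisect.bisect_right(sorted_subset, hi)
            - bisect.bisect_left(sorted_subset, lo))

def decomposable_range_count(subsets, lo, hi):
    """Decomposability: query each subset separately, then combine
    the per-subset answers (here, by summation)."""
    return sum(range_count(s, lo, hi) for s in subsets)

# However the data is split across subsets, the combined answer equals
# that of a single structure holding all the items.
subsets = [[1, 4, 9], [2, 3, 7, 16], [5]]
assert decomposable_range_count(subsets, 3, 9) == 5  # counts {3, 4, 5, 7, 9}
```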

In the internal memory setting, the logarithmic method consists of maintaining a series of static substructures, at most one each of sizes 1, 2, 4, 8, .... When a new item is inserted, it is initialized in a substructure of size 1. If a substructure of size 1 already exists, the two substructures are combined into a single substructure of size 2. If there is already a substructure of size 2, they in turn are combined into a single substructure of size 4, and so on. For the current value of N, it is easy to see that the kth substructure (i.e., of size 2^k) is present exactly when the kth bit in the binary representation of N is 1; for example, when N = 11 = (1011)_2, the substructures present have sizes 8, 2, and 1. Since there are at most log N substructures, the search time bound is log N times the search time per substructure. As the number of items increases from 1 to N, the kth structure is built a total of N/2^k times (assuming N is a power of 2). If it can be built in O(2^k) time, the total time for all insertions and all substructures is thus O(N log N), making the amortized insertion time O(log N). If we use up to three substructures of size 2^k at a time, we can do the reconstructions in advance and convert the amortized update bounds to worst-case [278].
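As a concrete illustration, the following Python sketch implements the logarithmic method for the dictionary-search problem, using sorted arrays as the static substructures; the class and method names are my own, not from [83, 278].

```python
import bisect
import heapq

class LogarithmicDict:
    """Dynamic membership via the logarithmic method: level k holds
    either nothing or one static sorted array of exactly 2**k items."""

    def __init__(self):
        self.levels = []  # levels[k] is None or a sorted list of length 2**k

    def insert(self, item):
        carry = [item]  # fresh substructure of size 1
        k = 0
        # Exactly like incrementing a binary counter: while level k is
        # occupied, merge two size-2**k structures into one of size 2**(k+1).
        while k < len(self.levels) and self.levels[k] is not None:
            carry = list(heapq.merge(carry, self.levels[k]))  # linear-time merge
            self.levels[k] = None
            k += 1
        if k == len(self.levels):
            self.levels.append(None)
        self.levels[k] = carry

    def contains(self, item):
        # Decomposable query: search every substructure and OR the answers.
        for level in self.levels:
            if level is not None:
                i = bisect.bisect_left(level, item)
                if i < len(level) and level[i] == item:
                    return True
        return False

d = LogarithmicDict()
for x in [5, 3, 8, 1]:
    d.insert(x)
assert d.contains(3) and not d.contains(7)
# After N = 4 = (100)_2 insertions, only the size-4 substructure is present.
assert [len(s) if s else 0 for s in d.levels] == [0, 0, 4]
```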

In the EM setting, in order to eliminate the dependence upon the binary logarithm in the I/O bounds, the number of substructures must be reduced from log N to log_B N, and thus the maximum size of the kth substructure must be increased from 2^k to B^k. As the number of items increases from 1 to N, the kth substructure has to be built NB/B^k times (when N is a power of B), each time taking O(B^k (log_B N)/B) I/Os. The key point is that the extra factor of B in the numerator of the first term is cancelled by the factor of B in the denominator of the second term, and thus the resulting total insertion time over all N insertions and all log_B N structures is O(N (log_B N)^2) I/Os, which is O((log_B N)^2) I/Os amortized per insertion.
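Spelled out, the cancellation argument is the following summation (a LaTeX restatement of the bound just derived):

```latex
\sum_{k=1}^{\log_B N}
  \underbrace{\frac{NB}{B^k}}_{\text{times level $k$ is built}}
  \cdot
  \underbrace{O\!\left(\frac{B^k \log_B N}{B}\right)}_{\text{I/Os per build}}
  \;=\; \sum_{k=1}^{\log_B N} O\bigl(N \log_B N\bigr)
  \;=\; O\bigl(N (\log_B N)^2\bigr)\ \text{I/Os}.
```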
