23.11.2014 Views

Data Structures and Algorithms in Java[1].pdf - Fulvio Frisone


are too large to fit entirely into internal memory. In this case, the objective is to solve the algorithmic problem using as few block transfers as possible. The most classic domain for such external-memory algorithms is the sorting problem.

Multi-way Merge-Sort

An efficient way to sort a set S of n objects in external memory amounts to a simple external-memory variation on the familiar merge-sort algorithm. The main idea behind this variation is to merge many recursively sorted lists at a time, thereby reducing the number of levels of recursion. Specifically, a high-level description of this multi-way merge-sort method is to divide S into d subsets S_1, S_2, …, S_d of roughly equal size, recursively sort each subset S_i, and then simultaneously merge all d sorted lists into a sorted representation of S. If we can perform the merge process using only O(n/B) disk transfers, then, for large enough values of n, the total number of transfers performed by this algorithm satisfies the following recurrence:

t(n) = d · t(n/d) + cn/B,<br />

for some constant c ≥ 1. We can stop the recursion when n ≤ B, since we can perform a single block transfer at this point, getting all of the objects into internal memory, and then sort the set with an efficient internal-memory algorithm. Thus, the stopping criterion for t(n) is

t(n) = 1 if n/B ≤ 1.

This implies a closed-form solution that t(n) is O((n/B) log_d(n/B)), which is O((n/B) log(n/B) / log d).
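The recurrence and its closed form can be checked numerically. The following is a minimal sketch, not part of the original text: the class name TransferRecurrence and the sample values of B, d, and c are assumptions chosen for illustration. It evaluates t(n) directly from the recurrence for n = B·d^k and compares it with (n/B)(1 + c·k), which is what unrolling the recurrence gives for these exact powers and is consistent with the O((n/B) log_d(n/B)) bound.

```java
class TransferRecurrence {
    // Evaluates t(n) = d * t(n/d) + c * n / B, with t(n) = 1 when n <= B.
    // Intended for n = B * d^k so that every division is exact.
    static long transfers(long n, long d, long B, long c) {
        if (n <= B) {
            return 1; // one block transfer brings everything into internal memory
        }
        return d * transfers(n / d, d, B, c) + c * n / B;
    }

    public static void main(String[] args) {
        long B = 1024; // objects per block (assumed value)
        long d = 8;    // merge fan-out (assumed value)
        long c = 1;    // constant in the merge cost (assumed value)
        long n = B;
        for (int k = 0; k <= 5; k++) {
            long fromRecurrence = transfers(n, d, B, c);
            long closedForm = (n / B) * (1 + c * k); // k = log_d(n/B)
            System.out.println("k=" + k + " recurrence=" + fromRecurrence
                    + " closedForm=" + closedForm);
            n *= d;
        }
    }
}
```

For inputs that are exact powers of d times B, the two printed columns agree, since each level of recursion contributes c·n/B transfers and there are log_d(n/B) levels above the base case.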

Thus, if we can choose d to be Θ(M/B), then the worst-case number of block transfers performed by this multi-way merge-sort algorithm will be quite low. We choose

d = (1/2)M/B.<br />
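To get a feel for how much a large fan-out reduces the depth of the recursion, the sketch below counts merge levels for d = (1/2)M/B and for ordinary two-way merging. It is illustrative only: the class name MergeLevels and the values of M, B, and n are hypothetical, with M and n measured in objects and B in objects per block.

```java
class MergeLevels {
    // Smallest k with d^k >= blocks, i.e. the ceiling of log_d(blocks) for blocks >= 1.
    // Computed with integer arithmetic to avoid floating-point rounding.
    static int levels(long blocks, long d) {
        int k = 0;
        long reach = 1;
        while (reach < blocks) {
            reach *= d;
            k++;
        }
        return k;
    }

    public static void main(String[] args) {
        long M = 1L << 20; // internal memory capacity in objects (assumed)
        long B = 1L << 10; // objects per block (assumed)
        long n = 1L << 40; // number of objects to sort (assumed)

        long d = M / (2 * B); // d = (1/2) M / B
        long blocks = n / B;

        System.out.println("d = " + d);
        System.out.println("levels with d-way merging: " + levels(blocks, d));
        System.out.println("levels with 2-way merging: " + levels(blocks, 2));
    }
}
```

With these assumed values, d is 512 and the recursion has 4 levels, compared with 30 levels for two-way merging.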

The only aspect of this algorithm left to specify, then, is how to perform the d-way merge using only O(n/B) block transfers.
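The overall shape of the method (divide into d subsets, sort each recursively, merge all d sorted lists at once) can be sketched in internal memory as follows. This is a simplified illustration under stated assumptions, not the external-memory algorithm itself: it performs no block transfers, the class name MultiwayMergeSort and the parameter baseSize (standing in for the block size B) are hypothetical, and the merge step uses java.util.PriorityQueue as a stand-in selection structure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class MultiwayMergeSort {
    // Splits items into at most d runs of roughly equal size, sorts each run
    // recursively, and merges all runs in one pass. Inputs of size at most
    // baseSize are sorted directly, mirroring the n <= B stopping criterion.
    static List<Integer> sort(List<Integer> items, int d, int baseSize) {
        if (d < 2 || baseSize < 1) {
            throw new IllegalArgumentException("need d >= 2 and baseSize >= 1");
        }
        int n = items.size();
        if (n <= baseSize) {
            List<Integer> copy = new ArrayList<>(items);
            Collections.sort(copy);
            return copy;
        }
        int chunk = (n + d - 1) / d; // ceiling of n/d; always smaller than n here
        List<List<Integer>> runs = new ArrayList<>();
        for (int start = 0; start < n; start += chunk) {
            int end = Math.min(n, start + chunk);
            runs.add(sort(items.subList(start, end), d, baseSize));
        }
        return merge(runs);
    }

    // Merges sorted runs by repeatedly extracting the smallest head element.
    // Heap entries are {value, run index, position within that run}.
    static List<Integer> merge(List<List<Integer>> runs) {
        PriorityQueue<int[]> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heap.add(new int[] {runs.get(i).get(0), i, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            List<Integer> run = runs.get(top[1]);
            int next = top[2] + 1;
            if (next < run.size()) {
                heap.add(new int[] {run.get(next), top[1], next});
            }
        }
        return out;
    }
}
```

A call such as MultiwayMergeSort.sort(list, 4, 2) returns a new sorted list and leaves the input unchanged. In an external-memory setting, each run would be read one block at a time rather than held entirely in memory.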

14.4.1 Multi-way Merging

We perform the d-way merge by running a "tournament." We let T be a complete binary tree with d external nodes, and we keep T entirely in internal memory. We associate each external node i of T with a different sorted list S_i. We initialize T by reading into each external node i the first object in S_i. This has the effect of reading into internal memory the first block of each sorted list S_i. For each internal-node
