17.01.2013 Views

Algorithms and Data Structures for External Memory


Fundamental I/O Operations and Bounds

Table 3.1 I/O bounds for the four fundamental operations. The PDM parameters are defined in Section 2.1.

Operation | I/O bound, D = 1 | I/O bound, general D ≥ 1
Scan(N) | $\Theta\left(\frac{N}{B}\right) = \Theta(n)$ | $\Theta\left(\frac{N}{DB}\right) = \Theta\left(\frac{n}{D}\right)$
Sort(N) | $\Theta\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right) = \Theta(n\log_m n)$ | $\Theta\left(\frac{N}{DB}\log_{M/B}\frac{N}{B}\right) = \Theta\left(\frac{n}{D}\log_m n\right)$
Search(N) | $\Theta(\log_B N)$ | $\Theta(\log_{DB} N)$
Output(Z) | $\Theta\left(\max\left\{1, \frac{Z}{B}\right\}\right) = \Theta\left(\max\{1, z\}\right)$ | $\Theta\left(\max\left\{1, \frac{Z}{DB}\right\}\right) = \Theta\left(\max\left\{1, \frac{z}{D}\right\}\right)$
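As a rough illustration of how the entries of Table 3.1 scale, the following Python sketch evaluates each expression with the constant hidden by the Θ notation set to 1. The function name `pdm_bounds` and its dictionary keys are hypothetical and chosen only for this example; the sketch assumes n = N/B, m = M/B, and z = Z/B, as the equalities in the table suggest, and it assumes m > 1 and DB > 1 so that the logarithms are defined.

```python
import math

def pdm_bounds(N, M, B, D=1, Z=0):
    """Evaluate the Table 3.1 expressions with all Theta constants taken as 1.

    N: number of items, M: internal memory size in items,
    B: block size in items, D: number of disks, Z: output size in items.
    The returned values are growth estimates, not exact I/O counts.
    """
    n = N / B  # input size in blocks
    m = M / B  # internal memory size in blocks
    z = Z / B  # output size in blocks
    return {
        "scan": n / D,
        "sort": (n / D) * math.log(n, m),
        "search": math.log(N, D * B),
        "output": max(1.0, z / D),
    }

if __name__ == "__main__":
    for name, value in pdm_bounds(N=2**30, M=2**20, B=2**10, D=4, Z=2**14).items():
        print(f"{name}: {value:.1f}")
```

Setting D = 1 recovers the middle column of the table; a larger D divides the Scan, Sort, and Output expressions by D but only enlarges the base of the logarithm in Search.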

The first two of these I/O bounds, Scan(N) and Sort(N), apply to batched problems. The last two, Search(N) and Output(Z), apply to online problems and are typically combined into the form Search(N) + Output(Z). As mentioned in Section 2.1, some batched problems also involve queries, in which case the I/O bound Output(Z) may be relevant to them as well. In some pipelined contexts, the Z items in an answer to a query do not need to be output to the disks but rather can be “piped” to another process, in which case there is no I/O cost for output. Relational database queries are often processed in such a pipelined fashion. For simplicity, in this manuscript we explicitly consider the output cost for queries.
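The combined form Search(N) + Output(Z) can be sketched in a few lines of Python. This is only an estimate with the Θ constants taken as 1; the function name `query_io` and the `pipelined` flag are hypothetical, introduced here to model the case in which the answer is piped to another process and therefore incurs no output I/O.

```python
import math

def query_io(N, Z, B, D=1, pipelined=False):
    """Estimate Search(N) + Output(Z) with all Theta constants taken as 1.

    N: number of items searched, Z: number of items in the answer,
    B: block size in items, D: number of disks (assumes D * B > 1).
    """
    search = math.log(N, D * B)  # Search(N) = Theta(log_{DB} N)
    # Output(Z) = Theta(max{1, Z/(DB)}); dropped when the answer is piped onward.
    output = 0.0 if pipelined else max(1.0, Z / (D * B))
    return search + output

if __name__ == "__main__":
    print(query_io(N=2**30, Z=2**12, B=2**10))
    print(query_io(N=2**30, Z=2**12, B=2**10, pipelined=True))
```

With the output written to disk, the Output(Z) term dominates once Z is large relative to DB; with pipelining, only the search term remains.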

The I/O bound Scan(N) = O(n/D), which is clearly required to read or write a file of N items, represents a linear number of I/Os in the PDM model. An interesting feature of the PDM model is that almost all nontrivial batched problems require a nonlinear number of I/Os, even those that can be solved easily in linear CPU time in the (internal memory) RAM model. Examples we discuss later include permuting, transposing a matrix, list ranking, and several combinatorial graph problems. Many of these problems are equivalent in I/O complexity to permuting or sorting.
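To make the gap between a linear and a nonlinear number of I/Os concrete, the sketch below computes the ratio Sort(N)/Scan(N) using the expressions from Table 3.1 with the Θ constants taken as 1. Under that assumption the factor n/D cancels and the ratio is simply log_m n, where n = N/B and m = M/B. The function name `sort_to_scan_ratio` and the parameter values are hypothetical, chosen only for illustration.

```python
import math

def sort_to_scan_ratio(N, M, B):
    """Return log_m n, the ratio Sort(N)/Scan(N) with Theta constants taken as 1.

    The number of disks D cancels, so it does not appear as a parameter.
    Assumes m = M/B > 1.
    """
    n = N / B  # input size in blocks
    m = M / B  # internal memory size in blocks
    return math.log(n, m)

if __name__ == "__main__":
    M, B = 2**20, 2**10  # illustrative memory and block sizes, in items
    for exponent in (30, 40, 50):
        ratio = sort_to_scan_ratio(2**exponent, M, B)
        print(f"N = 2^{exponent}: Sort/Scan ratio is about {ratio:.2f}")
```

The ratio grows with N for fixed M and B, which is what separates the sorting bound from the linear scanning bound.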
