
Algorithms and Data Structures for External Memory




least $(\log N)/\log_B N = \log B$, which is significant in practice, and can be as much as $Z/z = B$ for large $Z$.
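The equality above is the change-of-base rule for logarithms, $\log_B N = (\log N)/(\log B)$, applied with any fixed base for $\log$:

$$\frac{\log N}{\log_B N} = \frac{\log N}{(\log N)/(\log B)} = \log B.$$

For instance (illustrative values of our choosing), with base-2 logarithms, $N = 2^{30}$ and $B = 2^{10}$ give $\log N = 30$ and $\log_B N = 3$, a ratio of $10 = \log B$.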

4.2 Disk Striping and Parallelism with Multiple Disks

It is conceptually much simpler to program for the single-disk case (D = 1) than for the general multiple-disk case (D ≥ 1). Disk striping [216, 296] is a practical paradigm that can ease the programming task with multiple disks: when disk striping is used, I/Os are permitted only on entire stripes, one stripe at a time. The ith stripe, for i ≥ 0, consists of block i from each of the D disks. For example, in the data layout in Figure 2.3, the DB data items 0–9 comprise stripe 0 and can be accessed in a single I/O step. The net effect of striping is that the D disks behave as a single logical disk, but with a larger logical block size DB corresponding to the size of a stripe.
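The stripe structure reduces to simple index arithmetic. The Python sketch below assumes (our assumption, consistent with stripe 0 holding items 0–9 when DB = 10) that consecutive items are dealt out in blocks of B to disks 0, 1, ..., D - 1 in turn; the function names `locate` and `stripe_items` are hypothetical.

```python
def locate(item: int, D: int, B: int) -> tuple[int, int, int]:
    """Map a global item index to (stripe, disk, offset within block).

    Assumed layout: blocks of B consecutive items go to disks 0..D-1 in
    round-robin order, so stripe i holds items i*D*B .. (i+1)*D*B - 1.
    """
    stripe = item // (D * B)   # stripe index = block index on every disk
    disk = (item // B) % D     # disk that holds the item's block
    offset = item % B          # position of the item inside that block
    return stripe, disk, offset


def stripe_items(i: int, D: int, B: int) -> range:
    """Item indices moved by one striped I/O step on stripe i (DB items)."""
    return range(i * D * B, (i + 1) * D * B)
```

With D = 5 and B = 2 (values assumed here for Figure 2.3), `stripe_items(0, 5, 2)` covers items 0 through 9, and `locate(7, 5, 2)` places item 7 in stripe 0, on disk 3, at offset 1 within its block.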

We can thus apply the paradigm of disk striping automatically to convert an algorithm designed to use a single disk with block size DB into an algorithm for use on D disks each with block size B: in the single-disk algorithm, each I/O step transmits one block of size DB; in the D-disk algorithm, each I/O step transmits one stripe, which consists of D simultaneous block transfers each of size B. The number of I/O steps in both algorithms is the same; in each I/O step, the DB items transferred by the two algorithms are identical. Of course, in terms of wall clock time, the I/O step in the multiple-disk algorithm will be faster.
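To make the conversion concrete, here is a small Python simulation (the class and method names are hypothetical, and the "disks" are in-memory lists). A single-disk algorithm that reads and writes logical blocks of DB items can call `read_stripe` and `write_stripe` unchanged; each call moves D physical blocks of size B but is counted as one I/O step, so the step count matches the single-disk algorithm's.

```python
class StripedDisks:
    """Hypothetical simulation of D disks with block size B under disk striping.

    Every I/O touches a whole stripe (block i of all D disks), so the D disks
    behave like one logical disk whose blocks hold D*B items.
    """

    def __init__(self, D: int, B: int) -> None:
        self.D, self.B = D, B
        self.disks = [[] for _ in range(D)]   # disks[d][i] is block i of disk d
        self.io_steps = 0                     # parallel I/O steps performed

    def write_stripe(self, i: int, items: list) -> None:
        """Write one logical block of D*B items as stripe i."""
        if len(items) != self.D * self.B:
            raise ValueError("a stripe holds exactly D*B items")
        for d, disk in enumerate(self.disks):
            while len(disk) <= i:             # grow this disk up to block i
                disk.append([None] * self.B)
            disk[i] = list(items[d * self.B:(d + 1) * self.B])
        self.io_steps += 1                    # D block writes, one parallel step

    def read_stripe(self, i: int) -> list:
        """Read stripe i back as one logical block of D*B items."""
        self.io_steps += 1                    # D block reads, one parallel step
        return [x for disk in self.disks for x in disk[i]]
```

Streaming 30 items through `StripedDisks(D=5, B=2)` takes three write steps and three read steps, exactly what a single disk with block size 10 would need; only the per-step transfer is split across the five disks.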

Disk striping can be used to get optimal multiple-disk algorithms for three of the four fundamental operations of Chapter 3 (streaming, online search, and answer reporting), but it is nonoptimal for sorting. To see why, consider what happens if we use the technique of disk striping in conjunction with an optimal sorting algorithm for one disk, such as merge sort [220]. As given in Table 3.1, the optimal number of I/Os to sort using one disk with block size B is

$$\Theta(n \log_m n) = \Theta\!\left(n\,\frac{\log n}{\log m}\right) = \Theta\!\left(\frac{N}{B}\,\frac{\log(N/B)}{\log(M/B)}\right). \tag{4.1}$$
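Equation (4.1) can be evaluated numerically to see what striping costs. The Python sketch below drops the constant hidden in the Θ, uses base-2 logarithms (the base cancels in the ratio), and compares three quantities for illustrative sizes of our choosing: the single-disk bound with block size B; the striped count, obtained by treating the D disks as one disk with block size DB as described earlier in this section; and the D-disk target (n/D) log_m n, which we assume to be the general form of the sorting bound in Table 3.1. The function name `sort_ios` is hypothetical.

```python
import math


def sort_ios(N: float, M: float, B: float) -> float:
    """Equation (4.1) with its constant dropped: (N/B) log(N/B) / log(M/B)."""
    if not (N > B and M > B):
        raise ValueError("need N > B and M > B, so that n > 1 and m > 1")
    n, m = N / B, M / B                       # sizes measured in blocks
    return n * math.log2(n) / math.log2(m)    # n log_m n via change of base


# Illustrative sizes (our assumption): N = 2^30, M = 2^20, B = 2^10, D = 2^8.
N, M, B, D = 2**30, 2**20, 2**10, 2**8

one_disk = sort_ios(N, M, B)      # single disk, block size B
striped = sort_ios(N, M, D * B)   # striped D disks act like one disk with block size DB
target = one_disk / D             # assumed optimal D-disk bound (n/D) log_m n

print(f"one disk: {one_disk:.0f}  striped: {striped:.0f}  D-disk target: {target:.0f}")
print(f"striping overhead factor: {striped / target:.2f}")
```

With these sizes the script prints a striped count of 24576 against a target of 8192, a factor of 3. Both perform n/D = 4096 I/O steps per merge pass, but striping shrinks the effective merge order from m = 1024 to m/D = 4, so the number of passes grows from log n / log m = 2 to log(n/D) / log(m/D) = 6.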

With disk striping, the number of I/O steps is the same as if we use a block size of DB in the single-disk algorithm, which corresponds to
