Algorithms and Data Structures for External Memory
least $(\log N)/\log_B N = \log B$, which is significant in practice, and can
be as much as $Z/z = B$ for large $Z$.
4.2 Disk Striping and Parallelism with Multiple Disks
It is conceptually much simpler to program for the single-disk case
(D = 1) than for the multiple-disk case (D ≥ 1). Disk striping [216,
296] is a practical paradigm that can ease the programming task with
multiple disks: when disk striping is used, I/Os are permitted only on
entire stripes, one stripe at a time. The ith stripe, for i ≥ 0, consists
of block i from each of the D disks. For example, in the data layout
in Figure 2.3, the DB data items 0–9 comprise stripe 0 and can be
accessed in a single I/O step. The net effect of striping is that the D
disks behave as a single logical disk, but with a larger logical block
size DB corresponding to the size of a stripe.
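
The stripe layout can be made concrete with a small index calculation. The following is a minimal sketch, not taken from the text: the names `Location`, `locate`, and `stripe_items` are hypothetical, and it assumes that within a stripe consecutive items fill the block on disk 0 first, then the block on disk 1, and so on.

```python
from typing import NamedTuple


class Location(NamedTuple):
    stripe: int  # stripe index, which is also the block index on each disk
    disk: int    # which of the D disks holds the item
    offset: int  # position of the item inside its block of size B


def locate(item: int, D: int, B: int) -> Location:
    """Map a logical item index to its place in a striped layout.

    Assumption: inside a stripe, items are laid out disk by disk,
    B consecutive items per disk.
    """
    stripe, within = divmod(item, D * B)  # a stripe holds D * B items
    disk, offset = divmod(within, B)      # each disk contributes one block
    return Location(stripe, disk, offset)


def stripe_items(i: int, D: int, B: int) -> range:
    """Logical items that make up stripe i (block i from each of the D disks)."""
    return range(i * D * B, (i + 1) * D * B)
```

With the illustrative values D = 5 and B = 2 (chosen here only so that DB = 10), `stripe_items(0, 5, 2)` covers items 0 through 9, and `locate(7, 5, 2)` places item 7 in stripe 0, on disk 3, at offset 1.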
We can thus apply the paradigm of disk striping automatically to
convert an algorithm designed to use a single disk with block size DB
into an algorithm for use on D disks each with block size B: In the
single-disk algorithm, each I/O step transmits one block of size DB;
in the D-disk algorithm, each I/O step transmits one stripe, which
consists of D simultaneous block transfers each of size B. The number
of I/O steps in both algorithms is the same; in each I/O step, the DB
items transferred by the two algorithms are identical. Of course, in
terms of wall clock time, the I/O step in the multiple-disk algorithm
will be faster.
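
The conversion described above can be sketched as a thin wrapper that presents D disks as one logical disk. This is an illustrative in-memory simulation under stated assumptions, not an implementation from the text: the class `StripedDisks`, its methods, and the `scan_sum` routine are hypothetical names, and the "simultaneous" transfers are simply counted as one I/O step rather than actually run in parallel.

```python
from typing import List, Sequence


class StripedDisks:
    """D simulated disks with block size B, exposed as one logical disk
    whose logical block is a whole stripe of D * B items."""

    def __init__(self, D: int, B: int) -> None:
        self.D, self.B = D, B
        # disks[d][i] is block i on disk d
        self.disks: List[List[list]] = [[] for _ in range(D)]
        self.io_steps = 0  # one step = D block transfers, counted as simultaneous

    def write_stripe(self, i: int, items: Sequence) -> None:
        assert len(items) == self.D * self.B
        for d, disk in enumerate(self.disks):
            while len(disk) <= i:  # grow the simulated disk if needed
                disk.append([None] * self.B)
            disk[i] = list(items[d * self.B:(d + 1) * self.B])
        self.io_steps += 1

    def read_stripe(self, i: int) -> list:
        self.io_steps += 1
        # concatenate block i from every disk, in disk order
        return [x for disk in self.disks for x in disk[i]]


def scan_sum(disk: StripedDisks, num_stripes: int) -> int:
    """A single-disk streaming algorithm: read logical blocks in order."""
    return sum(sum(disk.read_stripe(i)) for i in range(num_stripes))
```

`scan_sum` is written as if it ran on a single disk with block size DB; it never refers to individual disks. Each call to `read_stripe` counts as one I/O step, so the step count matches the single-disk algorithm while each step moves D blocks of size B.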
Disk striping can be used to get optimal multiple-disk algorithms
for three of the four fundamental operations of Chapter 3 (streaming,
online search, and answer reporting), but it is nonoptimal for sorting.
To see why, consider what happens if we use the technique of disk
striping in conjunction with an optimal sorting algorithm for one disk,
such as merge sort [220]. As given in Table 3.1, the optimal number
of I/Os to sort using one disk with block size B is
\[
\Theta(n \log_m n) = \Theta\!\left( n \, \frac{\log n}{\log m} \right) = \Theta\!\left( \frac{N}{B} \, \frac{\log (N/B)}{\log (M/B)} \right). \tag{4.1}
\]
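
To get a feel for the size of the bound in (4.1), it can be evaluated numerically. The sketch below is illustrative only: the function name `sort_io_bound` and the sample values of N, M, and B are hypothetical, and because the Θ notation hides constant factors the result is an order of magnitude, not an exact I/O count.

```python
import math


def sort_io_bound(N: int, M: int, B: int) -> float:
    """Evaluate n * log_m(n) with n = N/B and m = M/B, as in (4.1).

    Constant factors hidden by the Theta notation are dropped.
    """
    n = N / B  # problem size in blocks
    m = M / B  # internal memory size in blocks
    return n * math.log(n) / math.log(m)
```

For example, with N = 2^30 items, M = 2^20, and B = 2^10, we get n = 2^20 and m = 2^10, so log_m n = 2 and `sort_io_bound(2**30, 2**20, 2**10)` is about 2 · 2^20, or roughly 2.1 million I/Os.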
With disk striping, the number of I/O steps is the same as if we use
a block size of DB in the single-disk algorithm, which corresponds to