17.01.2013 Views

Algorithms and Data Structures for External Memory



use block transfer. Figure 1.1 depicts typical memory sizes and block sizes for various levels of memory.

Because I/O is done in units of blocks, algorithms can run considerably faster when the pattern of memory accesses exhibits locality of reference as opposed to a uniformly random distribution. However, even if an application can structure its pattern of memory accesses and exploit locality, there is still a substantial access gap between internal and external memory performance. In fact, the access gap is growing, since the latency and bandwidth of memory chips are improving more quickly than those of disks. Use of parallel processors (or multicores) further widens the gap. As a result, storage systems such as RAID deploy multiple disks that can be accessed in parallel in order to get additional bandwidth [101, 194].
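The effect of locality on block-based I/O can be illustrated with a small counting sketch. It is not taken from the text: the function name `count_block_transfers`, the sizes `N = 1024` and `B = 64`, and the simplifying assumption that internal memory buffers only the most recently read block are all hypothetical choices made for illustration.

```python
def count_block_transfers(accesses, block_size):
    """Count block transfers for a sequence of item indices.

    Assumption (for illustration only): internal memory holds just the
    most recently transferred block, so touching any other block costs
    one new transfer.
    """
    transfers = 0
    current_block = None
    for index in accesses:
        block = index // block_size
        if block != current_block:
            transfers += 1
            current_block = block
    return transfers


N = 1024  # hypothetical number of data items
B = 64    # hypothetical block size, in items

# Access pattern with locality: items visited in storage order.
sequential = range(N)

# Access pattern without locality: the same items, but each consecutive
# access lands in a different block (0, 64, 128, ..., then 1, 65, ...).
scattered = [row * B + col for col in range(B) for row in range(N // B)]

sequential_transfers = count_block_transfers(sequential, B)
scattered_transfers = count_block_transfers(scattered, B)
print(sequential_transfers, scattered_transfers)
```

Under these assumptions the sequential pattern needs one transfer per block (N/B in total), while the scattered pattern needs one transfer per item, even though both touch exactly the same data.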

In the next section, we describe the high-level parallel disk model (PDM), which we use throughout this manuscript for the design and analysis of EM algorithms and data structures. In Section 2.2, we consider some practical modeling issues dealing with the sizes of blocks and tracks and the corresponding parameter values in PDM. In Section 2.3, we review the historical development of models of I/O and hierarchical memory.

2.1 PDM and Problem Parameters

We can capture the main properties of magnetic disks and multiple disk systems by the commonly used parallel disk model (PDM) introduced by Vitter and Shriver [345]. The two key mechanisms for efficient algorithm design in PDM are locality of reference (which takes advantage of block transfer) and parallel disk access (which takes advantage of multiple disks). In a single I/O, each of the D disks can simultaneously transfer a block of B contiguous data items.
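As a rough illustration of the two mechanisms combined, the following sketch counts the I/O steps needed to read a set of items once. The function name `scan_ios` and the example sizes are hypothetical, and the sketch assumes the items are laid out evenly across the disks so that every I/O step can use all D disks at once; it is a back-of-the-envelope calculation, not a definition from the model.

```python
def scan_ios(n_items, block_size, num_disks):
    """Number of parallel I/O steps needed to read n_items once.

    Assumption (for illustration only): the data is spread evenly over
    the disks, so each I/O step moves up to num_disks * block_size items.
    """
    if n_items < 0 or block_size <= 0 or num_disks <= 0:
        raise ValueError("n_items must be >= 0; block_size and num_disks must be > 0")
    items_per_io = block_size * num_disks
    # Integer ceiling division avoids floating-point rounding.
    return (n_items + items_per_io - 1) // items_per_io


# Hypothetical sizes: one million items, blocks of 1000 items.
single_disk_ios = scan_ios(1_000_000, 1000, 1)
four_disk_ios = scan_ios(1_000_000, 1000, 4)
print(single_disk_ios, four_disk_ios)
```

With these example values, moving from one disk to four cuts the number of I/O steps by a factor of four, which is the gain the model attributes to parallel disk access when the data layout allows all disks to work in every step.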

PDM uses the following main parameters:<br />

N = problem size (in units of data items);<br />

M = internal memory size (in units of data items);<br />

B = block transfer size (in units of data items);
