Algorithms and Data Structures for External Memory

Parallel Disk Model (PDM)

architecture-specific. The relative memory sizes and block sizes of the levels vary from computer to computer. Another issue is how blocks from one memory level are stored in the caches at a lower level. When a disk block is input into internal memory, it can be stored in any specified DRAM location. However, in level 1 and level 2 caches, each item can be stored only in certain cache locations, often determined by a hardware modulus computation on the item's memory address. The number of possible storage locations in the cache for a given item is called the level of associativity. Some caches are direct-mapped (i.e., with associativity 1), and most caches have fairly low associativity (typically at most 4).
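The hardware modulus computation described above can be sketched as follows. This is an illustrative model, not taken from the text; the block size and number of sets are hypothetical example parameters.

```python
# Sketch of how a set-associative cache maps an address to a cache set.
# Parameters are illustrative (64-byte blocks, 512 sets), not from the text.

def cache_set_index(address, block_size=64, num_sets=512):
    """Return the cache set an address maps to, via the usual
    hardware modulus: (address // block_size) % num_sets."""
    block_number = address // block_size  # which memory block holds the address
    return block_number % num_sets        # modulus picks the set

# Addresses exactly num_sets * block_size bytes apart land in the same set.
# In a direct-mapped cache (associativity 1) such lines evict each other;
# a 4-way cache tolerates up to 4 conflicting lines per set.
same_set = cache_set_index(0x1000) == cache_set_index(0x1000 + 512 * 64)
```

With associativity 1 there is exactly one legal location per block, which is why conflict misses can occur even when the cache is mostly empty.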

Another reason why the hierarchical models tend to be more architecture-specific is that the relative difference in speed between level 1 cache and level 2 cache, or between level 2 cache and DRAM, is orders of magnitude smaller than the relative difference in latencies between DRAM and the disks. Yet, it is apparent that good EM design principles are useful in developing cache-efficient algorithms. For example, sequential internal memory access is much faster than random access, by about a factor of 10, and the more we can build locality into an algorithm, the faster it will run in practice. By properly engineering the "inner loops," a programmer can often significantly speed up the overall running time. Tools such as simulation environments and system monitoring utilities [221, 294, 322] can provide sophisticated help in the optimization process.
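The sequential-versus-random gap mentioned above can be observed with a small benchmark sketch like the one below. The setup is my own illustration, not from the text; the measured ratio is machine-dependent (and muted in Python by interpreter overhead, so a compiled language shows the effect more sharply), so no particular factor should be read off from it.

```python
# Sketch: compare summing an array in sequential order vs. a random
# permutation of the same indices. Both traversals touch the same data,
# but the sequential one benefits from cache-line locality and prefetching.
import random
import time

n = 1_000_000
data = list(range(n))
order = list(range(n))
random.shuffle(order)  # a random access pattern over the same elements

def timed_sum(indices):
    """Sum data[i] over the given index order, returning (total, seconds)."""
    start = time.perf_counter()
    total = sum(data[i] for i in indices)
    return total, time.perf_counter() - start

seq_total, seq_t = timed_sum(range(n))   # sequential traversal
rnd_total, rnd_t = timed_sum(order)      # random traversal

# Both orders compute the same sum; only the access pattern differs.
assert seq_total == rnd_total
```

Printing `seq_t` and `rnd_t` on a typical machine shows the sequential pass running faster, which is the locality effect the "inner loop" engineering advice exploits.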

For reasons of focus, we do not consider hierarchical and cache models in this manuscript. We refer the reader to the previous references on cache-oblivious algorithms, as well as to the following references: Aggarwal et al. [20] define an elegant hierarchical memory model, and Aggarwal et al. [21] augment it with block transfer capability. Alpern et al. [29] model levels of memory in which the memory size, block size, and bandwidth grow at uniform rates. Vitter and Shriver [346] and Vitter and Nodine [344] discuss parallel versions and variants of the hierarchical models. The parallel model of Li et al. [234] also applies to hierarchical memory. Savage [305] gives a hierarchical pebbling version of [306]. Carter and Gatlin [96] define pebbling models of nonassociative direct-mapped caches. Rahman and Raman [287] and
