09.08.2013 Views

Design and Verification of Adaptive Cache Coherence Protocols ...

transparent and exposed only for low-level memory-mapped input and output operations. For multiprocessor systems, however, these features are anything but transparent. Indeed, a whole area of research has evolved around what view of memory should be presented to the programmer, the compiler writer, and the computer architect.

The essence of memory models is the correspondence between each load instruction and the store instruction that supplies the data retrieved by the load. The memory model of uniprocessor systems is intuitive: a load operation returns the most recent value written to the address, and a store operation binds the value for subsequent load operations. In parallel systems, notions such as "the most recent value" can become ambiguous since multiple processors access memory concurrently. Therefore, it can be difficult to specify the resulting memory model precisely at the architecture level [33, 36, 97]. Surveys of some well-known memory models can be found elsewhere [1, 66].
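
The uniprocessor rule described above can be sketched in a few lines. This is only an illustration: the trace format and the name `source_stores` are assumptions made for this example, not notation from the thesis. For a single sequential trace, the sketch pairs each load with the store that supplies its value, which is simply the most recent earlier store to the same address.

```python
from typing import List, Optional, Tuple

# Each operation is ("store", address, value) or ("load", address, None).
# This trace format is hypothetical and used only for illustration.
Op = Tuple[str, str, Optional[int]]

def source_stores(trace: List[Op]) -> List[Tuple[int, Optional[int]]]:
    """For each load, return (load index, index of the supplying store).

    The supplying store is the most recent earlier store to the same
    address, or None if the address has not been written in the trace.
    """
    last_store = {}  # address -> index of the most recent store
    result = []
    for i, (kind, addr, _value) in enumerate(trace):
        if kind == "store":
            last_store[addr] = i
        else:
            result.append((i, last_store.get(addr)))
    return result

trace = [
    ("store", "x", 1),
    ("load", "x", None),
    ("store", "x", 2),
    ("store", "y", 7),
    ("load", "x", None),
    ("load", "z", None),
]
pairs = source_stores(trace)
print(pairs)  # [(1, 0), (4, 2), (5, None)]
```

With several processors there is no single trace to scan, which is why "most recent" stops being well defined without a memory model.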

Memory models can be generally categorized as architecture-oriented and program-oriented. One can think of an architecture-oriented model as the low-level interface between the compiler and the underlying architecture, while a program-oriented model is the high-level interface between the compiler and the program. Many architecture-oriented memory models [61, 86, 117, 121] are direct consequences of microarchitecture optimizations such as write-buffers and non-blocking caches. Every programming language has a high-level memory model [19, 56, 54, 92], regardless of whether it is described explicitly or not. The compiler ensures that the semantics of a program is preserved when its compiled version is executed on an architecture with some low-level memory model.
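
To see how a write-buffer can shape an architecture-oriented memory model, consider the following toy simulation. It is a sketch under stated assumptions, not a model of any machine cited above: the `Processor` class, its FIFO buffer, and the point at which the buffers drain are all hypothetical. Each processor places its stores in a private buffer, and loads read shared memory unless the processor's own buffer holds a newer value for that address.

```python
from collections import deque

class Processor:
    """Hypothetical processor with a FIFO write buffer (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory
        self.buffer = deque()  # pending (address, value) stores, oldest first

    def store(self, addr, value):
        # The store completes locally; shared memory is not updated yet.
        self.buffer.append((addr, value))

    def load(self, addr):
        # Forward from the youngest matching buffered store, else read memory.
        for a, v in reversed(self.buffer):
            if a == addr:
                return v
        return self.memory[addr]

    def drain(self):
        # Write buffered stores to shared memory in FIFO order.
        while self.buffer:
            a, v = self.buffer.popleft()
            self.memory[a] = v

memory = {"x": 0, "y": 0}
p0, p1 = Processor(memory), Processor(memory)

p0.store("x", 1)    # P0: x = 1 (sits in P0's buffer)
p1.store("y", 1)    # P1: y = 1 (sits in P1's buffer)
r1 = p0.load("y")   # P0: r1 = y, read from shared memory
r2 = p1.load("x")   # P1: r2 = x, read from shared memory
p0.drain()
p1.drain()

print(r1, r2, memory)  # 0 0 {'x': 1, 'y': 1}
```

Both loads return 0 even though each processor issued its store before its load. No execution that performs the four operations one at a time in program order against a single memory can produce that result, so the buffer is visible to the programmer and has to be accounted for in the memory model.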

1.1.1 Sequential Consistency<br />

Sequential consistency [72] has been the dominant memory model in parallel computing for decades due to its simplicity. A system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program. Sequential consistency requires that memory accesses be performed in-order on each processor, and be atomic with respect to each other. This is clearly at odds with both instruction reordering and data caching.
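
The definition can be made concrete by brute force. The sketch below enumerates every interleaving of two short programs that preserves each program's order, runs each interleaving against a single memory, and collects the results. The two-processor example (often called a store-buffering litmus test) and the helper names `interleavings` and `run` are assumptions chosen for illustration; they do not come from the thesis.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of sequences a and b that preserves each one's order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        slot_set = set(slots)  # positions taken by operations of a
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slot_set else next(ib) for i in range(n)]

def run(schedule):
    """Execute operations one at a time against a single shared memory."""
    memory = {"x": 0, "y": 0}
    regs = {}
    for kind, addr, arg in schedule:
        if kind == "store":
            memory[addr] = arg        # arg is the value to write
        else:
            regs[arg] = memory[addr]  # arg is the destination register
    return regs["r1"], regs["r2"]

# x and y are initially 0.
p0 = [("store", "x", 1), ("load", "y", "r1")]
p1 = [("store", "y", 1), ("load", "x", "r2")]

sc_outcomes = {run(s) for s in interleavings(p0, p1)}
print(sorted(sc_outcomes))  # [(0, 1), (1, 0), (1, 1)]
```

The outcome `(0, 0)` never appears: in any sequential order that respects both programs, at least one store precedes the other processor's load. A system that can produce `(0, 0)` for this example is therefore not sequentially consistent.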

Sequential consistency inevitably prevents many architecture and compiler optimizations. For example, the architect has to be conservative in what can be reordered, although dynamic instruction reordering is desirable in the presence of unpredictable memory access latencies. The compiler writer is affected because parallel compilers often use existing sequential compilers as a base, and sequential compilers reorder instructions based on conventional dataflow analysis. Thus, any transformation involving instruction reordering either has to be turned off, or at least requires more sophisticated analysis [70, 109].
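
A small experiment shows why reordering that is safe for a sequential program can break a parallel one. In the sketch below, a producer writes `data` and then sets `flag`, while a consumer reads `flag` and then `data`. The program, the value 42, and the helpers `merges` and `outcomes` are hypothetical and serve only as an illustration. The two stores touch different addresses, so an analysis that looks at the producer alone finds no dependence between them.

```python
def merges(a, b):
    """Yield all interleavings of lists a and b that keep each list's order."""
    if not a or not b:
        yield a + b
        return
    for rest in merges(a[1:], b):
        yield [a[0]] + rest
    for rest in merges(a, b[1:]):
        yield [b[0]] + rest

def outcomes(p0, p1):
    """Set of (r1, r2) results over all interleavings; memory starts at 0."""
    results = set()
    for schedule in merges(p0, p1):
        memory, regs = {"data": 0, "flag": 0}, {}
        for kind, addr, arg in schedule:
            if kind == "store":
                memory[addr] = arg        # arg is the value to write
            else:
                regs[arg] = memory[addr]  # arg is the destination register
        results.add((regs["r1"], regs["r2"]))
    return results

producer = [("store", "data", 42), ("store", "flag", 1)]
consumer = [("load", "flag", "r1"), ("load", "data", "r2")]

# Program order kept: seeing flag == 1 implies seeing data == 42.
in_order = outcomes(producer, consumer)

# The producer's two stores swapped, as a reordering that considers only
# the producer's own dataflow might do.
reordered = outcomes(producer[::-1], consumer)

print((1, 0) in in_order, (1, 0) in reordered)  # False True
```

With the stores swapped, the consumer can observe the flag set while still reading the old `data`, a result that no interleaving of the original program allows. Ruling this out requires either disabling the transformation or an analysis that takes the other processor into account.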

The desire to achieve higher performance has led to various relaxed memory models, which can provide more implementation flexibility by exposing optimizing features such as instruc-
