09.08.2013 Views

Design and Verification of Adaptive Cache Coherence Protocols ...



Example 1: Can both registers r1 and r2 obtain 0?

Processor 1              Processor 2
Store(flag1,1)           Store(flag2,1)
r1 := Load(flag2)        r2 := Load(flag1)

Example 2: With write-buffers, can both registers r1 and r2 obtain 0?

Processor 1              Processor 2
Store(flag1,1)           Store(flag2,1)
r3 := Load(flag1)        r4 := Load(flag2)
r1 := Load(flag2)        r2 := Load(flag1)

Example 3: Can registers r1 and r2 obtain 1 and 0, respectively?

Processor 1              Processor 2
Store(buf,1)             r1 := Load(flag)
Store(flag,1)            r2 := Load(buf)

Example 4: Can registers r1 and r2 obtain 1 and 0, respectively?

Processor 1              Processor 2
Store(buf,1)             L: r1 := Load(flag)
Fence                    Jz(r1,L)
Store(flag,1)            r2 := Load(buf)

Figure 1.1: Impact of Architectural Optimizations on Program Behaviors

Unfortunately, the extra load instruction would make no semantic difference in the presence of short-circuiting, which allows the load to retrieve the data from the write-buffer that contains an outstanding store to the address. Example 3 shows a program that implements the producer-consumer synchronization. In the presence of non-FIFO write-buffers or non-blocking caches, it may appear to processor 2 that processor 1 asserts the flag before it writes the new data to the buffer. The question for Example 4 is whether the branch instruction behaves as an implicit memory fence. With speculative execution, processor 2 can speculatively perform the load operation to the buffer before it observes the value of the flag.
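The questions posed by Examples 1 to 3 can be checked mechanically by enumerating every reachable execution. The Python sketch below is not taken from the thesis; it is a minimal illustration under assumed semantics: all memory locations start at 0, each processor runs its instructions in program order, and an optional per-processor write-buffer holds stores until a nondeterministic drain step moves one to memory. Loads short-circuit from the processor's own buffer. The names `outcomes`, `example1`, `example2`, and `example3` are hypothetical. Fences, branches, and speculative loads (Example 4) are not modeled.

```python
def outcomes(programs, regs, buffered=False, fifo=True):
    """Explore every reachable state; return the set of final values of `regs`."""
    freeze = lambda d: tuple(sorted(d.items()))
    # State: (program counters, memory, write-buffers, registers), all hashable.
    start = (tuple(0 for _ in programs), (), tuple(() for _ in programs), ())
    stack, seen, results = [start], set(), set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, mem, bufs, regfile = state
        memory, registers = dict(mem), dict(regfile)
        if all(pc == len(prog) for pc, prog in zip(pcs, programs)):
            results.add(tuple(registers[r] for r in regs))
        for i, prog in enumerate(programs):
            buf = bufs[i]
            # Drain step: move one buffered store of processor i to memory.
            for k, (addr, val) in enumerate(buf):
                if fifo and k > 0:
                    break  # a FIFO buffer may only drain its oldest entry
                if any(a == addr for a, _ in buf[:k]):
                    continue  # never reorder stores to the same address
                new_bufs = bufs[:i] + (buf[:k] + buf[k + 1:],) + bufs[i + 1:]
                stack.append((pcs, freeze({**memory, addr: val}), new_bufs, regfile))
            if pcs[i] == len(prog):
                continue
            op = prog[pcs[i]]
            new_pcs = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
            if op[0] == "store":
                _, addr, val = op
                if buffered:
                    new_bufs = bufs[:i] + (buf + ((addr, val),),) + bufs[i + 1:]
                    stack.append((new_pcs, mem, new_bufs, regfile))
                else:
                    stack.append((new_pcs, freeze({**memory, addr: val}), bufs, regfile))
            else:  # ("load", register, address)
                _, reg, addr = op
                # Short-circuiting: the newest pending store to addr wins.
                pending = [v for a, v in buf if a == addr]
                val = pending[-1] if pending else memory.get(addr, 0)
                stack.append((new_pcs, mem, bufs, freeze({**registers, reg: val})))
    return results


example1 = [
    [("store", "flag1", 1), ("load", "r1", "flag2")],
    [("store", "flag2", 1), ("load", "r2", "flag1")],
]
example2 = [
    [("store", "flag1", 1), ("load", "r3", "flag1"), ("load", "r1", "flag2")],
    [("store", "flag2", 1), ("load", "r4", "flag2"), ("load", "r2", "flag1")],
]
example3 = [
    [("store", "buf", 1), ("store", "flag", 1)],
    [("load", "r1", "flag"), ("load", "r2", "buf")],
]

r = ("r1", "r2")
print((0, 0) in outcomes(example1, r))                             # False
print((0, 0) in outcomes(example1, r, buffered=True))              # True
print((0, 0) in outcomes(example2, r, buffered=True))              # True
print((1, 0) in outcomes(example3, r, buffered=True))              # False
print((1, 0) in outcomes(example3, r, buffered=True, fifo=False))  # True
```

In this model, both registers can obtain 0 in Example 1 only once stores are buffered, and the extra loads in Example 2 do not rule that outcome out because they are satisfied from the local buffer. For Example 3, the outcome r1 = 1 and r2 = 0 appears only when the buffer is allowed to drain out of order.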

As a reaction to ever-changing memory models and their complicated and imprecise definitions, there is a desire to go back to the simple, easy-to-understand sequential consistency, even though there are a plethora of problems in its high-performance implementation and no compiler writer seems to adhere to its semantics. Ingenious solutions have been devised to maintain the sequential consistency semantics so that programmers cannot detect if and when the memory accesses are out-of-order or non-atomic. Recent advances in speculative execution permit reordering of memory accesses without affecting the sequentiality of sequential consistency [48, 51, 125]. However, it is not clear whether such mechanisms are scalable for DSM systems in which memory access latencies are often large and unpredictable.

1.1.3 Architecture-Oriented Memory Models<br />

Many relaxed memory models have been proposed for DSM systems and software DSM systems. Weak consistency [37, 106] assumes that memory accesses to shared variables are guarded by synchronizations, and allows memory accesses between synchronizations to be performed out-

