09.08.2013 Views

Design and Verification of Adaptive Cache Coherence Protocols ...


Translation from CRF to PSO: The Fencerr and Fencerw instructions are translated into Nops, since PSO implicitly preserves the load-load and load-store ordering. Both Fencewr and Fenceww are translated into Membar.

Translation from CRF to RMO: Since RMO assumes no ordering between memory accesses unless explicit memory barriers are inserted, the translation scheme translates each CRF fence into the corresponding RMO memory barrier.

Translation from CRF to IBM 370: Since IBM 370's model is slightly stricter than the TSO model, the translation can avoid some memory barriers by taking advantage of the fact that IBM 370 prohibits data short-circuiting from write buffers. This suggests that a Fencewr with identical pre-address and post-address be translated into a Nop.

We can translate programs based on release consistency to CRF programs by defining the release and acquire operations using CRF instructions and synchronization instructions such as Lock/Unlock. The synchronization instructions are provided for mutual exclusion, and have no ordering implication on other instructions. In terms of reordering constraints, a Lock can be treated as a Loadl followed by a Storel, and an Unlock can be treated as a Storel.

Release(s) ≡ Commit(*); PreFenceW(s); Unlock(s)

Acquire(s) ≡ Lock(s); PostFenceR(s); Reconcile(*)
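As an illustration of how these two definitions would be applied, the sketch below expands Release and Acquire in a small program into their CRF instruction sequences. The tuple representation of instructions and the helper names `release`, `acquire` and `expand` are hypothetical, and the wildcard `"*"` passed to Commit and Reconcile is an assumption meaning "all addresses".

```python
def release(s):
    """Instruction sequence assumed for Release on semaphore s."""
    return [("Commit", "*"), ("PreFenceW", s), ("Unlock", s)]


def acquire(s):
    """Instruction sequence assumed for Acquire on semaphore s."""
    return [("Lock", s), ("PostFenceR", s), ("Reconcile", "*")]


def expand(program):
    """Replace each Release/Acquire by its expansion; keep the rest."""
    out = []
    for instr in program:
        if instr[0] == "Release":
            out.extend(release(instr[1]))
        elif instr[0] == "Acquire":
            out.extend(acquire(instr[1]))
        else:
            out.append(instr)
    return out


# A critical section that updates x under semaphore s.
critical_section = [("Acquire", "s"), ("Storel", "x", 1), ("Release", "s")]
expanded = expand(critical_section)
```

Here `expanded` begins with the Lock and ends with the Unlock, with the store sitting between the Reconcile of the acquire and the Commit of the release.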

One can think of the translation scheme above as the definition of one version of release consistency. Memory accesses after a release can be performed before the semaphore is released, because the release only imposes a pre-fence on preceding accesses. Similarly, memory accesses before an acquire do not have to be completed before the semaphore is acquired, because the acquire only imposes a post-fence on following memory accesses. Furthermore, data modified by a store before a release must be written back to the memory at the release point, but a copy of that address in another cache does not have to be invalidated or updated immediately. This is because the stale copy, if any, will be reconciled at the next acquire point.
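The last point, that a stale copy may survive a release and is only cleaned up at the next acquire, can be seen in a toy simulation. This is a deliberately simplified sketch, not the CRF rules themselves: the class `Sache`, its method names, and the eager write-back and purge policies are assumptions chosen to keep the example short.

```python
class Sache:
    """Toy semantic cache: address -> [value, "Clean" or "Dirty"]."""

    def __init__(self, memory):
        self.memory = memory  # shared dict: address -> value
        self.cells = {}

    def loadl(self, addr):
        # On a miss, fetch a clean copy from memory; otherwise read
        # whatever the sache holds, even if it is stale.
        if addr not in self.cells:
            self.cells[addr] = [self.memory[addr], "Clean"]
        return self.cells[addr][0]

    def storel(self, addr, value):
        # A store only updates the local copy and marks it dirty.
        self.cells[addr] = [value, "Dirty"]

    def commit_all(self):
        # Write every dirty cell back to memory (release side).
        for addr, cell in self.cells.items():
            if cell[1] == "Dirty":
                self.memory[addr] = cell[0]
                cell[1] = "Clean"

    def reconcile_all(self):
        # Purge every clean cell so later loads refetch (acquire side).
        for addr in [a for a, c in self.cells.items() if c[1] == "Clean"]:
            del self.cells[addr]


memory = {"x": 0}
producer, consumer = Sache(memory), Sache(memory)

consumer.loadl("x")            # consumer caches a clean copy of x = 0
producer.storel("x", 42)
producer.commit_all()          # release point: 42 reaches memory

stale = consumer.loadl("x")    # still 0: the old copy was not invalidated
consumer.reconcile_all()       # acquire point: clean copies are purged
fresh = consumer.loadl("x")    # 42: refetched from memory
```

The write-back at the release makes the new value available in memory, but the consumer only observes it after its own reconcile, which matches the ordering described in the paragraph above.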

3.5 The Generalized CRF Model<br />

In CRF, the memory behaves as the rendezvous of saches. A writeback operation is atomic with respect to all the sites: if a sache can retrieve some data from the memory, another sache must be able to retrieve the same data from the memory at the same time. Therefore, if two stores are observed by more than one processor, they are observed in the same order provided that the loads used in the observation are executed in order.

As an example, the following program illustrates that non-atomic store operations can generate surprising behaviors even when only one memory location is involved. Assume that location a initially has value 0. Processors A and B modify location a while processors C and D read
