Design and Verification of Adaptive Cache Coherence Protocols

We propose a mechanism-oriented memory model called Commit-Reconcile & Fences (CRF), which exposes both data replication and instruction reordering at the instruction set architecture level. CRF is intended for architects and compiler writers rather than for high-level parallel programming. One motivation for CRF is to eliminate the "model of the year" aspect of many existing relaxed memory models while still permitting efficient implementations. The CRF model permits aggressive cache coherence protocols because no operation explicitly or implicitly involves more than one semantic cache. A novel feature of CRF is that many memory models can be expressed as restricted versions of CRF, in the sense that programs written under those memory models can be translated into efficient CRF programs. Translations into CRF of programs written under memory models such as sequential consistency and release consistency are straightforward.

Parallel programs have various memory access patterns, so it is highly desirable that a cache coherence protocol adapt its actions to changing program behavior. This thesis attacks the adaptivity problem from a new perspective. We develop an adaptive cache coherence protocol called Cachet that provides a wide scope of adaptivity for distributed shared-memory (DSM) systems. Cachet seamlessly integrates several micro-protocols, each optimized for a particular memory access pattern. Furthermore, because Cachet implements the CRF model, it is automatically an implementation of every memory model whose programs can be translated into CRF programs.

Cache coherence protocols can be extremely complicated, especially in the presence of various optimizations. Verifying the correctness of a cache coherence protocol often takes much more time than designing it, and rigorous reasoning is the only way to avoid subtle errors in sophisticated protocols.
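The claim that no CRF operation involves more than one semantic cache can be made concrete with a small Python model. This is a sketch under stated assumptions, not the thesis's formal definition. It assumes the following:

- Each site's semantic cache ("sache") maps an address to a value tagged Clean or Dirty.
- Commit(a) can complete only when a is not Dirty in the local sache.
- Reconcile(a) can complete only when a is not Clean in the local sache.
- The nondeterministic background rules (caching a copy, writing back, purging) fire eagerly inside whichever instruction is waiting on them.

Fences are not modeled. The class and method names (`CRFSketch`, `loadl`, `storel`, `commit`, `reconcile`) are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    CLEAN = "Clean"
    DIRTY = "Dirty"


@dataclass
class Site:
    """One processor's semantic cache: address -> (value, State)."""
    sache: dict = field(default_factory=dict)


class CRFSketch:
    """Toy, deterministic model of CRF semantic caches (assumption-laden sketch).

    Background rules (cache, writeback, purge) are fired eagerly by the
    instruction that needs them instead of nondeterministically.
    """

    def __init__(self, n_sites, memory):
        self.mem = dict(memory)
        self.sites = [Site() for _ in range(n_sites)]

    # Background rules: each touches one sache plus memory, never two saches.
    def _cache(self, s, a):
        self.sites[s].sache[a] = (self.mem[a], State.CLEAN)

    def _writeback(self, s, a):
        v, _ = self.sites[s].sache[a]
        self.mem[a] = v
        self.sites[s].sache[a] = (v, State.CLEAN)

    def _purge(self, s, a):
        del self.sites[s].sache[a]

    # Instructions.
    def loadl(self, s, a):
        # Loadl reads the local sache, caching a Clean copy first if absent.
        if a not in self.sites[s].sache:
            self._cache(s, a)
        return self.sites[s].sache[a][0]

    def storel(self, s, a, v):
        # Storel only updates the local sache, leaving the copy Dirty.
        self.sites[s].sache[a] = (v, State.DIRTY)

    def commit(self, s, a):
        # May complete only when a is not Dirty here: write back first if needed.
        entry = self.sites[s].sache.get(a)
        if entry is not None and entry[1] is State.DIRTY:
            self._writeback(s, a)

    def reconcile(self, s, a):
        # May complete only when a is not Clean here: purge a Clean copy first.
        entry = self.sites[s].sache.get(a)
        if entry is not None and entry[1] is State.CLEAN:
            self._purge(s, a)


crf = CRFSketch(n_sites=2, memory={"flag": 0})
consumer, producer = 0, 1
first = crf.loadl(consumer, "flag")   # caches a Clean copy of 0
crf.storel(producer, "flag", 1)       # Dirty copy in the producer's sache only
crf.commit(producer, "flag")          # forces a writeback: memory now holds 1
stale = crf.loadl(consumer, "flag")   # still 0, because there is no Reconcile yet
crf.reconcile(consumer, "flag")       # purges the stale Clean copy
fresh = crf.loadl(consumer, "flag")   # re-caches the committed value
print(first, stale, fresh)            # 0 0 1
```

Under this reading, a producer publishes a value with Storel followed by Commit, and a consumer observes it with Reconcile followed by Loadl. A load or store of a sequentially consistent program maps roughly onto those pairs. A complete translation would also need fences, which this in-order sketch leaves out.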
Rigorous reasoning requires a formal foundation, so we use Term Rewriting Systems (TRSs) as the underlying formalism for specifying and verifying computer architectures and distributed protocols. We use TRSs to define the operational semantics of the CRF memory model, so that each CRF program has a well-defined operational behavior. Both architects and compiler writers can use the set of rewriting rules to validate their implementations and optimizations. We can prove the soundness of a cache coherence protocol by showing that the TRS specifying the protocol can be simulated by the TRS specifying the memory model.

The remainder of this chapter gives some background on memory models and cache coherence protocols. Section 1.1 gives an overview of memory models, from sequential consistency to some relaxed memory models. Section 1.2 discusses cache coherence protocols and some common techniques for verifying them. Section 1.3 summarizes the major contributions of the thesis and outlines its organization.

1.1 Memory Models

Caching and instruction reordering are ubiquitous features of modern computer systems and are necessary to achieve high performance. For uniprocessor systems, these features are mostly transparent and are exposed only for low-level memory-mapped input and output operations. For multiprocessor systems, however, these features are anything but transparent. Indeed, a whole area of research has evolved around what view of memory should be presented to the programmer, the compiler writer, and the computer architect.

The essence of a memory model is the correspondence between each load instruction and the store instruction that supplies the data retrieved by the load. The memory model of uniprocessor systems is intuitive: a load operation returns the most recent value written to the address, and a store operation binds the value for subsequent load operations. In parallel systems, notions such as "the most recent value" can become ambiguous because multiple processors access memory concurrently. It can therefore be difficult to specify the resulting memory model precisely at the architecture level [33, 36, 97]. Surveys of some well-known memory models can be found elsewhere [1, 66].

Memory models can generally be categorized as architecture-oriented or program-oriented. An architecture-oriented model can be thought of as the low-level interface between the compiler and the underlying architecture. A program-oriented model is the high-level interface between the compiler and the program. Many architecture-oriented memory models [61, 86, 117, 121] are direct consequences of microarchitecture optimizations such as write buffers and non-blocking caches. Every programming language has a high-level memory model [19, 56, 54, 92], whether or not it is described explicitly. The compiler ensures that the semantics of a program are preserved when its compiled version is executed on an architecture with some low-level memory model.

1.1.1 Sequential Consistency

Sequential consistency [72] has been the dominant memory model in parallel computing for decades due to its simplicity.
A system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program. Sequential consistency requires that memory accesses be performed in order on each processor and be atomic with respect to each other. This is clearly at odds with both instruction reordering and data caching.

Sequential consistency inevitably prevents many architecture and compiler optimizations. For example, the architect has to be conservative about what can be reordered, even though dynamic instruction reordering is desirable in the presence of unpredictable memory access latencies. The compiler writer is affected because parallel compilers often use existing sequential compilers as a base, and sequential compilers reorder instructions based on conventional dataflow analysis. Any transformation involving instruction reordering therefore either has to be turned off or at least requires more sophisticated analysis [70, 109].

The desire to achieve higher performance has led to various relaxed memory models, which can provide more implementation flexibility by exposing optimizing features such as instruction …
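For a tiny program, the definition of sequential consistency can be checked by brute force. The method is to enumerate every interleaving that preserves each processor's program order and run it against a single atomic memory.

The sketch below does this for the well-known store-buffering litmus test. That test is a hypothetical example chosen here, not taken from the text, and the names `P1`, `P2`, `interleavings`, `run` and `outcomes` are illustrative. For contrast, it also lets each processor perform its load ahead of its own earlier store to a different address. This is a crude stand-in for a write buffer, not a model of any particular architecture.

```python
# Store-buffering litmus test (hypothetical example).
# Each operation is (kind, address, stored value or destination register).
P1 = [("st", "x", 1), ("ld", "y", "r1")]
P2 = [("st", "y", 1), ("ld", "x", "r2")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's internal order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(order):
    """Execute one total order against a single atomic memory."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for kind, addr, arg in order:
        if kind == "st":
            mem[addr] = arg
        else:
            regs[arg] = mem[addr]
    return regs["r1"], regs["r2"]


def outcomes(p1_variants, p2_variants):
    """Collect (r1, r2) over all allowed per-processor orders and interleavings."""
    results = set()
    for a in p1_variants:
        for b in p2_variants:
            for order in interleavings(a, b):
                results.add(run(order))
    return results


sc = outcomes([P1], [P2])
# Relaxed: each processor may also issue its load before its own store.
relaxed = outcomes([P1, P1[::-1]], [P2, P2[::-1]])

print(sorted(sc))       # [(0, 1), (1, 0), (1, 1)]
print(sorted(relaxed))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Under sequential consistency the outcome r1 = r2 = 0 never appears. The later of the two loads in any total order follows both stores, so it must read 1. Once the store-to-load reordering is allowed, that outcome becomes reachable. This is one concrete way in which sequential consistency conflicts with optimizations such as write buffers.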