Design and Verification of Adaptive Cache Coherence Protocols ...

for computer architecture is the development of a set of integrated design tools for modeling, specification, verification, simulation and synthesis of computer systems.

8.1 Future Work

CRF Microprocessors
We have used CRF as the specification for cache coherence protocols. The CRF model can be implemented on most modern microprocessors via appropriate translation schemes. However, it remains an open question how CRF instructions can be effectively incorporated into modern microprocessors. For example, what is the proper granularity for commit, reconcile and fence instructions? What optimizations can the compiler perform to eliminate unnecessary synchronizations? Since ordinary load and store instructions are decomposed into finer-grain instructions, the instruction bandwidth needed to support a given level of performance is likely to be high. This can have a profound impact on microarchitectural mechanisms such as instruction dispatch, cache state access and cache snooping. Another interesting question is how CRF instructions can be implemented on architectures with malleable caches, such as column and curious caching [29].

Optimizations of Cachet
The Cachet protocol can be extended in many respects to incorporate more adaptivity. For example, in Cachet, an instruction is always stalled when the cache cell is in a transient state. This constraint can be relaxed under certain circumstances: a Loadl instruction can complete if the cache state is WbPending, and a Commit instruction can complete if the cache state is CachePending.

The Cachet protocol uses a general cache request that draws no distinction between different micro-protocols. Although a cache can indicate which copy it prefers as heuristic information, the memory decides which copy to supply to the cache. We can extend the protocol so that, in addition to the general cache request, a cache can also send a specific cache request for a specific type of cache copy.
This can be useful when caches have more knowledge than the memory about the access patterns of the program. Another advantage of having distinct cache requests is that a cache can send a request for a WP or Migratory copy while the address is cached in some Base state. In this case, the cache request behaves as an upgrade request from Base to WP or Migratory. It is worth noting that Cachet does not allow a cache to request an upgrade operation from WP to Migratory. Instead, the cache must first downgrade the cell from WP to Base and then send a cache request to the memory (although the downgrade message can be piggybacked with the cache request). We can introduce an upgrade request message so that a cache can upgrade a WP cell to a Migratory cell without first performing the downgrade operation (so that the memory does not need to send the data copy to the cache).

In Cachet, a cache can only receive a data copy from the memory, even when the most up-to-date data resides in another cache at the time. Therefore, a Migratory copy must first be written back to the memory before the data can be supplied to another cache. The forwarding technique can be used to allow a cache to retrieve a data copy directly from another cache. This can reduce the latency of servicing cache misses for programs that exhibit access patterns such as the producer-consumer pattern.

The Cachet protocol is designed for NUMA systems. It can be extended with COMA-like coherence operations to provide more adaptivity. This would allow a cache to switch dynamically between NUMA and COMA styles for the same memory region.

Heuristic Policies
The Cachet protocol provides enormous adaptivity for programs with various access patterns. A relevant question is what mechanisms and heuristic policies are needed to discover the access patterns, and how appropriate heuristic information can be conveyed to protocol engines. Access patterns can be detected through compiler analysis or runtime statistics collection. The Cachet protocol defines a framework in which various heuristic policies can be examined while the correctness of the protocol is always guaranteed. Customized protocols can be built dynamically with guaranteed soundness and liveness.

Access patterns can also be given by the programmer as program annotations. The voluntary rules of Cachet represent a set of coherence primitives that can be safely invoked by programmers whenever necessary. Programmers can therefore build application-specific protocols by selecting appropriate coherence primitives. The selection of primitives is purely a performance issue: the correctness of the system can never be compromised, regardless of when and how the primitives are executed.

Automatic Verification and Synthesis of Protocols
When a system or protocol has many rewriting rules, the correctness proofs can quickly become tedious and error-prone. This problem can be alleviated by the use of theorem provers and model checkers. We are currently using theorem provers such as PVS [95, 108] in our verification effort for sophisticated protocols such as the complete Cachet protocol.
Theorem provers are usually better at proving things correct than at finding and diagnosing errors. Therefore, it can also be useful to perform initial "sanity checking" using finite-state verifiers such as Murphi [35] or SPIN [59]. This often requires scaling down the example so that it has a small number of finite-state processes.

TRS descriptions, augmented with proper information about the system building blocks, hold the promise of high-level synthesis. A TRS compiler [58] compiles high-level behavioral descriptions in TRSs into Verilog that can be simulated and synthesized using commercial tools. This can effectively lower the hardware design hurdle by allowing direct synthesis of TRS descriptions. We are currently exploring hardware synthesis of cache coherence protocols based on their TRS specifications.
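The finite-state "sanity checking" described above can be illustrated with a toy explicit-state search. This is a minimal Python sketch under loudly stated assumptions, and it is not Murphi or SPIN input. The following pieces are all hypothetical and far simpler than Cachet:

- the two-cache system;
- the I/S/M cache states;
- the rule functions `read_miss`, `write_ok`, `write_buggy` and `evict`;
- the `single_writer` invariant.

Each rule is a guarded state transformer, in the spirit of a rewriting rule, and returns `None` when its guard fails. A breadth-first search applies every enabled rule to every reachable state and reports a shortest rule sequence that violates the invariant.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

# HYPOTHETICAL toy model, not Cachet: each cache holds one block in state
# "I" (invalid), "S" (shared, read-only) or "M" (modified, exclusive).
State = Tuple[str, ...]
Rule = Callable[[State, int], Optional[State]]  # returns None if guard fails


def read_miss(s: State, i: int) -> Optional[State]:
    """Cache i gets a shared copy; a modified copy elsewhere drops to S."""
    if s[i] != "I":
        return None
    return tuple("S" if j == i else ("S" if c == "M" else c)
                 for j, c in enumerate(s))


def write_ok(s: State, i: int) -> Optional[State]:
    """Cache i gains exclusive ownership; every other copy is invalidated."""
    if s[i] == "M":
        return None
    return tuple("M" if j == i else "I" for j in range(len(s)))


def write_buggy(s: State, i: int) -> Optional[State]:
    """Deliberately broken write: forgets to invalidate the other copies."""
    if s[i] == "M":
        return None
    return tuple("M" if j == i else c for j, c in enumerate(s))


def evict(s: State, i: int) -> Optional[State]:
    """Cache i voluntarily drops its copy (writeback is abstracted away)."""
    if s[i] == "I":
        return None
    return tuple("I" if j == i else c for j, c in enumerate(s))


def single_writer(s: State) -> bool:
    """Invariant: a modified copy excludes every other valid copy."""
    n_m = s.count("M")
    return n_m == 0 or (n_m == 1 and s.count("S") == 0)


def check(rules: Dict[str, Rule], n: int = 2) -> Tuple[int, Optional[List[str]]]:
    """Explore all states reachable from all-invalid by breadth-first search.

    Returns (states_seen, None) if the invariant always holds, otherwise
    (states_seen, trace) where trace is a shortest violating rule sequence.
    """
    init: State = ("I",) * n
    parent: Dict[State, Optional[Tuple[State, str]]] = {init: None}
    queue = deque([init])
    while queue:
        s = queue.popleft()
        if not single_writer(s):
            # Walk parent links back to the initial state to build the trace.
            trace: List[str] = []
            while parent[s] is not None:
                s, label = parent[s]
                trace.append(label)
            return len(parent), trace[::-1]
        for name, rule in rules.items():
            for i in range(n):
                t = rule(s, i)
                if t is not None and t not in parent:
                    parent[t] = (s, f"{name}({i})")
                    queue.append(t)
    return len(parent), None


if __name__ == "__main__":
    correct = {"read": read_miss, "write": write_ok, "evict": evict}
    buggy = {"read": read_miss, "write": write_buggy, "evict": evict}
    print(check(correct))  # (6, None)
    print(check(buggy))    # (9, ['read(0)', 'write(1)'])
```

With the deliberately broken write rule, the search stops at the state in which cache 0 holds S and cache 1 holds M, and it reports the trace read(0), write(1). Exhaustive enumeration is also why the example must be scaled down: with three states per cache, the number of candidate states is 3^n for n caches.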