Design and Verification of Adaptive Cache Coherence Protocols ...

More documents

Recommendations

Info

speculative instruction and all the instructions issued thereafter are abandoned, and their e ect on the processor state nulli ed. The BTB is updated according to some prediction scheme after each branch resolution. The speculative processor's correctness is not contingent uponhow the BTB is maintained, as long as the program counter can be set to the correct value after a misprediction. How- ever, di erent prediction schemes can give rise to very di erent misprediction rates and thus profoundly in uence performance. Generally, we assume that the BTB produces the correct next instruction address for all non-branch instructions. Any processor permitting speculative execution must ensure that a speculative instruction does not modify the programmer-visible state until it can be \committed". Alternatively, it must save enough of the processor state so that the correct state can be restored in case the speculation turns out to be wrong. Most implementations use a combination of these two ideas: speculative instructions do not modify the register le or memory until it can be determined that the prediction is correct, but they may update the program counter. Both the current and the speculated instruction address are recorded. Thus, speculation correctness can be determined later, and the correct program counter can be restored in case of a wrong prediction. Typically, all the temporary state is maintained in the ROB itself. 2.4 The PS Speculative Processor We now present the rules for a simpli ed micro-architecture that does register renaming and speculative execution. We achieve this simpli cation by not showing all the pipelining and not giving the details of some hardware operations. The memory system is modeled as operating asynchronously with respect to the processor. Thus, memory instructions in the ROB are dispatched to the memory system via an ordered processor-to-memory bu er (pmb) the memory provides its responses via a memory-to-processor bu er (mpb). Memory system details can be added in a modular fashion without changing the processor description. We need to add two new components, rob and btb, to the processor state, corresponding to ROB and BTB. The reorder bu er is a complex device to model because di erent types of operations need to be performed on it. It can be thought of as a FIFO queue that is initially empty (). We use constructor ` ', which is associative but not commutative, to represent this aspect of rob. It can also be considered as an array of instruction templates, with an array index serving as a renaming tag. It is well known that a FIFO queue can be implemented as a circular bu er using two pointers into an array. We will hide these implementation details of rob and assume that the next available tag can be obtained. An instruction template in rob contains the instruction address, opcode, operands and some extra information needed to complete the instruction. For instructions that need to update a register, the Wr(r) eld records the destination register r. For branch instructions, the Sp(pia) eld holds the predicted instruction address pia, which will be used to determine the prediction's correctness. Each memory access instruction maintains an extra ag to indicate whether the 32
instruction is waiting to be dispatched (U), or has been dispatched to the memory (D). The memory system returns a value for a load and an acknowledgment (Ack) for a store. We have taken some syntactic liberties in expressing various types of instruction templates below: ROB Entry Itb(ia,t :=v,Wr(r)) [] Itb(ia,t :=Op(tv1,tv2),Wr(r)) [] Itb(ia,Jz(tv1,tv2),Sp(pia)) [] Itb(ia,t :=Load(tv1,mf ),Wr(r)) [] Itb(ia,t :=Store(tv1,tv2,mf )) where tv stands for either a tag or a value, and the memory ag mf is either U or D. The tag used in the store instruction template is intended to provide some exibility in coordinating with the memory system and does not imply any register updating. 2.4.1 Instruction Fetch Rules Each time the processor issues an instruction, the program counter is set to the address of the next instruction to be issued. For non-branch instructions, the program counter is simply incremented by one. Speculative execution occurs when a Jz instruction is issued: the program counter is then set to the instruction address obtained by consulting the btb entry corresponding to the Jz instruction's address. When the processor issues an instruction, an instruction template is allocated in the rob. If the instruction is to modify a register, we use an unused renaming tag (typically the index of the rob slot) to rename the destination register, and the destination register is recorded in the Wr eld. The tag or value of each operand register is found by searching the rob from the youngest (rightmost) bu er to the oldest (leftmost) bu er until an instruction template containing the referenced register is found. If no such bu er exists in the rob, then the most up-to-date value resides in the register le. The following lookup procedure captures this idea: lookup(r, rf , rob) rf [r] if Wr(r) =2 rob lookup(r, rf , rob1 Itb(ia,t :=-,Wr(r)) rob2) t if Wr(r) =2 rob2 It is beyond the scope of our discussion to give a hardware implementation of this procedure, but it is certainly possible to do so using TRSs. Any implementation that canlookupvalues in the rob using a combinational circuit will su ce. Fetch-Loadc Rule Proc(ia, rf , rob, btb, im) if im[ia] = r:=Loadc(v) ! Proc(ia+1, rf , rob Itb(ia,t :=v,Wr(r)), btb, im) Fetch-Loadpc Rule Proc(ia, rf , rob, btb, im) if im[ia] = r:=Loadpc ! Proc(ia+1, rf , rob Itb(ia,t :=ia,Wr(r)), btb, im) 33
Page 1: CSAIL Computer Science and Artifici
Page 5: Design and Veri cation of Adaptive
Page 8 and 9: I am truly grateful to my parents f
Page 10 and 11: 4 The Base Cache Coherence Protocol
Page 13 and 14: List of Figures 1.1 Impact of Archi
Page 15 and 16: Chapter 1 Introduction Shared memor
Page 17 and 18: transparent and exposed only for lo
Page 19 and 20: Example 1: Can both registers r1 an
Page 21 and 22: and release locks, which guard ever
Page 23 and 24: that are interconnected with an o -
Page 25 and 26: although various techniques have be
Page 27 and 28: y showing that each processor can a
Page 29 and 30: s1 if p (s1) ! s2 where s1 and s2 a
Page 31 and 32: Program counter (pc) +1 Instruction
Page 33: Branch target buffer (btb) Program
Page 37 and 38: Current State: Proc(ia, rf , rob, b
Page 39 and 40: Rule Name mem pmb mpb Next mem Next
Page 41 and 42: Specification Implementation t 1 B
Page 43 and 44: Intuitively, \2P " means that \P is
Page 45 and 46: (a) (b) PC 1005 PC 2000 Instruction
Page 47 and 48: ewritten to s2 according to rule R.
Page 49 and 50: It can be shown that the relaxed di
Page 51 and 52: Chapter 3 The Commit-Reconcile & Fe
Page 53 and 54: Producer Storel(a,v) Commit(a) Cons
Page 55 and 56: Commit/Reconcile Purge Loadl/Commit
Page 57 and 58: instructions, because of the lack o
Page 59 and 60: CRF-Relaxed-Commit Rule Site(sache,
Page 61 and 62: 3.4 Universality of the CRF Model M
Page 63 and 64: Translation from CRF to PSO: The Fe
Page 65 and 66: proc proc proc pmb mpb pmb mpb pmb
Page 67 and 68: Chapter 4 The Base Cache Coherence
Page 69 and 70: can be classi ed into two non-overl
Page 71 and 72: arbitrarily in an outgoing queue (t
Page 73 and 74: 4.3 The Imperative Rules of the Bas
Page 75 and 76: Imperative Processor Rules Instruct
Page 77 and 78: Figure 4.5 de nes the rules of the
Page 79 and 80: 4.5.2 Mapping from Base to CRF We d
Page 81 and 82: Base Imperative Rule CRF Rule IP1 (
Page 83 and 84: Base Rule Base Imperative Rule P1 I
Page 85 and 86:
eceives a CacheReq message, it will
Page 87 and 88:
e completed if it has an in nite nu
Page 89 and 90:
SYS Sys(MSITE, SITEs) System MSITE
Page 91 and 92:
Commit/Reconcile C-Receive-WbAck Ru
Page 93 and 94:
message can be resumed eventually.
Page 95 and 96:
If the memory state shows that the
Page 97 and 98:
5.3.3 FIFO Message Passing The live
Page 99 and 100:
Figure 5.7 gives the M-engine rules
Page 101 and 102:
Backward-Message-Cache-to-Mem-for-W
Page 103 and 104:
5.4.3 Simulation of WP in CRF Theor
Page 105 and 106:
WP Imperative Rule CRF Rules IP1 (L
Page 107 and 108:
WP Rule WP Imperative & Directive R
Page 109 and 110:
(4) Msg(H,id ,Cache,a,-)" 2 Cin id(
Page 111 and 112:
This completes the proof according
Page 113 and 114:
emains as CachePending, there mustb
Page 115 and 116:
M-engine Rules Msg from id Mstate A
Page 117 and 118:
Chapter 6 The Migratory Cache Coher
Page 119 and 120:
Commit/Reconcile Send Purge Loadl/C
Page 121 and 122:
Mandatory Processor Rules Instructi
Page 123 and 124:
while the memory sends a FlushReq m
Page 125 and 126:
Lemma 38 D is strongly terminating
Page 127 and 128:
Migratory Imperative Rule CRF Rules
Page 129 and 130:
Cell(a,v,C[id ]) 2 Mem(s) _ Cell(a,
Page 131 and 132:
According to Theorem-C and Lemma 45
Page 133 and 134:
CacheReq message. In the latter cas
Page 135 and 136:
Chapter 7 Cachet: A Seamless Integr
Page 137 and 138:
7.1.1 Putting Things Together It is
Page 139 and 140:
Writeback Operations In Cachet, a w
Page 141 and 142:
Composite Message Equivalent Sequen
Page 143 and 144:
Imperative Processor Rules Instruct
Page 145 and 146:
Invalid Commit/Reconcile Receive Ca
Page 147 and 148:
Composite Imperative C-engine Rules
Page 149 and 150:
7.4 The Cachet Cache Coherence Prot
Page 151 and 152:
Voluntary C-engine Rules Cstate Act
Page 153 and 154:
The memory processes an incoming Ca
Page 155 and 156:
The inaccuracy of memory states can
Page 157 and 158:
C-engine Rule of Cachet Deriving Im
Page 159 and 160:
Composite Mandatory Processor Rules
Page 161 and 162:
memory regions simultaneously. In a
Page 163 and 164:
Chapter 8 Conclusions This thesis h
Page 165 and 166:
and heuristic policies. Mandatory r
Page 167 and 168:
technique can be used to allow a ca
Page 169 and 170:
Voluntary C-engine Rules Cstate Act
Page 171 and 172:
FIFO Message Passing msg1 msg2 msg2
Page 173 and 174:
[13] J. K. Archibald. The Cache Coh
Page 175 and 176:
[41] A. Erlichson, N. Nuckolls, G.
Page 177 and 178:
[71] J. Kuskin, D. Ofelt, M. Heinri
Page 179 and 180:
[101] F. Pong and M. Dubois. A New
show all

Design and Verification of Adaptive Cache Coherence Protocols ...

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?