Architecture of Computing Systems (Lecture Notes in Computer ...

Recommendations

Info

16 G. Aşılıoğlu, E.M. Kaya, and O. Ergin rename table, walking forward and backwards on the reorder buffer or waiting until the commit time of the mispredicted branch instruction [2][9]. Each of these schemes have various pros and cons in terms of circuit complexity and misprediction penalty. In this paper we propose a new implementation of the rename table that allows fast recovery on a branch misprediction. We propose to use a FIFO structure for each architectural register and recover the tail pointers when a branch misprediction occurs. Our design is less complex than taking a full checkpoint of the rename table, and has smaller restoration latency when compared to the schemes that walk through the reorder buffer. 2 Register Renaming Register renaming requires a new physical register to be assigned to each and every result producing instruction. Fig. 1 shows an example of how the register renaming is performed. Note that R1 is overwritten by the second ADD instruction which is in fact not dependent on the first ADD. This false dependency is removed by assigning different physical registers to two ADD instructions. ADD R1, R2, R3 SUB R2, R1, R2 ADD R1, R4, R5 ADD P13, P22, P31 SUB P52, P13, P22 ADD P95, P63, P44 Rename Table R1 P95 R2 P52 R3 P21 R4 P63 R5 R44 R6 P6 R7 P7 R8 P8 Fig. 1. Example of Register Renaming The second ADD instruction in Fig. 1 removes its write after write (WAW) and write after read (WAR) dependencies by writing its result to a separate physical register. However, this is not enough to maintain precise execution; any younger instruction that needs to access R1 has to be aware of the fact that the value it needs to read resides in P95. The rename table is used for this purpose; the table includes an entry for each architectural register that holds the location of the most recent instance of the register. Consequently, as the rename table holds the precise state of the processor, any fault on this table will result in a system crash. Ideally the rename table idea works perfectly. Each instruction that produces a result checks the availability of a physical register and allocates a register. Afterwards, the result producing instruction updates the rename table to inform the consumers that will later check this table to obtain the location of their architectural source registers. In superscalar machines, multiple instructions need to be renamed each cycle, which
Complexity-Effective Rename Table Design for Rapid Speculation Recovery 17 mandates multiple updates to the rename table in the same cycle. Multiple updates to the same entry of the rename table are avoided by using a series of comparator circuits to check for dependencies. 3 Speculation Recovery in the Rename Table Although the idea of rename table works fine in regular conditions, some recovery effort is needed when the processor employs some kind of speculation. All of the contemporary microprocessors employ branch prediction to alleviate the negative effects of deep pipelining on processor performance [3][5]. When a branch misprediction occurs, all of the instructions following the branch instruction in the reorder buffer are squashed from the issue queue and the reorder buffer. However this is not enough to recover the precise state as the modifications on the rename table also have to be rolled back to its state just before the faulting branch instruction. Different schemes exist in the literature and are implemented in real processors to recover the rename table on a branch misprediction. This schemes can be divided into two groups depending on their dependence on a retirement rename table (also called the commit rename table). 3.1 Schemes That Use a Separate Commit Rename Table Commit rename table is used to hold the locations of the last committed instances of the architectural registers. Its structure is the same with the speculative rename table; contains one entry for each architectural register. Every instruction that leaves the reorder buffer at the commit stage updates this table with its destination physical register tag. 1. Wait: It is always the safe choice to wait until the faulting branch reaches the top of the reorder buffer. When the branch commits, the commit rename table is copied to the frontend (speculative) rename table. This scheme is not preferred for many branches since a large number of cycles may pass before the branch itself commits. 2. Walk forward: When the misprediction occurs, start from the head of the reorder buffer and pseudo commit all instructions to construct the commit rename table. When the faulting branch is reached, the commit rename table is precise. Afterwards the commit rename table is copied to the frontend rename table. Usually the number of pseudo committed instructions is equal to the commit width of the machine. If the faulting branch is far away from the head of the reorder buffer, branch misprediction penalty increases. 3.2 Schemes without a Separate Commit Rename Table 3. Walk backward: When a misprediction occurs, start from the tail of the reorder buffer and undo the changes of each instruction on the speculative rename table. The number of instructions processed each cycle is equal to the commit width of the machine. If the faulting branch is far away from the tail of the reorder buffer, the misprediction penalty is high. This scheme also requires that every instruction
Page 2 and 3: Lecture Notes in Computer Science 5
Page 4 and 5: Volume Editors Christian Müller-Sc
Page 6 and 7: General Chair Organization Christia
Page 8 and 9: Organization IX Hartmut Schmeck Kar
Page 10 and 11: Keynote Table of Contents HyVM - Hy
Page 12 and 13: Table of Contents XIII JetBench:AnO
Page 14 and 15: How to Enhance a Superscalar Proces
Page 16 and 17: 4 J. Mische et al. The Real-time Vi
Page 18 and 19: 6 J. Mische et al. 4.1 Instruction
Page 20 and 21: 8 J. Mische et al. Additionally the
Page 22 and 23: 10 J. Mische et al. % of unused pip
Page 24 and 25: 12 J. Mische et al. % of cycles spe
Page 26 and 27: 14 J. Mische et al. 16. Lickly, B.,
Page 30 and 31: 18 G. Aşılıoğlu, E.M. Kaya, and
Page 38 and 39: 26 T.B. Preußer, P. Reichel, and R
Page 50 and 51: 38 P. Bellasi, W. Fornaciari, and D
Page 62 and 63: 50 J. Zeppenfeld and A. Herkersdorf
Page 74 and 75: 62 B. Jakimovski, B. Meyer, and E.
Page 76 and 77: 64 B. Jakimovski, B. Meyer, and E.
Page 78 and 79:
66 B. Jakimovski, B. Meyer, and E.
Page 80 and 81:
Page 82 and 83:
Page 84 and 85:
Page 86 and 87:
74 M. Bonn and H. Schmeck Fig. 1. J
Page 88 and 89:
76 M. Bonn and H. Schmeck 2.2 Node
Page 90 and 91:
78 M. Bonn and H. Schmeck Uptime-ba
Page 92 and 93:
80 M. Bonn and H. Schmeck 2.4 Simul
Page 94 and 95:
82 M. Bonn and H. Schmeck done rate
Page 96 and 97:
84 M. Bonn and H. Schmeck Fig. 8. J
Page 98 and 99:
86 M. Bonn and H. Schmeck tells the
Page 100 and 101:
88 J.-P. Steghöfer et al. � �
Page 102 and 103:
90 J.-P. Steghöfer et al. mechanis
Page 104 and 105:
92 J.-P. Steghöfer et al. resource
Page 106 and 107:
94 J.-P. Steghöfer et al. Choose a
Page 108 and 109:
96 J.-P. Steghöfer et al. 1. Defin
Page 110 and 111:
98 J.-P. Steghöfer et al. and all
Page 112 and 113:
100 J.-P. Steghöfer et al. 19. Kim
Page 114 and 115:
102 K. Kloch et al. large-scale sys
Page 116 and 117:
104 K. Kloch et al. a�t� 1.0 0.
Page 118 and 119:
106 K. Kloch et al. constant. This
Page 120 and 121:
108 K. Kloch et al. Relative number
Page 122 and 123:
110 K. Kloch et al. (a) infection r
Page 124 and 125:
112 K. Kloch et al. (ii) Phase of a
Page 126 and 127:
114 P. Petoumenos et al. Studying t
Page 128 and 129:
116 P. Petoumenos et al. % of Misse
Page 130 and 131:
118 P. Petoumenos et al. IQ: n 4-in
Page 132 and 133:
120 P. Petoumenos et al. downsizing
Page 134 and 135:
122 P. Petoumenos et al. As long as
Page 136 and 137:
124 P. Petoumenos et al. comparable
Page 138 and 139:
Exploiting Inactive Rename Slots fo
Page 140 and 141:
128 M. Kayaalp et al. In a supersca
Page 142 and 143:
130 M. Kayaalp et al. INSTRUCTION 1
Page 144 and 145:
132 M. Kayaalp et al. time. Alterna
Page 146 and 147:
134 M. Kayaalp et al. Fig. 3. Numbe
Page 148 and 149:
136 M. Kayaalp et al. The results o
Page 150 and 151:
Efficient Transaction Nesting in Ha
Page 152 and 153:
140 Y. Liu et al. HTMs include TCC
Page 154 and 155:
142 Y. Liu et al. rollback T0 begin
Page 156 and 157:
144 Y. Liu et al. Processor core Pr
Page 158 and 159:
146 Y. Liu et al. 5.2 Results and A
Page 160 and 161:
148 Y. Liu et al. decreases, partia
Page 162 and 163:
Decentralized Energy-Management to
Page 164 and 165:
152 B. Becker et al. Furthermore, t
Page 166 and 167:
154 B. Becker et al. An optimizing
Page 168 and 169:
156 B. Becker et al. smart-home man
Page 170 and 171:
158 B. Becker et al. freedom like w
Page 172 and 173:
160 B. Becker et al. Power [W] 4500
Page 174 and 175:
EnergySaving Cluster Roll: Power Sa
Page 176 and 177:
164 M.F. Dolz et al. Themodulequeri
Page 178 and 179:
166 M.F. Dolz et al. This daemon al
Page 180 and 181:
168 M.F. Dolz et al. been submitted
Page 182 and 183:
170 M.F. Dolz et al. On the other h
Page 184 and 185:
172 M.F. Dolz et al. at the inactiv
Page 186 and 187:
Effect of the Degree of Neighborhoo
Page 188 and 189:
176 T. Abdullah et al. A zone based
Page 190 and 191:
178 T. Abdullah et al. A consumer/p
Page 192 and 193:
180 T. Abdullah et al. Messages 140
Page 194 and 195:
182 T. Abdullah et al. (except when
Page 196 and 197:
184 T. Abdullah et al. % Matchmakin
Page 198 and 199:
186 T. Abdullah et al. show that th
Page 200 and 201:
188 M. Schindewolf, D. Kramer, and
Page 202 and 203:
Page 204 and 205:
Page 206 and 207:
Page 208 and 209:
Page 210 and 211:
Page 212 and 213:
200 R. Plyaskin and A. Herkersdorf
Page 214 and 215:
Page 216 and 217:
Page 218 and 219:
Page 220 and 221:
Page 222 and 223:
Page 224 and 225:
212 M.Y. Qadri, D. Matichard, and K
Page 226 and 227:
Page 228 and 229:
Page 230 and 231:
Page 232 and 233:
Page 234 and 235:
A Tightly Coupled Accelerator Infra
Page 236 and 237:
224 F. Nowak and R. Buchty where A
Page 238 and 239:
226 F. Nowak and R. Buchty Fig. 3.
Page 240 and 241:
228 F. Nowak and R. Buchty Table 2.
Page 242 and 243:
230 F. Nowak and R. Buchty Table 4.
Page 244 and 245:
232 F. Nowak and R. Buchty 5.3 Comp
Page 246 and 247:
Optimizing Stencil Application on M
Page 248 and 249:
236 F. Xudong et al. compared to th
Page 250 and 251:
238 F. Xudong et al. stencil comput
Page 252 and 253:
240 F. Xudong et al. threads in the
Page 254 and 255:
242 F. Xudong et al. Speedup 14 12
Page 256 and 257:
244 F. Xudong et al. 5 Related Work
Page 258:
Abdullah, Tariq 174 Alima, Luc Onan
show all

Architecture of Computing Systems (Lecture Notes in Computer ...

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?