
Lecture Notes in Computer Science 4917



Turbo-ROB: A Low Cost Checkpoint/Restore Accelerator 263

[Figure 4 panels: Original Code, Renamed Code, Original RAT, RAT after decoding H, ARB after decoding H, ROB, TROB, PDRL]

Instr  Original Code          Renamed Code
A      Beq  R1, R4, LABEL     Beq  P1, P4, LABEL
B      Sub  R2, R2, R3        Sub  P5, P2, P3
C      Div  R3, R2, R1        Div  P6, P5, P1
D      Add  R2, R2, R3        Add  P7, P5, P6
E      Mult R3, R2, R3        Mult P8, P7, P6
F      Beq  R4, R3, LABEL2    Beq  P4, P8, LABEL2
G      Div  R1, R2, R3        Div  P9, P7, P8
H      Add  R2, R1, R1        Add  P10, P9, P9

Original RAT:          R1 = P1, R2 = P2,  R3 = P3, R4 = P4
RAT after decoding H:  R1 = P9, R2 = P10, R3 = P8, R4 = P4
ARB after decoding H:  R1 = 1,  R2 = 1,   R3 = 0,  R4 = 0

ROB and PDRL (indexed by the same entry number)

Entry  Architectural Destination Register  Previous RAT Map  TROB modify flag (TM)  Physical Destination Register (PDRL)
0 (A)  none                                none              0                      none
1 (B)  R2                                  P2                1                      P5
2 (C)  R3                                  P3                1                      P6
3 (D)  R2                                  P5                0                      P7
4 (E)  R3                                  P6                0                      P8
5 (F)  none                                none              0                      none
6 (G)  R1                                  P1                1                      P9
7 (H)  R2                                  P7                1                      P10

TROB

Entry  ROB Pointer  Architectural Destination Register  Previous RAT Map
0      1            R2                                  P2
1      2            R3                                  P3
2      6            R1                                  P1
3      7            R2                                  P7

Fig. 4. Example of recovery using the Turbo-ROB: repair points are initiated at instructions A and F. Recovering at A requires traversing entries 3 to 0 in the TROB. The physical register free list is recovered by traversing entries 7 to 0 in the PDRL.

point. We then complete the recovery using the ROB, starting at the instruction corresponding to the repair point at which partial recovery took place. If no such repair point is found, we rely solely on the ROB for recovery.
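The first-update logging and reverse walk can be sketched in a few lines of Python, using the instruction sequence of Figure 4. This is only an illustrative model under stated assumptions: the names `TrobEntry`, `decode`, and `recover` are hypothetical, the ARB is modelled as a set of already-modified registers, and recovery is shown only for the case where the squash point is itself a repair point (the ROB-based completion described above is not modelled).

```python
from dataclasses import dataclass


@dataclass
class TrobEntry:
    rob_index: int   # ROB slot of the instruction that made the first update
    arch_reg: str    # architectural destination register
    prev_phys: str   # RAT mapping overwritten by that instruction


# (label, architectural destination or None, is a repair point?), as in Fig. 4
PROGRAM = [
    ("A", None, True), ("B", "R2", False), ("C", "R3", False),
    ("D", "R2", False), ("E", "R3", False), ("F", None, True),
    ("G", "R1", False), ("H", "R2", False),
]


def decode(program, rat, next_phys):
    """Rename the program in place, logging only first updates in the TROB."""
    trob, repair_points, arb = [], {}, set()
    for rob_index, (label, dest, is_repair) in enumerate(program):
        if is_repair:
            arb = set()                        # ARB reset at each repair point
            repair_points[label] = len(trob)   # first TROB slot after this point
        if dest is None:
            continue
        if dest not in arb:                    # first update since the repair point
            trob.append(TrobEntry(rob_index, dest, rat[dest]))
            arb.add(dest)
        rat[dest] = f"P{next_phys}"            # assumed: registers handed out in order
        next_phys += 1
    return trob, repair_points, arb


def recover(rat, trob, repair_points, label):
    """Restore the RAT to its state at repair point `label` (reverse TROB walk)."""
    start = repair_points[label]
    for entry in reversed(trob[start:]):
        rat[entry.arch_reg] = entry.prev_phys
    del trob[start:]                           # squashed entries are discarded


rat = {"R1": "P1", "R2": "P2", "R3": "P3", "R4": "P4"}
trob, repair_points, arb = decode(PROGRAM, rat, next_phys=5)
recover(rat, trob, repair_points, "A")         # walks TROB entries 3 down to 0
print(rat)
```

Recovering at A touches four TROB entries, whereas a ROB-only walk over the same window would visit every entry from H back to B.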

Reclaiming Physical Registers: Since the TROB contains only first updates, it cannot be used to free all the physical registers allocated by wrong-path instructions. We assume that the free register list (FRL) contains embedded checkpoints, which are allocated at repair points. The free register list is typically implemented as a bit vector with one bit per register. Accordingly, checkpoints similar to those used for the RAT can also be used here. Since the FRL is a unidimensional structure, embedding checkpoints does not impact its latency as greatly as it does in the RAT. Upon commit, an instruction that frees a register must mark it as free in the current FRL and in all active checkpoints by clearing all corresponding bits. Assuming that these bits are organized in a single SRAM column, clearing them requires extending the per-column reset signal over the whole column, plus a pull-down transistor per bit.
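A minimal software model of such a checkpointed FRL is sketched below. It assumes that a set bit means "allocated" and a cleared bit means "free", as implied by the text; the class and method names (`FreeRegisterList`, `take_checkpoint`, `commit_free`, `restore`) are hypothetical, and the release of checkpoints when their repair points commit is not modelled.

```python
class FreeRegisterList:
    """Bit vector with one bit per physical register: 1 = allocated, 0 = free."""

    def __init__(self, num_regs):
        self.num_regs = num_regs
        self.bits = 0
        self.checkpoints = {}   # repair-point id -> saved bit vector (insertion ordered)

    def allocate(self):
        """Return the lowest-numbered free register and mark it allocated."""
        for reg in range(self.num_regs):
            if not (self.bits >> reg) & 1:
                self.bits |= 1 << reg
                return reg
        raise RuntimeError("no free physical register")

    def take_checkpoint(self, repair_id):
        """Snapshot the bit vector when a repair point is allocated."""
        self.checkpoints[repair_id] = self.bits

    def commit_free(self, reg):
        """Free `reg` in the current FRL and in every active checkpoint."""
        mask = ~(1 << reg)
        self.bits &= mask
        for repair_id in self.checkpoints:
            self.checkpoints[repair_id] &= mask   # models the column-wide reset

    def restore(self, repair_id):
        """Roll back to a checkpoint, reclaiming wrong-path allocations."""
        self.bits = self.checkpoints[repair_id]
        ids = list(self.checkpoints)
        for younger in ids[ids.index(repair_id) + 1:]:
            del self.checkpoints[younger]          # younger checkpoints are squashed


frl = FreeRegisterList(8)
for _ in range(3):
    frl.allocate()              # registers 0, 1, 2 in use before the repair point
frl.take_checkpoint("A")
frl.allocate()                  # wrong-path allocations: registers 3 and 4
frl.allocate()
frl.commit_free(1)              # an older instruction commits and frees register 1
frl.restore("A")
print(bin(frl.bits))
```

The point of clearing the bit in the checkpoints as well is visible in the last step: without it, restoring checkpoint "A" would mark register 1 as allocated again and the register would leak.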

2.2 Recovery Example

Figure 4 illustrates an example of recovery using the TROB. Following the decode of branch A, instructions B and C perform first updates of RAT entries R2 and R3 and allocate TROB entries 0 and 1. The ARB is reset at instruction F and new TROB entries are allocated for instructions G and H, which perform
