01.12.2012 Views

Architecture of Computing Systems (Lecture Notes in Computer ...



Complexity-Effective Rename Table Design for Rapid Speculation Recovery

Fig. 4. The Checkpoint Table used for branch instructions (entries labeled Branch1, Branch2, Branch3)

same cycle with the reading of information from the checkpoint table, for a pipelined implementation, the processor may read the tail pointers in the first cycle and restore the rename table in the second cycle.
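The two-cycle recovery described above can be sketched as follows. This is a minimal illustration, assuming that each checkpoint slot holds one saved tail pointer per architectural register; the names `CheckpointTable`, `save`, `read`, and `recover` are hypothetical and do not come from the paper.

```python
class CheckpointTable:
    """Hypothetical model: one slot per in-flight branch, each holding
    a copy of every FIFO tail pointer taken when the branch was renamed."""

    def __init__(self, num_checkpoints):
        self.slots = [None] * num_checkpoints

    def save(self, slot, tail_pointers):
        # Copy the tail pointers so later renames do not alter the checkpoint.
        self.slots[slot] = list(tail_pointers)

    def read(self, slot):
        saved = self.slots[slot]
        if saved is None:
            raise ValueError("no checkpoint stored in slot %d" % slot)
        return list(saved)


def recover(table, slot, tail_pointers):
    # Cycle 1 (assumed pipelining): read the saved tail pointers.
    saved = table.read(slot)
    # Cycle 2: restore the rename state by writing them back in place.
    tail_pointers[:] = saved


# Illustrative values only: four architectural registers.
tails = [3, 0, 7, 1]
table = CheckpointTable(num_checkpoints=3)
table.save(0, tails)   # a branch is renamed; checkpoint slot 0 is filled
tails[0] = 5           # speculative renames advance some tail pointers
tails[2] = 1
recover(table, 0, tails)  # the branch was mispredicted
```

In this sketch recovery only rewrites the tail pointers; the FIFO contents themselves are left untouched.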

6 Hardware Implementation


The rename tables, like any regular memory structure used in contemporary microprocessors, are implemented with SRAM bitcells. Plain SRAM bitcells are sufficient in architectures that use waiting or walking forward/backward schemes, but to hold checkpoints inside the rename table each bitcell must have shift-register capability. If 16 checkpoints are allowed inside the processor, each bit of the rename table needs to be a 16-bit shift register.
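To make the cost concrete, the arithmetic below counts storage elements for a checkpointed rename table built from shift registers. It is a rough sketch: the register count (32) and tag width (7 bits) are assumed values chosen for illustration, not figures from the paper, and the function name is hypothetical.

```python
def shift_register_bitcells(num_arch_regs, tag_bits, num_checkpoints):
    # Each bit of the rename table becomes a shift register with
    # one stage per checkpoint.
    return num_arch_regs * tag_bits * num_checkpoints


# Assumed parameters: 32 architectural registers, 7-bit physical tags.
plain_bits = 32 * 7
checkpointed_bits = shift_register_bitcells(32, 7, 16)
print(plain_bits, checkpointed_bits)  # 224 3584
```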

Our proposed rename table scheme removes the shifting complexity at bitcell level and implements all of the FIFO queues with plain SRAM bitcells. Each FIFO queue is in fact a small payload RAM that has the same bit-width as a rename table but has a larger number of entries.

In the baseline 4-way architecture the rename table must have 4 write ports (in case 4 instructions rename different architectural registers) and 8 read ports (in case each instruction has a different set of architectural source registers). This structure is accessed after the dependency checking logic detects any possible multiple renames in a single cycle. Our solution does not alter the dependency checking logic but offers a different storage mechanism for the register mappings.

In an architecture with N architectural registers, our proposed scheme mandates the use of N FIFO tables made of SRAM bitcells. Each table is a regular payload RAM with regular decoder circuits that allow random access. However, the inputs to these decoders are wired to the tail pointer register of the corresponding FIFO instead of taking their values from incoming instructions. The instructions that are renamed only write their tags through the regular bitlines to the entry (or entries) automatically selected by the tail pointer register. Note that although up to 4 values can be written to a FIFO in the same cycle in a 4-way machine, only one value is read. Therefore, each FIFO structure needs 4 write ports and 1 read port. Also, although the structure can be accessed randomly like a regular register file, the access is not random: at each cycle the value pointed to by the tail pointer is read and, if the corresponding architectural register is renamed, up to 4 values are written to the following entries. The tail pointer is updated after the new register assignments are written. All of the FIFO queues are circular buffers, so the tail pointer wraps around to the top of the structure after reaching the bottom.
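The access pattern of a single FIFO can be modeled in a few lines. The sketch below is a behavioral illustration under stated assumptions, not the hardware design: the class name `RenameFifo`, its methods, and the depth and tag values are hypothetical, and overflow detection is left out because the text here does not describe it.

```python
class RenameFifo:
    """Behavioral model of the FIFO for one architectural register."""

    MAX_WRITES = 4  # 4-way machine: at most 4 renames per cycle

    def __init__(self, depth, initial_tag):
        self.entries = [None] * depth
        self.tail = 0
        self.entries[self.tail] = initial_tag

    def read(self):
        # Single read port: always the entry selected by the tail pointer.
        return self.entries[self.tail]

    def rename(self, tags):
        # Write up to 4 new tags into the entries following the tail,
        # wrapping around because the buffer is circular.
        if len(tags) > self.MAX_WRITES:
            raise ValueError("at most 4 renames per cycle")
        depth = len(self.entries)
        for offset, tag in enumerate(tags, start=1):
            self.entries[(self.tail + offset) % depth] = tag
        # The tail pointer is updated only after the writes.
        self.tail = (self.tail + len(tags)) % depth


fifo = RenameFifo(depth=4, initial_tag=15)
fifo.rename([20, 21])      # two renames of this register in one cycle
latest = fifo.read()       # 21, the most recent mapping
fifo.rename([22, 23, 24])  # writes wrap from the bottom to the top
```

After the second call the tail pointer has wrapped to index 1, and `fifo.read()` returns 24.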

