01.12.2012 Views

Architecture of Computing Systems (Lecture Notes in Computer ...


G. Aşılıoğlu, E.M. Kaya, and O. Ergin

5 Recovering from Branch Mispredictions

The proposed structure allows rapid recovery of the rename table because all of the speculative register assignments are available in the direct-mapped SRAM bitcell array. In the case of a branch misprediction, fixing the tail pointers is enough to recover the rename table. Different schemes can be employed with the proposed rename table structure to roll back the speculation on a branch instruction. For a processor that waits until the faulting branch reaches the top of the ROB, there is no need to keep a commit rename table: when the branch reaches the top of the ROB, the head pointers for each architectural register point to the precise state. In fact, the head pointers in the proposed structure, taken together, form the commit rename table, which removes the need to store a separate commit rename table as implemented in some modern microprocessors.
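The role of the head and tail pointers can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: each architectural register is assumed to own a circular FIFO of physical register tags, with the head marking the committed mapping and the tail marking the newest speculative one. The names `RegisterQueue`, `RenameTable`, and `recover_at_rob_head` are hypothetical and do not come from the paper, and copying head to tail is one plausible reading of recovery at the top of the ROB.

```python
class RegisterQueue:
    """Hypothetical FIFO of physical-register tags for one architectural register."""

    def __init__(self, capacity, initial_phys):
        self.slots = [None] * capacity
        self.slots[0] = initial_phys
        self.head = 0  # slot holding the committed (precise) mapping
        self.tail = 0  # slot holding the newest speculative mapping

    def rename(self, phys):
        """Append a speculative mapping; live slots between head and tail are kept."""
        nxt = (self.tail + 1) % len(self.slots)
        if nxt == self.head:
            raise RuntimeError("queue full: rename must stall")
        self.slots[nxt] = phys
        self.tail = nxt

    def commit(self):
        """Retire the oldest speculative mapping by advancing the head."""
        if self.head == self.tail:
            raise RuntimeError("no speculative mapping to commit")
        self.head = (self.head + 1) % len(self.slots)


class RenameTable:
    def __init__(self, num_arch, capacity):
        # Assumption: architectural register i initially maps to physical register i.
        self.queues = [RegisterQueue(capacity, phys) for phys in range(num_arch)]

    def speculative_map(self):
        """Mappings seen by the rename stage (read through the tail pointers)."""
        return [q.slots[q.tail] for q in self.queues]

    def committed_map(self):
        """The head pointers, read together, act as the commit rename table."""
        return [q.slots[q.head] for q in self.queues]

    def recover_at_rob_head(self):
        """Assumed recovery when the faulting branch is at the top of the ROB."""
        for q in self.queues:
            q.tail = q.head


table = RenameTable(num_arch=4, capacity=8)
table.queues[1].rename(10)   # speculative: r1 -> p10
table.queues[1].rename(11)   # speculative: r1 -> p11
table.queues[2].rename(12)   # speculative: r2 -> p12
table.queues[1].commit()     # the first rename of r1 retires
print(table.committed_map(), table.speculative_map())
```

In this model no separate commit table is stored; `committed_map` only reads state that the queues already hold.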

Walk-backward and walk-forward schemes become easier to implement with the proposed design since nothing in the rename table is overwritten during the rename process. During the walk-backward operation, the processor simply decrements the tail pointer of the architectural register targeted by each squashed instruction. When the faulting branch instruction is reached, the tail pointers of all FIFO queues have been restored.
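The walk-backward operation described above can be sketched as follows. This is a simplified model, not the authors' implementation: the ROB is assumed to be a list of entries that record only the destination architectural register, tail pointers are plain integers that wrap modulo an assumed queue capacity, and the helper names `dispatch` and `walk_backward` are hypothetical.

```python
def dispatch(rob, tails, dest=None, is_branch=False, capacity=8):
    """Rename one instruction: advance the tail of its destination queue, if any."""
    if dest is not None:
        tails[dest] = (tails[dest] + 1) % capacity
    rob.append({"dest": dest, "is_branch": is_branch})
    return len(rob) - 1  # ROB index of the new entry


def walk_backward(rob, tails, branch_index, capacity=8):
    """Squash entries younger than the branch, newest first, undoing each rename."""
    while len(rob) - 1 > branch_index:
        entry = rob.pop()
        if entry["dest"] is not None:
            # Decrement the tail pointer of the squashed instruction's target register.
            tails[entry["dest"]] = (tails[entry["dest"]] - 1) % capacity


rob, tails = [], [0, 0, 0, 0]
dispatch(rob, tails, dest=1)
branch = dispatch(rob, tails, is_branch=True)
tails_at_branch = list(tails)          # kept only to check the result
dispatch(rob, tails, dest=1)           # wrong-path instructions
dispatch(rob, tails, dest=2)
dispatch(rob, tails, dest=1)
walk_backward(rob, tails, branch)
print(tails == tails_at_branch)
```

Because renaming never overwrites a live slot in this model, undoing a rename needs no data movement, only a pointer decrement per squashed instruction.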

Checkpointing becomes simpler at the circuit level with the proposed rename table design. Regular checkpointing requires a shadow copy of the rename table to be taken at each branch instruction. This is accomplished by implementing each bit of the rename table as a shift register. Since the number of checkpoints limits the number of branch instructions that can reside inside the processor concurrently, having more checkpoints is desirable. However, as the number of checkpoints increases, the logic depth of each individual shift register increases, which results in a higher rename table latency and limits the frequency of the processor. The proposed scheme allows checkpointing to be implemented more easily, without any need for shift registers. Instead of checkpointing the entire table on each branch, the tail pointers for each architectural register are stored in a table whenever a branch instruction arrives at the rename stage. This table is indexed by the branch identifiers and has to be implemented as a circular FIFO queue, since a misprediction among nested branch instructions may require squashing multiple branch instructions in program order.
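A minimal sketch of such a tail-pointer checkpoint table is given below. It rests on several assumptions that the text does not spell out: branch identifiers are taken to be the slot indices of the circular queue, a mispredicted branch releases its own checkpoint together with all younger ones, and the class and method names (`CheckpointTable`, `allocate`, `restore`, `release_oldest`) are hypothetical.

```python
class CheckpointTable:
    """Hypothetical circular FIFO of tail-pointer snapshots, indexed by branch id."""

    def __init__(self, num_entries):
        self.entries = [None] * num_entries
        self.head = 0    # slot of the oldest live checkpoint
        self.count = 0   # number of live checkpoints

    def allocate(self, tails):
        """Snapshot all tail pointers when a branch reaches rename; return its id."""
        if self.count == len(self.entries):
            raise RuntimeError("no free checkpoint: rename must stall")
        branch_id = (self.head + self.count) % len(self.entries)
        self.entries[branch_id] = list(tails)
        self.count += 1
        return branch_id

    def release_oldest(self):
        """Free the oldest checkpoint once its branch commits."""
        if self.count == 0:
            raise RuntimeError("no live checkpoint to release")
        self.head = (self.head + 1) % len(self.entries)
        self.count -= 1

    def restore(self, branch_id):
        """Return the snapshot for a mispredicted branch and drop younger entries."""
        offset = (branch_id - self.head) % len(self.entries)
        if offset >= self.count:
            raise KeyError("no live checkpoint for this branch id")
        snapshot = list(self.entries[branch_id])
        self.count = offset  # discards this checkpoint and every younger one
        return snapshot


tails = [0, 0, 0, 0]
checkpoints = CheckpointTable(num_entries=4)
b0 = checkpoints.allocate(tails)
tails[1] = 1                       # a rename after branch b0
b1 = checkpoints.allocate(tails)
tails[1] = 2                       # renames after branch b1 (wrong path)
tails[2] = 1
b2 = checkpoints.allocate(tails)   # nested branch on the wrong path
tails[3] = 1
tails = checkpoints.restore(b1)    # b1 mispredicted: b1 and b2 are squashed
print(tails, checkpoints.count)
```

The circular organization is what lets a single `restore` discard the nested younger branch `b2` along with `b1`, since their entries are contiguous in program order.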

Fig. 4 shows the structure of the checkpoint table used to store the information for each branch instruction. For each entry, a tail pointer is stored for each architectural register. Since the size of each FIFO queue can be at most equal to the number of physical registers, each stored tail pointer can be represented with log2(number of physical registers) bits. For a processor with 256 physical registers, tail pointers are 8 bits long, resulting in 64 bits for each entry in the checkpoint table (8 bits for each of eight architectural registers). Therefore the latency of this structure is similar to that of the register file itself, depending on the number of branches allowed inside the processor. Whenever the outcome of a branch is mispredicted, the processor accesses the checkpoint table with the branch index and reads the stored tail pointers. The tail pointer registers of the rename table are overwritten by the tail pointer values read from the checkpoint table in order to recover the rename table to its state in the cycle just before the faulting branch instruction. Depending on the circuit-level implementation and the clock frequency of the processor, restoring the rename table may take one or two cycles. While it can be possible to fix the tail pointers in the
