Lecture Notes in Computer Science 4917


LPA: A First Approach to the Loop Processor Architecture

Loop Example

@00: add r4,r0,400
@04: add r5,r0,0
@08: add r1,r0,4
@12: load r2,0(r4)
@16: load r3,0(r4)
@20: add r2,r3,r2
@24: add r5,r5,r2
@28: sub r4,r4,r1
@32: bneqz r4,@12

LVM Updates

virtual v1.0 assigned to logical r4
virtual v2.0 assigned to logical r5
virtual v3.0 assigned to logical r1
virtual v4.0 assigned to logical r2
virtual v5.0 assigned to logical r3
virtual v6.0 assigned to logical r2 (v4.0 out)
virtual v7.0 assigned to logical r5 (v2.0 out)
virtual v8.0 assigned to logical r4 (v1.0 out)
Loop detected!!
Loop branch is @32
First instruction is @12
Capturing Loop state starts

LVM (after 1st iteration)

Log   rVT   iVT   I
r1    v3    0     0
r2    v6    0     0
r3    v5    0     0
r4    v8    0     0
r5    v7    0     0

Fig. 3. Loop detection after the execution of its first iteration

The transparency of this process is an important advantage of LPA: there is no need for functional changes in the processor design beyond introducing the loop window and the two-component virtual tag renaming scheme. The out-of-order superscalar execution core behaves in the same way regardless of whether it receives instructions from the normal pipeline or from the loop window.

2.2 Loop Detection and Storage<br />

When a backward branch is predicted taken, LPA enters the Capturing Loop state. Figure 3 shows an example of a loop structure that LPA detects at the end of its first iteration, that is, when the branch instruction that ends the loop body is predicted taken. The backward branch is considered the loop branch, and its target address is considered the first instruction of the loop body. Therefore, the loop body starts at instruction @12 and ends at the loop branch @32.
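The detection rule above can be illustrated with a minimal Python sketch. It is not the hardware mechanism itself: the `Branch` record and the `detect_loop` function are hypothetical names introduced only for this illustration, and treating a branch to its own address as backward is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Branch:
    """Hypothetical record for a predicted branch (not part of LPA itself)."""
    pc: int                # address of the branch instruction
    target: int            # address the branch jumps to when taken
    predicted_taken: bool  # outcome supplied by the branch predictor


def detect_loop(branch: Branch) -> Optional[Tuple[int, int]]:
    """Return (first_instruction, loop_branch) for a backward branch
    predicted taken, or None if the branch does not close a loop."""
    # Assumption: a branch whose target equals its own address counts as backward.
    is_backward = branch.target <= branch.pc
    if branch.predicted_taken and is_backward:
        return (branch.target, branch.pc)
    return None


# The loop branch of Figure 3: bneqz r4,@12 located at @32.
print(detect_loop(Branch(pc=32, target=12, predicted_taken=True)))  # (12, 32)
```

With the values from Figure 3, the sketch reports @12 as the first instruction and @32 as the loop branch. A forward branch, or a backward branch predicted not taken, yields `None`.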

During the Capturing Loop state, the instructions belonging to the loop body are stored in the loop window. Data dependences between these instructions are resolved using the renaming mechanism. Figure 3 shows a snapshot of the LVM contents after the first loop iteration. To simplify the figure, we assume that there are just five logical registers. Instructions are renamed in program order. Each instruction receives a virtual tag that is stored in the rVT field of the LVM entry for its destination register, while the iVT field is initialized to zero.
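The LVM snapshot of Figure 3 can be reproduced with a short Python sketch of renaming in program order. This is only an illustration under stated assumptions: the `PROGRAM` encoding and the `rename_in_program_order` function are hypothetical, and the LVM is modeled as a plain dictionary rather than a hardware table.

```python
# Hypothetical encoding of the Figure 3 program: (address, destination register).
# The branch at @32 writes no register, so its destination is None.
PROGRAM = [
    (0, "r4"), (4, "r5"), (8, "r1"),
    (12, "r2"), (16, "r3"), (20, "r2"), (24, "r5"), (28, "r4"),
    (32, None),
]


def rename_in_program_order(program):
    """Assign virtual tags v1, v2, ... to destination registers in program order.

    Returns a dict mapping each logical register to its rVT and iVT fields.
    """
    lvm = {}
    next_tag = 1
    for _addr, dest in program:
        if dest is None:
            continue  # no destination register, so no new virtual tag
        # A later definition replaces the earlier mapping (e.g. v4 -> v6 for r2).
        lvm[dest] = {"rVT": f"v{next_tag}", "iVT": 0}
        next_tag += 1
    return lvm


lvm = rename_in_program_order(PROGRAM)
for reg in sorted(lvm):
    print(reg, lvm[reg]["rVT"], lvm[reg]["iVT"])
```

Running the sketch over the first iteration leaves r1 mapped to v3, r2 to v6, r3 to v5, r4 to v8, and r5 to v7, each with iVT equal to zero, which matches the table in Figure 3.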

In addition, each LVM entry contains a bit (I) that indicates whether a logical register is inside a loop body and is iteration-dependent. The I bit is always initialized to zero and is set to one only for those logical registers that receive a new virtual register during the Capturing Loop state. An I bit set to one indicates that the associated logical register is iteration-dependent, that is, it is defined inside the loop body and thus its value is produced by an instruction from the current or the previous iteration.
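As a rough illustration of how the I bits would end up for the loop of Figure 3, the following Python sketch marks every logical register that is written inside the loop body. The names `LOOP_BODY_DESTS` and `mark_iteration_dependent` are hypothetical, and the sketch deliberately leaves out the virtual tags assigned during the Capturing Loop state; it models only the I bit.

```python
# Hypothetical list of destination registers for the loop body @12..@32
# of Figure 3; the loop branch at @32 writes no register (None).
LOOP_BODY_DESTS = ["r2", "r3", "r2", "r5", "r4", None]


def mark_iteration_dependent(i_bits, loop_body_dests):
    """Return a copy of the I bits with every register that is defined
    inside the loop body set to one."""
    updated = dict(i_bits)
    for dest in loop_body_dests:
        if dest is not None:
            updated[dest] = 1  # register receives a new virtual register
    return updated


# All I bits start at zero, as in the LVM snapshot of Figure 3.
initial = {f"r{n}": 0 for n in range(1, 6)}
print(mark_iteration_dependent(initial, LOOP_BODY_DESTS))
```

Under these assumptions, r2, r3, r4, and r5 end with I set to one, while r1 keeps I at zero because it is defined only before the loop (at @08) and is merely read inside it.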
