
Architecture of Computing Systems (Lecture Notes in Computer ...


MLP-Aware Instruction Queue Resizing: The Key to Power-Efficient Performance

The proposals by Buyuktosunoglu et al. [1] and Kucuk et al. [3] resize the IQ by physically partitioning it into segments and disabling and enabling each segment¹ as needed. Both techniques use a feedback loop to control the size of the IQ. The Buyuktosunoglu et al. technique [1] is based on the number of active entries (i.e., ready-to-issue entries) per segment to decide whether or not a segment deserves to remain active, while Kucuk et al. [3] argue that the occupancy of the IQ (in valid instructions) rather than active entries is a better indication of its required size. In both techniques the IPC is measured periodically; a sudden drop in IPC signals the need for IQ upsizing. However, in both approaches there is no other mechanism to revert to the full IQ size.
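The feedback loop described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the segment count, occupancy threshold, and IPC-drop threshold are assumptions made for the example.

```python
def resize_iq(segments_active, occupancy, ipc, prev_ipc,
              max_segments=4, occ_threshold=0.7, ipc_drop=0.1):
    """Return the number of active IQ segments for the next interval.

    Sketch of a feedback loop in the style of [1] and [3]: all
    thresholds are illustrative assumptions.
    """
    # A sudden IPC drop signals the queue was downsized too far: upsize.
    if prev_ipc > 0 and (prev_ipc - ipc) / prev_ipc > ipc_drop:
        return min(segments_active + 1, max_segments)
    # Low occupancy (valid instructions relative to the active segments)
    # suggests the program does not need the full window: disable a segment.
    if occupancy < occ_threshold * segments_active:
        return max(segments_active - 1, 1)
    return segments_active
```

Whether `occupancy` counts ready-to-issue entries (as in [1]) or all valid instructions (as in [3]) is exactly the point on which the two proposals differ.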

The Folegnani and González approach [2] is distinguished by being a logical resizing of the IQ (limiting the range of the entries that can be allocated) rather than a physical partitioning as in the other two. In addition, the feedback mechanism is based on how much the youngest instructions in the IQ contribute to the actual ILP. If they do not contribute much, the size of the IQ can be reduced further. However, there is no way to adapt the IQ to larger sizes except by periodically reverting to the full size. Nevertheless, the Folegnani and González resizing policy is very good at adjusting the IQ size so as not to harm the ILP in the program.
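A minimal sketch of this policy, under assumed parameter values (contribution threshold, step size, revert period, and full IQ size are not taken from [2]):

```python
def adjust_logical_limit(limit, issues_from_youngest, total_issues,
                         interval_count, revert_period=10,
                         contribution_threshold=0.05, step=8, full_size=128):
    """Logical IQ resizing in the spirit of Folegnani and Gonzalez [2]."""
    # Periodic revert to full size: the only upsizing mechanism in [2].
    if interval_count % revert_period == 0:
        return full_size
    # If the youngest entries contributed little to the issued
    # instructions, the logical limit can shrink further.
    if (total_issues > 0 and
            issues_from_youngest / total_issues < contribution_threshold):
        return max(limit - step, step)
    return limit
```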

Modeling of MLP –– Karkhanis and Smith presented a first-order model of a superscalar core [6] that deepened our understanding of MLP. This work exposed the importance of MLP in shaping the performance of out-of-order execution. Karkhanis and Smith show that, in the absence of any upsetting events such as branch mispredictions and L2 misses, the number of instructions that issue (on average) is a property of the program and is expressed by the so-called IW characteristic of the program.

The presence, however, of upsetting events such as L2 misses decreases the IPC from the ideal point of the IW characteristic, depending on the apparent cost of the L2 miss. This is because an L2 miss drains the pipeline and eventually stalls it when the instruction that misses blocks the retirement of other instructions [6]. MLP, in this case, spreads the cost of accessing main memory over all the instructions that miss in parallel, making their apparent cost much lower.
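The cost-spreading effect is simple arithmetic: overlapped misses share one memory round trip. The latency figure below is an assumption chosen for illustration, not a value from [6].

```python
def apparent_miss_cost(memory_latency_cycles, parallel_misses):
    """Cost charged to each L2 miss when misses overlap in memory."""
    return memory_latency_cycles / parallel_misses

# One isolated miss pays the full (assumed) 300-cycle latency...
isolated = apparent_miss_cost(300, 1)    # 300.0 cycles
# ...while four overlapping misses each appear to cost a quarter of it.
overlapped = apparent_miss_cost(300, 4)  # 75.0 cycles
```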

Existing IQ resizing mechanisms focus mainly on the variations of the ILP (effectively moving along the IW characteristic), ignoring what happens during misses. Our work focuses instead on the latter.

MLP-Aware techniques –– Qureshi et al. exploited MLP to optimize cache replacement [8]. MLP in their case is associated with data (cache lines), and the MLP prediction is done in the cache. Data that do not exhibit MLP are given preference in staying in the cache over data that do. Eyerman and Eeckhout exploit MLP to optimize fetch policies for Simultaneous Multi-Threaded (SMT) processors [4]. In their case, MLP is associated with instructions that miss in the cache. Our MLP prediction mechanism is similar to the one proposed by Eyerman and Eeckhout, but because we use it not to regulate the fetch stage but to manage the entire IQ, we associate MLP information with larger segments of code.
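A predictor of this flavor can be sketched as a small table keyed by code region. The region size and table organization below are illustrative assumptions, not details from [4] or from this work.

```python
REGION_SHIFT = 7  # assumed 128-byte code regions

class MLPPredictor:
    """Toy MLP predictor keyed by code segment rather than by
    individual instruction (all structural details are assumptions)."""

    def __init__(self):
        self.table = {}  # region id -> last observed parallel-miss count

    def update(self, pc, observed_mlp):
        # Record the MLP observed for the region containing this PC.
        self.table[pc >> REGION_SHIFT] = observed_mlp

    def predict(self, pc):
        # Regions never seen to miss default to no MLP (a single miss).
        return self.table.get(pc >> REGION_SHIFT, 1)
```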

¹ Sequentially from the end of the IQ in [1], or independently of position in [3].
