Lecture Notes in Computer Science 4917

128 S. Eyerman, L. Eeckhout, and J.E. Smith

processors. This paper analyzed the impact of compiler optimizations on out-of-order processor performance using interval analysis, which divides total execution time into cycle components.
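
To make the idea of cycle components concrete, the following is a minimal Python sketch of a first-order, interval-style breakdown. It assumes a base component of N/D cycles (N dynamic instructions, D dispatch width) plus one component per miss event, each charged as its count times an assumed average penalty. The names `MissEvent`, `cycle_stack`, and `miss_share_of_improvement`, the event labels, and all numbers are hypothetical illustrations, not results or code from this paper. The last helper computes which fraction of a cycle reduction comes from miss-event components, the same kind of breakdown behind the 47.3% versus 17.3% comparison given later in this section.

```python
from dataclasses import dataclass


@dataclass
class MissEvent:
    count: int       # number of occurrences of this miss event
    penalty: float   # assumed average cycles lost per occurrence (hypothetical)


def cycle_stack(n_insns: int, dispatch_width: int,
                events: dict[str, MissEvent]) -> dict[str, float]:
    """Split total cycles into a base component plus one component per miss event.

    First-order assumption: without miss events the core dispatches at full
    width, so the base component is N / D cycles.
    """
    stack = {"base": n_insns / dispatch_width}
    for name, ev in events.items():
        stack[name] = ev.count * ev.penalty
    return stack


def miss_share_of_improvement(before: dict[str, float],
                              after: dict[str, float]) -> float:
    """Fraction of the total cycle reduction that comes from non-base components."""
    total_gain = sum(before.values()) - sum(after.values())
    miss_gain = sum(before[k] - after[k] for k in before if k != "base")
    return miss_gain / total_gain


# Hypothetical "unoptimized" vs. "optimized" binaries on a 4-wide core.
before = cycle_stack(1_000_000, 4, {
    "branch_mispredict": MissEvent(count=5_000, penalty=20),
    "l2_dcache_miss": MissEvent(count=1_000, penalty=200),
})
after = cycle_stack(900_000, 4, {
    "branch_mispredict": MissEvent(count=4_800, penalty=20),
    # Same number of misses; a lower effective penalty models more overlap.
    "l2_dcache_miss": MissEvent(count=1_000, penalty=150),
})
share = miss_share_of_improvement(before, after)
print(f"{share:.1%} of the improvement comes from miss-event components")
```

In this hypothetical example, fewer dynamic instructions shrink the base component, while the L2 D-cache component shrinks only because the assumed effective penalty per miss drops, mimicking the memory-level parallelism effect discussed in this section.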

This paper provides a number of key insights that can help drive future work in compiler optimizations for out-of-order processors. First, the critical path of inter-operation dependencies has a visible impact on overall performance only along the paths leading to mispredicted branches.
As such, limiting the focus of instruction scheduling to paths leading to mispredicted branches could improve performance, reduce compilation time, or both; the latter is an important consideration for dynamic compilation systems. Second, the analysis in this paper showed that reducing the dynamic instruction count improves performance by reducing the base cycle component. Compiler builders can therefore favor optimizations for out-of-order processors that minimize the dynamic instruction count over optimizations that increase the amount of ILP, because the hardware can extract ILP dynamically. The results presented in this paper show that reducing the dynamic instruction count remains an important optimization criterion for today’s high-performance microprocessors. Third, since miss events have a large impact on overall performance, more so on out-of-order processors than on in-order processors, it is important to make compiler optimizations aware of their potential impact on miss events. In particular, across the optimization settings considered in this paper, 47.3% of the total performance improvement comes from reduced miss event cycle components for the out-of-order processor, versus only 17.3% for the in-order processor. Fourth, compiler optimizations can improve the amount of memory-level parallelism by scheduling long-latency back-end loads closer to each other in the binary.
Independent long-latency loads that occur within ROB-size instructions of each other in the dynamic instruction stream overlap at run time, which results in memory-level parallelism (MLP) and thus improved performance. In fact, most of the L2 D-cache miss cycle component reduction observed in our experiments comes from improved MLP, not from a reduction in the number of L2 D-cache misses. We believe further research into compiler optimizations that expose memory-level parallelism is warranted.
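
The overlap argument can be illustrated with a simple first-order model, sketched below in Python under loudly stated assumptions: all long-latency loads are independent, every one misses with the same fixed memory latency, and a missing load that blocks the ROB head overlaps with any other missing load fewer than `rob_size` dynamic instructions after it. The function name `l2_miss_cycles`, the ROB size, the latency, and the load positions are all hypothetical. Grouping loads greedily by ROB window shows how moving the same loads closer together reduces the cycles charged to L2 D-cache misses without changing the miss count.

```python
def l2_miss_cycles(load_positions: list[int], rob_size: int, mem_latency: int) -> int:
    """Cycles charged to long-latency loads under a first-order MLP model.

    Assumptions (simplified, hypothetical): loads are independent, each one misses
    with the same latency, and a missing load at the ROB head overlaps with any
    other missing load fewer than rob_size dynamic instructions after it.
    """
    cycles = 0
    window_start = None  # dynamic position of the load that opened the current window
    for pos in sorted(load_positions):
        if window_start is None or pos - window_start >= rob_size:
            # Outside the current ROB window: this miss is serialized and
            # opens a new window, paying the full memory latency.
            window_start = pos
            cycles += mem_latency
        # Otherwise the load fits in the ROB behind the blocking load, so its
        # latency overlaps with the earlier miss (memory-level parallelism).
    return cycles


ROB_SIZE = 128     # hypothetical reorder buffer size
MEM_LATENCY = 300  # hypothetical L2 miss latency in cycles

# The same four misses, with different spacing in the dynamic instruction stream.
spread_out = l2_miss_cycles([0, 200, 400, 600], ROB_SIZE, MEM_LATENCY)
clustered = l2_miss_cycles([0, 40, 80, 120], ROB_SIZE, MEM_LATENCY)
print(spread_out, clustered)
```

With these assumed numbers, the clustered schedule is charged one memory latency instead of four. Real overlap also depends on data dependences, miss-handling resources, and other miss events, all of which this sketch ignores.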

Acknowledgements

The authors would like to thank the reviewers for their insightful comments. Stijn Eyerman and Lieven Eeckhout are supported by the Fund for Scientific Research in Flanders (Belgium) (FWO-Vlaanderen). Additional support was provided by the European HiPEAC Network of Excellence.
