Lecture Notes in Computer Science 4917

Studying Compiler Optimizations on Superscalar Processors

hand, may not be able to get to the independent long-latency loads because of the processor stalling on instructions that are dependent on the first long-latency load.

5 Related Work

A small number of research papers exist on compiler optimizations for out-of-order processors; however, none of this prior work analyzes the impact of compiler optimizations on the various cycle components.

Valluri and Govindarajan [11] evaluate the effectiveness of postpass and prepass instruction scheduling techniques on out-of-order processor performance. In postpass scheduling, register allocation precedes instruction scheduling. The potential drawback is that false dependencies introduced by the register allocator may limit the scheduler’s ability to schedule instructions efficiently. Prepass scheduling, on the other hand, allocates registers only after instruction scheduling has completed. The potential drawback is that register lifetimes may increase, which can lead to more spill code. Silvera et al. [12] also emphasize the importance of reducing register spill code in out-of-order issue processors. This is also what we observe in this paper: instruction scheduling increases the dynamic instruction count, which degrades the base cycle component and, for most benchmarks, also degrades overall performance. This paper differs from the study conducted by Valluri and Govindarajan [11] in two main ways. First, Valluri and Govindarajan limit their study to instruction scheduling; our paper studies a wide range of compiler optimizations. Second, their study is empirical and does not provide the insight that we provide using an analytical processor model.
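
The false dependencies mentioned for postpass scheduling can be illustrated with a minimal sketch. The instruction encoding, the function name `count_dependences`, and the two four-instruction sequences below are assumptions made for illustration only; they are not taken from Valluri and Govindarajan [11]. The sketch counts true (RAW) and false (WAR, WAW) register dependences before and after a hypothetical register allocation that reuses two physical registers.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (destination register or None, list of source registers).
Instr = Tuple[Optional[str], List[str]]


def count_dependences(code: List[Instr]) -> Dict[str, int]:
    """Count register dependences in a straight-line instruction sequence.

    RAW: a read of the value produced by the most recent writer (true dependence).
    WAR: a write to a register read since its last write (false dependence).
    WAW: a write to a register that was already written (false dependence).
    """
    last_writer: Dict[str, int] = {}
    readers = defaultdict(list)  # readers of each register since its last write
    counts = {"RAW": 0, "WAR": 0, "WAW": 0}
    for j, (dst, srcs) in enumerate(code):
        for src in set(srcs):
            if src in last_writer:
                counts["RAW"] += 1
            readers[src].append(j)
        if dst is not None:
            # An instruction does not depend on itself, so skip index j.
            counts["WAR"] += sum(1 for i in readers[dst] if i != j)
            if dst in last_writer:
                counts["WAW"] += 1
            last_writer[dst] = j
            readers[dst] = []
    return counts


# Before register allocation: every value has its own virtual register.
virtual_code: List[Instr] = [
    ("v1", []),      # v1 = load a
    ("v2", ["v1"]),  # v2 = v1 + 1
    ("v3", []),      # v3 = load b
    ("v4", ["v3"]),  # v4 = v3 + 1
]

# After a hypothetical allocation onto two physical registers.
allocated_code: List[Instr] = [
    ("r1", []),      # r1 = load a
    ("r2", ["r1"]),  # r2 = r1 + 1
    ("r1", []),      # r1 = load b   (reuses r1)
    ("r2", ["r1"]),  # r2 = r1 + 1   (reuses r2)
]

print(count_dependences(virtual_code))
print(count_dependences(allocated_code))
```

In this example both sequences have the same two true dependences, but register reuse adds one WAR and two WAW dependences. With virtual registers a scheduler is free to hoist the second load above the first add; after allocation that reordering is no longer legal, which is the kind of restriction a postpass scheduler has to respect.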

Pai and Adve [13] propose read miss clustering, a code transformation technique suitable for compiler implementation that improves memory-level parallelism on out-of-order processors. Read miss clustering strives to schedule independent memory accesses that are likely to be long-latency as close to each other as possible. At execution time, these long-latency loads will then overlap, improving overall performance.
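
The benefit of placing independent long-latency loads close together can be sketched with a toy first-order model. This is not the transformation of Pai and Adve [13]; the function name `exposed_miss_penalties`, the window size, and the two instruction streams are assumptions chosen for illustration. The model assumes that when a long-latency load blocks the head of the reorder buffer, any other independent long-latency load among the next `window - 1` instructions is already in flight and overlaps with it, so only one miss penalty per such group is exposed.

```python
from typing import List


def exposed_miss_penalties(is_miss: List[bool], window: int) -> int:
    """Count miss penalties that are not hidden by overlap (toy model).

    is_miss[i] is True if instruction i is an independent long-latency load.
    window is the assumed reorder buffer size in instructions.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    exposed = 0
    i = 0
    while i < len(is_miss):
        if is_miss[i]:
            exposed += 1
            # Misses among the next window - 1 instructions overlap with this one.
            i += window
        else:
            i += 1
    return exposed


# Four independent misses spread over 16 instructions.
scattered = [i in {0, 5, 10, 15} for i in range(16)]
# The same four misses scheduled next to each other.
clustered = [i < 4 for i in range(16)]

print(exposed_miss_penalties(scattered, window=4))
print(exposed_miss_penalties(clustered, window=4))
```

Under these assumptions the scattered schedule exposes four miss penalties, whereas the clustered schedule exposes only one, because all four loads fit in the window behind the first miss.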

Holler [14] discusses various compiler optimizations for the out-of-order HP PA-8000 processor. The paper enumerates heuristics for driving compiler optimizations such as loop unrolling, if-conversion, superblock formation, and instruction scheduling. However, Holler does not quantify the impact of each of these compiler optimizations on out-of-order processor performance.

Cohn and Lowney [15] study feedback-directed compiler optimizations on the out-of-order Alpha 21264 processor. Again, Cohn and Lowney do not provide insight into how compiler optimizations affect cycle components.

Vaswani et al. [16] build empirical models that predict the effect of compiler optimizations and microarchitecture configurations on superscalar processor performance. Those models do not provide the insights in terms of cycle components obtained from interval analysis as presented in this paper.

6 Conclusion and Impact on Future Work

The interaction between compiler optimizations and superscalar processors is difficult to understand, especially because of overlap effects in superscalar out-of-order
