Lecture Notes in Computer Science 4917


Studying Compiler Optimizations on Superscalar Processors

the reorder buffer to fill up. In other words, the reorder buffer is unable to hide the instruction latencies and dependencies through out-of-order execution, which results in increased base, L1 D-cache and resource stall cycle components.

Instruction Scheduling. Instruction scheduling tends to increase the dynamic instruction count, which in turn increases the base cycle component. This observation was also made by Valluri and Govindarajan [11]. The dynamic instruction count increases because the compiler adds spill code during the scheduling process. Note also that instruction scheduling reduces the branch misprediction penalty for 10 out of 15 benchmarks (see Table 3); that is, the improved instruction schedule shortens the critical path leading to the mispredicted branch. Unfortunately, this does not compensate for the increased dynamic instruction count, resulting in a net performance decrease for most of the benchmarks.

Strict Aliasing. The assumption that references to different object types never access the same address allows for more aggressive scheduling of memory operations; this is a safe optimization as long as the C program complies with the ISO C99 standard¹. This results in significant performance improvements for a number of benchmarks, for example art (16.2%). Strict aliasing reduces the number of non-overlapping L2 D-cache misses by 11.5% for art while keeping the total number of L2 D-cache misses almost unchanged; in other words, memory-level parallelism is increased.
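To make the strict aliasing assumption concrete, the following C sketch shows the kind of code it affects. It is an illustration only, not code taken from the benchmarks: the function names `scale_counts` and `float_bits` are hypothetical, and `float_bits` assumes a 32-bit IEEE 754 `float`. In `scale_counts`, the stores go through an `int` pointer while the scale factor is read through a `float` pointer, so a compiler applying the C99 aliasing rules (GCC, for instance, controls this with `-fstrict-aliasing`) may assume the stores never change `*factor` and can schedule the memory operations more freely. `float_bits` shows a standard-conforming way to reinterpret an object's bytes, in contrast to casting the pointer, which would break the assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical example: scale n integer counters by *factor.
 * counts[] has type int and *factor has type float. Under the C99
 * aliasing rules the compiler may assume that a store to counts[i]
 * cannot modify *factor, so it is free to load *factor once and to
 * reorder the remaining loads and stores. */
void scale_counts(int *counts, int n, const float *factor)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        counts[i] = (int)((float)counts[i] * *factor);
}

/* Standard-conforming reinterpretation of a float's bytes.
 * Writing *(uint32_t *)&f instead would access a float object through
 * an incompatible type, which is undefined behavior in C99 and may be
 * miscompiled when strict aliasing is enabled.
 * Assumption: float is a 32-bit IEEE 754 value. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

For instance, calling `scale_counts` on an array holding 1, 2, 3 with a factor of `2.0f` leaves 2, 4, 6 in the array whether or not the optimization is applied; only the generated load and store schedule differs.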

4.3 Comparison with In-Order Processors

Having discussed the impact of compiler optimizations on out-of-order processor performance, it is interesting to compare against the impact these compiler optimizations have on in-order processor performance. Figure 6 shows the average normalized cycle counts on a superscalar in-order processor. Performance improves by 17.5% on average compared to the base optimization level. The most striking observation to be made when comparing the in-order graph (Figure 6) against the out-of-order graph (Figure 3) is that instruction scheduling improves performance on the in-order processor, whereas it degrades performance on the out-of-order processor. The reason is that on in-order architectures, the improved instruction schedule outweighs the additional spill code that may be generated to accommodate it. On an out-of-order processor, the additional spill code only adds overhead through an increased base cycle component.

To better understand the impact of compiler optimizations on out-of-order versus in-order processor performance, we now compare in-order processor cycle components against out-of-order processor cycle components. To do so, we use the following cycle counting mechanism for computing the cycle components on the in-order processor. For each cycle in which no instruction can be issued, the mechanism increments the count of the appropriate cycle component. For example, when the next instruction to issue stalls waiting for a register that is to be produced by an L2 miss, the cycle is assigned to the L2 D-cache miss cycle component. Similarly, if no instructions are

¹ The current standard for Programming Language C is ISO/IEC 9899:1999, published 1999-12-01.
