
Lecture Notes in Computer Science 4917


R. Levin, I. Newman, and G. Haber

Table 2. Runtime comparison

Benchmark   Runtime after FDPR O3   Runtime after FDPR O3         Runtime after FDPR O3
            using sampled profile   using fixed sampled profile   using full profile
parser      5.2%                    6.6%                          7.2%
bzip        3.8%                    5%                            5%
crafty      4.1%                    4.5%                          6%
gap         8.5%                    13.5%                         13.2%
gzip        12%                     16.5%                         16.5%
gcc         4.35%                   3.4%                          4.4%
mcf         8%                      8.5%                          8.5%
twolf       10.25%                  12.25%                        14.1%
vpr         8.2%                    8.8%                          9.3%

Note: The percentages above refer to performance improvement compared to the base runtime.

Fig. 3. Fixed vs. unfixed sampling

The average degree of overlap, using our technique, calculated on SPECint2000 is 82% compared to 62% without using the fix. The average performance gain is only 0.6% less than when using the full edge profile, while without using the suggested fix, the average performance gain is 2.2% less than the full edge profile. Finally, the average improvement in degree of overlap is 21% and we reach a 1.8% average improvement in performance when compared to not using our fixup algorithm.
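The two gap figures quoted above can be checked against Table 2 with a few lines of arithmetic. The sketch below assumes that "average" means the plain arithmetic mean over the nine benchmarks, which the text does not state explicitly; the names `table2`, `column_mean`, `gap_fixed`, and `gap_sampled` are illustrative only. It does not attempt to reproduce the 1.8% figure, whose averaging method the text does not state.

```python
# Rows of Table 2: benchmark -> (sampled, fixed sampled, full profile).
# Each value is the percentage improvement over the base runtime.
table2 = {
    "parser": (5.2, 6.6, 7.2),
    "bzip": (3.8, 5.0, 5.0),
    "crafty": (4.1, 4.5, 6.0),
    "gap": (8.5, 13.5, 13.2),
    "gzip": (12.0, 16.5, 16.5),
    "gcc": (4.35, 3.4, 4.4),
    "mcf": (8.0, 8.5, 8.5),
    "twolf": (10.25, 12.25, 14.1),
    "vpr": (8.2, 8.8, 9.3),
}


def column_mean(index):
    """Arithmetic mean of one column of Table 2 (assumed averaging method)."""
    return sum(row[index] for row in table2.values()) / len(table2)


mean_sampled, mean_fixed, mean_full = (column_mean(i) for i in range(3))

# How far each sampled variant falls short of the full edge profile.
gap_fixed = mean_full - mean_fixed
gap_sampled = mean_full - mean_sampled

print(f"full - fixed sampled: {gap_fixed:.2f} percentage points")
print(f"full - sampled:       {gap_sampled:.2f} percentage points")
```

Under this assumption the gaps come out at about 0.57 and 2.2 percentage points, in line with the 0.6% and 2.2% stated in the text.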

7 Future Directions

Our fixup technique can be used for a wide variety of profiling problems. Inaccurate or incomplete profile information may be collected for several reasons other than those addressed in the paper, such as the following:

• After applying optimizations such as function cloning or inlining, or optimizations such as constant/value-range propagation that may eliminate edges in the control flow graph, the original profile information becomes inconsistent and needs to be corrected. In most cases, re-running the profiling phase on the modified program is not desirable.

• When profiling a multithreaded or multiprocess application, some counter increments may be lost as a result of multiple threads/processes incrementing the same counter without synchronization. Adding synchronization to each vertex's/edge's counter may be undesirable due to the additional runtime overhead and the additional memory needed for a mutex for each basic block/edge.

• When reusing profile information from older versions of the program.
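To make the first case concrete, the sketch below shows one way an edge profile can become inconsistent once an optimization removes an edge. It assumes that "inconsistent" means a basic block's incoming and outgoing edge counts no longer balance; the CFG, the counts, and the helper `flow_imbalance` are hypothetical and are not taken from the paper. It only detects the imbalance and does not implement the fixup algorithm.

```python
from collections import defaultdict


def flow_imbalance(edge_counts, entry, exit_):
    """Return {block: inflow - outflow} for every block, other than the
    entry and exit blocks, whose edge counts do not balance."""
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for (src, dst), count in edge_counts.items():
        outflow[src] += count
        inflow[dst] += count
    blocks = set(inflow) | set(outflow)
    return {
        block: inflow[block] - outflow[block]
        for block in blocks
        if block not in (entry, exit_) and inflow[block] != outflow[block]
    }


# Hypothetical edge profile: every inner block receives as much flow as it emits.
profile = {
    ("A", "B"): 70,
    ("A", "C"): 30,
    ("B", "C"): 10,
    ("B", "D"): 60,
    ("C", "D"): 40,
    ("D", "E"): 100,
}
consistent = flow_imbalance(profile, entry="A", exit_="E")

# Suppose an optimization removes the edge A -> C and its count is dropped
# with it. Block C still records 40 outgoing executions but only 10 incoming.
pruned = {edge: count for edge, count in profile.items() if edge != ("A", "C")}
inconsistent = flow_imbalance(pruned, entry="A", exit_="E")

print(consistent)    # no imbalanced blocks
print(inconsistent)  # block C is short by 30 incoming executions
```

A check of this kind identifies which blocks need correcting without re-running the profiling phase; how the counts are then repaired is the job of the fixup technique itself.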

Another future direction is finding the optimal flow fix, i.e., our fixup vector, with respect to different cost types, such as minimizing L∞ or L2 (the least mean squares)
