Lecture Notes in Computer Science 4917

Using Dynamic Binary Instrumentation to Generate Multi-platform SimPoints: Methodology and Accuracy

Vincent M. Weaver and Sally A. McKee

School of Electrical and Computer Engineering
Cornell University
{vince,sam}@csl.cornell.edu

Abstract. Modern benchmark suites (e.g., SPEC CPU2006) take months to simulate. Researchers and practitioners therefore use partial simulation techniques for efficiency, hoping to avoid sacrificing accuracy. SimPoint is a popular method for choosing representative parts of an application that approximate its entire behavior. The approach breaks an application into intervals, generates a Basic Block Vector (BBV) to represent the instructions executed in each interval, clusters the BBVs according to similarity, and chooses a representative interval from the most important clusters. Unfortunately, tools that generate BBVs efficiently have until now been unavailable for many architectures, especially embedded ones. We develop plugins for both the Qemu and Valgrind dynamic binary instrumentation (DBI) tools, and compare their results to those generated by the PinPoints utility. All three methods can deliver under 6% average CPI error on both the SPEC CPU2000 and CPU2006 benchmarks while simulating under 0.4% of the applications' total execution. Our tools increase the number of architectures for which BBVs can be generated efficiently and easily; they enable simulation points that include operating system activity; and they allow cross-platform collection of BBV information (e.g., generating MIPS SimPoints on IA32). We validate our tools via hardware performance counters on nine 32-bit Intel Linux platforms.
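The SimPoint flow summarized in the abstract (split execution into intervals, build one BBV per interval, cluster the BBVs, pick a representative per cluster) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the Qemu, Valgrind, or PinPoints tooling described in the paper. The input is a hypothetical in-memory trace of (block id, block length) pairs, which a DBI plugin would instead count on the fly. Clustering is plain k-means with a fixed k; the SimPoint tool typically also reduces BBV dimensionality by random projection and chooses k with a BIC-based score. Every non-empty cluster contributes one representative. All function names are illustrative only. The cluster weights also allow a whole-program CPI estimate from the representatives alone, and the relative error of that estimate is one common way to express the kind of CPI error the abstract reports.

```python
from collections import defaultdict


def build_bbvs(trace, interval_size):
    """Split a basic-block trace into intervals and build one BBV per interval.

    `trace` is a hypothetical list of (block_id, block_length) pairs in
    execution order. Each BBV maps block_id -> instructions executed in that
    block during the interval (executions * block length). An interval ends
    at the first block boundary at or after `interval_size` instructions.
    """
    bbvs, current, executed = [], defaultdict(int), 0
    for block_id, length in trace:
        current[block_id] += length
        executed += length
        if executed >= interval_size:
            bbvs.append(dict(current))
            current, executed = defaultdict(int), 0
    if current:  # keep a trailing partial interval
        bbvs.append(dict(current))
    return bbvs


def to_vectors(bbvs):
    """Convert sparse BBVs to dense vectors normalized to sum to 1."""
    blocks = sorted({b for bbv in bbvs for b in bbv})
    vectors = []
    for bbv in bbvs:
        total = sum(bbv.values())
        vectors.append([bbv.get(b, 0) / total for b in blocks])
    return vectors


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means with deterministic farthest-first initialization.

    Returns (labels, centroids), where labels[i] is the cluster of vectors[i].
    """
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(dist2(v, c) for c in centroids))
        centroids.append(list(farthest))

    def assign():
        # Index of the nearest centroid for every vector.
        return [min(range(k), key=lambda c: dist2(v, centroids[c])) for v in vectors]

    for _ in range(iterations):
        labels = assign()
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign(), centroids


def pick_simpoints(vectors, labels, centroids):
    """Return sorted (interval_index, weight) pairs, one per non-empty cluster.

    The representative is the interval closest to its cluster centroid; the
    weight is the fraction of all intervals that fall in that cluster.
    """
    points = []
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            rep = min(members, key=lambda i: dist2(vectors[i], centroid))
            points.append((rep, len(members) / len(vectors)))
    return sorted(points)


def estimate_cpi(simpoints, interval_cpi):
    """Whole-program CPI estimate: weighted sum of the representatives' CPIs."""
    return sum(weight * interval_cpi[i] for i, weight in simpoints)


def cpi_error(estimated, actual):
    """Relative CPI error (multiply by 100 for a percentage)."""
    return abs(estimated - actual) / actual


if __name__ == "__main__":
    # Two synthetic phases: block 1 (10 instructions), then block 2 (5 instructions).
    trace = [(1, 10)] * 30 + [(2, 5)] * 40
    vectors = to_vectors(build_bbvs(trace, interval_size=100))
    labels, centroids = kmeans(vectors, k=2)
    points = pick_simpoints(vectors, labels, centroids)
    print(points)  # [(0, 0.6), (3, 0.4)]
    # Hypothetical per-interval CPIs, as a detailed simulator might report them.
    print(round(estimate_cpi(points, [1.0, 1.0, 1.0, 2.0, 2.0]), 6))  # 1.4
```

Farthest-first initialization is used only to keep the example deterministic. With real BBVs, which have thousands of dimensions, initialization and the choice of k matter much more than in this two-phase toy trace.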

1 Introduction

Cycle-accurate simulators are slow. Using one to run a modern benchmark suite such as SPEC CPU2006 [16] can take months to complete when full reference inputs are used. This prohibitive slowdown prevents most modelers from using the full reference inputs. Yi et al. [18] investigate the six most common ways of speeding up simulations:

– Representative sampling (SimPoint [13]),
– Statistics-based sampling (SMARTS [17]),
– Reduced input sets (MinneSPEC [6]),
– Simulating the first X million instructions,
– Fast-forwarding Y million instructions and simulating X million, and
– Fast-forwarding Y million instructions, performing architectural warmup, then simulating X million.
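To make the list concrete, the Python sketch below describes each truncation or sampling method as the half-open instruction ranges it fast-forwards, warms up, or simulates in detail. It also reports the fraction of the run simulated in detail. It is an illustration under stated assumptions, not code from the paper. Parameters are raw instruction counts rather than millions. The warmup length `w` is a hypothetical parameter, since the list does not fix one. `periodic_windows` is a simplified systematic-sampling scheme in the spirit of SMARTS rather than its exact procedure. Reduced input sets such as MinneSPEC change the program input instead of selecting ranges, so they are not modeled. All function names are hypothetical.

```python
def first_x(x):
    """Simulate only the first x instructions in detail."""
    return [(0, x, "detailed")]


def fast_forward(y, x):
    """Fast-forward y instructions, then simulate the next x in detail."""
    return [(0, y, "fast-forward"), (y, y + x, "detailed")]


def fast_forward_warmup(y, w, x):
    """Fast-forward y, warm microarchitectural state for w, then simulate x.

    Assumption: w is a hypothetical warmup length placed right after the
    fast-forward region.
    """
    return [(0, y, "fast-forward"),
            (y, y + w, "warmup"),
            (y + w, y + w + x, "detailed")]


def simpoint_windows(simpoints, interval_size):
    """Simulate each chosen interval (given by index) in detail."""
    return [(i * interval_size, (i + 1) * interval_size, "detailed")
            for i in sorted(simpoints)]


def periodic_windows(total, period, warmup, unit):
    """Simplified systematic sampling in the spirit of SMARTS: at the start of
    every `period` instructions, warm up for `warmup` and measure `unit`."""
    windows = []
    for start in range(0, total - warmup - unit + 1, period):
        windows.append((start, start + warmup, "warmup"))
        windows.append((start + warmup, start + warmup + unit, "detailed"))
    return windows


def detailed_fraction(windows, total):
    """Fraction of the program's `total` instructions simulated in detail."""
    detailed = sum(end - start for start, end, mode in windows if mode == "detailed")
    return detailed / total


if __name__ == "__main__":
    # Two hypothetical 100-instruction SimPoints in a 10,000-instruction run.
    windows = simpoint_windows([3, 40], interval_size=100)
    print(windows)  # [(300, 400, 'detailed'), (4000, 4100, 'detailed')]
    print(detailed_fraction(windows, total=10_000))  # 0.02
```

Framing every method as a list of ranges makes the cost comparison direct: detailed simulation time scales with `detailed_fraction`. Accuracy, by contrast, depends on whether the chosen ranges are representative of the whole run.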

P. Stenström et al. (Eds.): HiPEAC 2008, LNCS 4917, pp. 305–319, 2008.
© Springer-Verlag Berlin Heidelberg 2008
