

often has trouble capturing. Among the integer benchmarks, the biggest source of error is gcc. The reason gcc behaves so poorly is that there are intervals during its execution where the CPI and other metrics spike. These huge spikes do not repeat and occur for only a single interval; because of this, SimPoint does not weight them as important, and they are therefore omitted from the chosen simulation points. These high peaks cause the actual average results to be much higher than what SimPoint predicts. It might be possible to work around this problem by choosing a smaller interval size, which would break the problematic intervals into multiple smaller ones that SimPoint could see more easily.
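To make the weighting argument concrete, the sketch below (illustrative numbers only, not taken from the paper) shows how a spike confined to a single interval drops out of SimPoint's cluster-weighted CPI estimate:

```python
# A minimal sketch (with made-up numbers) of why a one-interval CPI
# spike disappears from the SimPoint estimate. SimPoint reports the
# weighted average of one representative interval per phase cluster,
# weighted by cluster size; a cluster that never receives a simulation
# point contributes nothing.

cpi = [1.0] * 99 + [40.0]        # 99 ordinary intervals plus one spike

true_cpi = sum(cpi) / len(cpi)   # what a full run measures: 1.39

# The spike is its own single-interval cluster. With a cap on the
# number of simulation points it is not selected, so the estimate is
# built only from the large cluster's representative.
chosen = [(1.0, 99)]             # (representative CPI, cluster size)
predicted = sum(c * n for c, n in chosen) / sum(n for _, n in chosen)

print(f"true CPI = {true_cpi:.2f}, predicted = {predicted:.2f}")
# true CPI = 1.39, predicted = 1.00 -> SimPoint underestimates
```

Halving the interval size would split such a spike across several intervals, giving the resulting cluster enough weight to earn a simulation point of its own.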

We also use our BBV tools on the SPEC CPU2006 benchmarks. These runs use the same tools as for CPU2000, without any modification, and they yield good results without requiring any special knowledge of the newer benchmarks. We do not have results for the zeusmp benchmark: it would not run under any of the DBI tools. Unlike the CPU2000 results, we have performance counter data from only four of the machines. Many of the CPU2006 benchmarks have working sets of over 1GB, and many of our machines have less RAM than that; on those machines the benchmarks take months to run, with the operating system paging constantly to disk. The CPU2006 results shown in Figure 8 are as favorable as the CPU2000 results. When allowing SimPoint to choose up to 10 simulation points per benchmark, the average error for CPI is 5.58% for Pin, 5.30% for Qemu, and 5.28% for Valgrind. Pin chooses 420 simulation points, Qemu 433, and Valgrind a similar number. This requires simulating only 0.056% of the total benchmark suite, an impressive speedup considering the long running time of these benchmarks. Figure 8 also shows the results when SimPoint is allowed to pick up to 20 simulation points, which requires simulating only 0.1% of the total benchmarks. The average error for CPI is then 3.39% for Pin, 4.04% for Qemu, and 3.68% for Valgrind.
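A back-of-envelope check of these coverage numbers, assuming 100M-instruction intervals (the interval size suggested by the first-100M experiments below; the suite total is implied, not quoted from the paper):

```python
# Coverage check under an assumed 100M-instruction interval size.
INTERVAL = 100_000_000                 # instructions per simulation point (assumed)
PIN_POINTS = 420                       # from the text; Qemu chooses 433

simulated = PIN_POINTS * INTERVAL      # 4.2e10 instructions simulated
fraction = 0.056 / 100                 # 0.056% of the suite, from the text

implied_total = simulated / fraction   # ~7.5e13 committed instructions
print(f"implied CPU2006 suite length ~ {implied_total:.1e} instructions")

# The error metric used throughout: CPI predicted from the chosen
# simulation points versus CPI measured with hardware counters.
def cpi_error(predicted: float, measured: float) -> float:
    return abs(predicted - measured) / measured * 100.0

print(f"example: {cpi_error(1.25, 1.18):.1f}% error")  # hypothetical CPIs
```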

Error when simulating the first 100M instructions averages 102%, showing that this continues to be a poor way to choose simulation intervals. Fast-forwarding 1B instructions and then simulating 100M produces an average error of 31%. Using only a single simulation point again yields more than twice the error of using up to 10 SimPoints. Figures 9 and 10 show CPI errors for individual benchmarks on the Pentium D machine. For floating point applications, there are outlying results for cactusADM, dealII, and GemsFDTD. For these benchmarks, the total number of committed instructions measured by the DBI tools differs from that measured with the performance counters. Improving the BBV tools should fix these outliers.
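The outlier diagnosis above amounts to a simple consistency check between the two instruction totals; a sketch of that check, with hypothetical counts (the paper does not give the raw numbers):

```python
# Flag benchmarks whose DBI-reported committed-instruction total
# disagrees with the hardware performance counters, since their BBVs
# cannot line up with the measured run. Counts are made-up examples.

def instruction_mismatch(dbi_count: float, counter_count: float) -> float:
    """Relative disagreement between DBI and counter instruction totals."""
    return abs(dbi_count - counter_count) / counter_count

counts = {   # (DBI total, perf-counter total) -- illustrative values
    "cactusADM": (1.30e12, 1.42e12),
    "bzip2":     (2.41e11, 2.41e11),
}

for bench, (dbi, hw) in counts.items():
    gap = instruction_mismatch(dbi, hw)
    flag = "MISMATCH" if gap > 0.01 else "ok"
    print(f"{bench:10s} {gap:6.2%}  {flag}")
```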

As with the CPU2000 results, the biggest source of error among the integer benchmarks is gcc. The reasons are the same as described previously: SimPoint cannot handle the spikes in the phase behavior. The bzip2 benchmarks in CPU2006 exhibit the same problem as gcc: the inputs used in CPU2006 have spiky behavior that the CPU2000 inputs do not. The other outliers, perlbench and astar, require further investigation.
