29.01.2015 Views

Embedded Software for SoC - Grupo de Mecatrônica EESC/USP

Embedded Software for SoC - Grupo de Mecatrônica EESC/USP

Embedded Software for SoC - Grupo de Mecatrônica EESC/USP

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

396 Chapter 29<br />

to their access patterns. By choosing these benchmarks, we had a set that<br />

contains applications with regular access patterns (Swim, Mgrid, Vpenta, and<br />

Adi), a set with irregular access patterns (Perl, Li, Compress, and Applu),<br />

and a set with mixed (regular + irregular) access patterns (remaining applications).<br />

Note that there is a high correlation between the nature of the<br />

benchmark (floating-point versus integer) and its access pattern. In most cases,<br />

numeric co<strong>de</strong>s have regular accesses and integer benchmarks have irregular<br />

accesses. Table 29-2 summarizes the salient characteristics of the benchmarks<br />

used in this study, including the inputs used, the total number of instructions<br />

executed, and the miss rates (%). The numbers in this table were obtained by<br />

simulating the base configuration in Table 29-1. For all the programs, the<br />

simulations were run to completion. It should also be mentioned, that in all<br />

of these co<strong>de</strong>s, the conflict misses constitute a large percentage of total cache<br />

misses (approximately between 53% and 72% even after aggressive array<br />

padding).<br />

4.3. Simulated versions<br />

For each benchmark, we experimented with four different versions – (1) Pure<br />

Hardware – the version that uses only hardware to optimize locality (Section<br />

3.1); (2) Pure <strong>Software</strong> – the version that uses only compiler optimizations<br />

<strong>for</strong> locality (Section 3.2); (3) Combined– the version that uses both hardware<br />

and compiler techniques <strong>for</strong> the entire duration of the program; and (4)<br />

Selective (Hardware/Compiler) – the version that uses hardware and software<br />

approaches selectively in an integrated manner as explained in this work<br />

(our approach).<br />

Table 29-2. Benchmark characteristics. ‘M’ <strong>de</strong>notes millions.<br />

Benchmark<br />

Input<br />

Number of<br />

Instructions<br />

executed<br />

L1 Miss<br />

Rate [%]<br />

L2 Miss<br />

Rate [%]<br />

Perl<br />

Compress<br />

Li<br />

Swim<br />

Applu<br />

Mgrid<br />

Chaos<br />

Vpenta<br />

Adi<br />

TPC-C<br />

TPC-D, Q1<br />

TPC-D, Q2<br />

TPC-D, Q3<br />

primes.in<br />

training<br />

train.lsp<br />

train<br />

train<br />

mgrid.in<br />

mesh.2k<br />

Large enough to fill L2<br />

Large enough to fill L2<br />

Generated using TPC tools<br />

Generated using TPC tools<br />

Generated using TPC tools<br />

Generated using TPC tools<br />

11.2 M<br />

58.2 M<br />

186.8M<br />

877.5 M<br />

526.0 M<br />

78.7 M<br />

248.4 M<br />

126.7 M<br />

126.9 M<br />

16.5 M<br />

38.9 M<br />

67.7 M<br />

32.4 M<br />

2.82<br />

3.64<br />

1.95<br />

3.91<br />

5.05<br />

4.51<br />

7.33<br />

52.17<br />

25.02<br />

6.15<br />

9.85<br />

13.62<br />

4.20<br />

1.6<br />

10.1<br />

3.7<br />

14.4<br />

13.2<br />

3.3<br />

1.8<br />

39.8<br />

53.5<br />

12.6<br />

4.7<br />

5.4<br />

11.0

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!