29.01.2015 Views

Embedded Software for SoC - Grupo de Mecatrônica EESC/USP


Chapter 27

3. SIMULATION METHODOLOGY AND RESULTS

The goal of this article is to demonstrate the idea and effectiveness of
aggregating flows to improve IR, not the architecture for enabling it. Hence,
we do not simulate the architecture proposed in the previous section (this is
left as future work) but use a single-threaded, single-processor model to
evaluate reuse. This gives us an idea of the effectiveness of flow aggregation
in improving IR. We modified the SimpleScalar [8] simulator (MIPS ISA) and used
the default configuration [8] to evaluate instruction reuse on a subset of
programs representative of different classes of applications from two popular
NPU benchmarks, CommBench [9] and NetBench [10] (see Table 27-1). We use
SimpleScalar because it is representative of an out-of-order issue pipelined
processor with dynamic scheduling and support for speculative execution. In
other words, we assume that the NPU is based on a superscalar RISC
architecture, which is representative of many NPUs available in the market.
Further, using SimpleScalar makes it easy to compare our results with certain
results in [9] and [10], which also use the same simulation environment. When
the PC of an instruction matches the PC of the function that reads a new
packet, the output port for the packet is read from a precomputed table of
output ports. This identifies the RB to be used for the current set of
instructions being processed. The RB identifier is stored in the RoB along with
the operands for the instruction, and the appropriate RB is queried to
determine whether the instruction can be reused. The results thus obtained are
valid independent of the architecture proposed in the previous section. Since
threads and multiple processors are not considered during simulation, the
results reported in this article give an upper bound on the speedup that can be
achieved by exploiting flow aggregation. However, the bound can be improved
further if realistic network traffic traces that are not anonymized in their
header and

Table 27-1. Reuse and speedup without flow aggregation for different RB configurations.
R = % instructions reused, S = % improvement in speedup due to reuse. % Speedup due to
operand indexing and % reduction in memory traffic for a (32,8) RB are shown in the last two
columns.

Benchmark   32,4 R   32,4 S   128,4 R   128,4 S   1024,4 R   1024,4 S   Opindex   Mem Traffic
FRAG           7.9     3.7       20.4      4.9        24.4       8.3        5.3          42.1
DRR           12.6     0.16      15.5      0.5        18.2       0.86       0.4          11.6
RTR           15.2     3.8       33.2      6.1        47.6       8.1        9.2          71.3
REED ENC      19.8     2         20.3      2.05       25.2       2.95       4.7           8.7
REED DEC       6.6     1.76      11.8      4          16.6       5.6        6             4.9
CRC           19.1    19.6       20.7     19.84       21.8      19.84      20.4          35.1
MD5            1.4     1.3        3.5      2.3        14.2       8.3        1.6          34.3
URL           18.8     9.4       19.9     11.2        22.2      12.7       13.1          42
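
The RB selection step described in the methodology above (match the packet-read PC, look up the output port, pick the RB for that port, then query it per instruction) can be illustrated with a small sketch. This is not the authors' modified SimpleScalar code. The class names `ReuseBuffer` and `FlowAggregatedReuse`, the LRU replacement policy, the PC-modulo set indexing, and the toy packet trace are all assumptions made for illustration. The sketch also tracks the current RB directly instead of storing an RB identifier in the RoB.

```python
from collections import OrderedDict


class ReuseBuffer:
    """Hypothetical set-associative reuse buffer with `entries` slots and `assoc` ways."""

    def __init__(self, entries=32, assoc=4):
        self.assoc = assoc
        self.num_sets = entries // assoc
        # Each set maps (pc, operands) -> result, ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def lookup(self, pc, operands):
        """Return the cached result for (pc, operands), or None on a miss."""
        ways = self.sets[pc % self.num_sets]
        key = (pc, operands)
        if key in ways:
            ways.move_to_end(key)  # mark as most recently used
            return ways[key]
        return None

    def insert(self, pc, operands, result):
        """Cache a result, evicting the least recently used way if the set is full."""
        ways = self.sets[pc % self.num_sets]
        ways[(pc, operands)] = result
        ways.move_to_end((pc, operands))
        if len(ways) > self.assoc:
            ways.popitem(last=False)


class FlowAggregatedReuse:
    """Keeps one reuse buffer per output port (assumed aggregation key)."""

    def __init__(self, port_table, entries=32, assoc=4):
        self.port_table = port_table  # precomputed: packet id -> output port
        self.entries, self.assoc = entries, assoc
        self.rbs = {}
        self.current_rb = None
        self.reused = 0
        self.total = 0

    def on_packet_read(self, packet_id):
        """Called when the PC matches the packet-read function: select the RB."""
        port = self.port_table[packet_id]
        if port not in self.rbs:
            self.rbs[port] = ReuseBuffer(self.entries, self.assoc)
        self.current_rb = self.rbs[port]

    def execute(self, pc, operands, compute):
        """Reuse a prior result if the selected RB has one, else compute and cache it."""
        self.total += 1
        cached = self.current_rb.lookup(pc, operands)
        if cached is not None:  # results are assumed never to be None
            self.reused += 1
            return cached
        result = compute(*operands)
        self.current_rb.insert(pc, operands, result)
        return result

    def percent_reused(self):
        return 100.0 * self.reused / self.total if self.total else 0.0


# Toy trace: packets 0 and 2 share output port 1, packet 1 goes to port 2.
sim = FlowAggregatedReuse(port_table={0: 1, 1: 2, 2: 1})
results = []
for packet_id in (0, 1, 2):
    sim.on_packet_read(packet_id)
    results.append(sim.execute(0x400, (10, 1), lambda a, b: a + b))
```

In this toy trace the same instruction with the same operands runs once per packet. Only the third packet hits, because it maps to the same output port (and hence the same RB) as the first, while the second packet starts with an empty RB of its own.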
