29.01.2015 Views

Embedded Software for SoC - Grupo de Mecatrônica EESC/USP


Chapter 27

that the flow aggregation scheme with 2 RBs is sufficient to uncover significant
IR (though we use 4 in this article) for most benchmarks considered. We
report results only for those cases for which the returns are considerable.

The flow aggregation scheme is based on the output port, with a simple
mapping scheme (for RTR we use the input port instead, since this is the
program that computes the output port). We map instructions operating on
packets destined for ports 0 and 1 to one RB, those for ports 2 and 3 to the
next, and so on. This type of mapping is clearly not optimal, and better
results can be expected if other characteristics of the network traffic are
exploited. Since most traffic traces are anonymized, this kind of analysis is
difficult to carry out, and we do not explore this design space. Figure 27-3
shows the speedup results due to flow aggregation for the FRAG and RTR
programs. Flow aggregation is capable of uncovering a significant amount of
IR even when smaller RBs are used (this is highly dependent on the input
data). For example, for the FRAG program, five RBs with a (128,8)
configuration result in the same speedup as a single RB with a (1024,8)
configuration. We carried out experiments with other traces and obtained
varying amounts of IR and speedup. While IR invariably increases due to the
flow aggregation scheme, speedup, being dependent on other factors (see
section 3.1), shows little or no improvement in many cases.
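The port-based mapping described above can be illustrated with a short sketch. This is not the authors' implementation: the names `rb_index_for_port`, `NUM_RBS` and `PORTS_PER_RB` are hypothetical, RBs are given zero-based indices for convenience, and the wrap-around for ports beyond those covered by the available RBs is an assumption, since the text does not say how such ports are handled.

```python
# Hypothetical sketch of the port-based flow aggregation mapping.
# Assumptions (not from the chapter): the names below, zero-based RB
# indices, and wrapping around when ports outnumber the available RBs.

NUM_RBS = 4        # the text reports using 4 RBs
PORTS_PER_RB = 2   # ports 0 and 1 share one RB, ports 2 and 3 the next


def rb_index_for_port(port: int,
                      ports_per_rb: int = PORTS_PER_RB,
                      num_rbs: int = NUM_RBS) -> int:
    """Return the index of the RB that serves packets for `port`."""
    if port < 0:
        raise ValueError("port must be non-negative")
    # Consecutive ports are grouped together; the modulo is an assumed
    # fallback for ports beyond ports_per_rb * num_rbs.
    return (port // ports_per_rb) % num_rbs


if __name__ == "__main__":
    for port in range(8):
        print(port, "->", rb_index_for_port(port))
```

For RTR, where the output port is not yet known, the same function would presumably be called with the input port, in line with the exception noted in the text.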

The solid lines represent speedup due to ALU instructions in the base case,
while the dotted lines show results due to the flow-based scheme for ALU
instructions only. The dashed lines indicate the additional contribution made
by load instructions to the overall speedup. To examine the effect of reducing
resource contention on speedup, we tried the extended configuration and
obtained a 4.8% improvement in speedup for RTR (2.3% for FRAG) over
the flow-based scheme (with a (32,8) RB). Determining the IR and speedup due
to flow aggregation for payload processing applications is rather difficult since
