29.01.2015 Views

Embedded Software for SoC - Grupo de Mecatrônica EESC/USP

Enhancing Speedup in Network Processing Applications

2.2. Impact of IR on resources

Since the result value of a reused instruction is available from the RB, the execution phase is avoided, reducing the demand for resources. For load instructions, reuse (the outcome of the load being reused) reduces port contention, the number of memory accesses [4], and bus transitions. In most current-day processors that have hardware support for gating, this could lead to significant energy savings in certain parts of the processor logic. This could be significant in in-order issue processors, where instructions dependent on reused instructions cannot be issued earlier. In dynamically scheduled processors, IR allows dependent instructions to execute early, which changes the schedule of instruction execution and results in clustering or spreading of requests for resources, thereby increasing or decreasing resource contention [2]. Resource contention is defined as the ratio of the number of times resources are not available for executing ready instructions to the total number of requests made for resources.
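The definition of resource contention given above can be written as a ratio. The symbols $N_{\text{unavailable}}$ and $N_{\text{requests}}$ are notation assumed here for illustration and do not appear in the original text:

```latex
\[
  \text{resource contention}
  = \frac{N_{\text{unavailable}}}{N_{\text{requests}}}
\]
```

Here $N_{\text{unavailable}}$ counts the times a resource was not available for a ready instruction, and $N_{\text{requests}}$ is the total number of requests made for resources. As a purely hypothetical illustration, 150 unavailable cases out of 1000 requests would give a contention of 0.15.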

2.3. Operand based indexing

Indexing the RB with the PC enables one to exploit reuse due to dynamic instances of the same static instruction. Indexing the RB with the instruction opcode and operands can capture redundancy across dynamic instances of statically distinct instructions (having the same opcode). One way to implement operand indexing (though not optimal) is to have an opcode field in addition to the other fields mentioned previously in the RB and to search the entire RB in parallel for matches. In other words, an instruction that matches in the opcode and operand fields can read the result value from the RB. This is in contrast to PC based indexing, where the associative search is limited to a portion of the RB. We use a method similar to the one proposed in [5] to evaluate operand based indexing. Operand indexing helps in uncovering slightly more reuse than PC based indexing (for the same RB size). This can be attributed to the fact that in many packet-processing applications, certain tasks are often repeated for every packet. For example, IP header checksum computation is carried out to verify a packet when it first arrives at a router. When packet fragmentation occurs, the IP header is copied to the newly created packets and the checksum is again computed over each new packet. Since there is significant correlation in packet data, the inputs over which processing is done are quite limited (in the above example, the checksum is repeatedly computed over nearly the same IP header), and hence packet (especially header) processing applications tend to reuse results that were obtained while processing a previous packet.
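
The difference between the two indexing schemes described above can be illustrated with a minimal Python sketch. It is not the evaluation method of [5]. It assumes an unbounded buffer, a single entry per PC in the PC indexed variant, and a stand-in `execute` function that models only two opcodes. The names `Instr`, `PCIndexedRB`, `OperandIndexedRB`, `count_hits` and the sample `TRACE` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    pc: int          # address of the static instruction
    opcode: str      # e.g. "add" or "xor"
    operands: tuple  # operand values seen by this dynamic instance


def execute(instr):
    """Stand-in for the execution phase; only 'add' and 'xor' are modelled."""
    a, b = instr.operands
    if instr.opcode == "add":
        return a + b
    if instr.opcode == "xor":
        return a ^ b
    raise ValueError(f"unsupported opcode: {instr.opcode}")


class PCIndexedRB:
    """Assumed model: one entry per PC, reused only if the operands match."""

    def __init__(self):
        self.entries = {}  # pc -> (operands, result)
        self.hits = 0

    def lookup_or_execute(self, instr):
        entry = self.entries.get(instr.pc)
        if entry is not None and entry[0] == instr.operands:
            self.hits += 1           # reuse: execution phase skipped
            return entry[1]
        result = execute(instr)
        self.entries[instr.pc] = (instr.operands, result)
        return result


class OperandIndexedRB:
    """Assumed model: entries matched on opcode and operand values only."""

    def __init__(self):
        self.entries = {}  # (opcode, operands) -> result
        self.hits = 0

    def lookup_or_execute(self, instr):
        key = (instr.opcode, instr.operands)
        if key in self.entries:
            self.hits += 1           # reuse across statically distinct instructions
            return self.entries[key]
        result = execute(instr)
        self.entries[key] = result
        return result


def count_hits(rb, trace):
    """Run every instruction of the trace through the buffer and return its hit count."""
    for instr in trace:
        rb.lookup_or_execute(instr)
    return rb.hits


# Hypothetical trace: two static 'add' instructions see the same operand values.
TRACE = [
    Instr(0x100, "add", (3, 4)),
    Instr(0x104, "add", (3, 4)),  # different PC, same opcode and operands
    Instr(0x100, "add", (3, 4)),  # same static instruction again
    Instr(0x108, "xor", (3, 4)),  # same operands, different opcode
]

pc_hits = count_hits(PCIndexedRB(), TRACE)
operand_hits = count_hits(OperandIndexedRB(), TRACE)
```

On this hypothetical trace the PC indexed buffer records one hit and the operand indexed buffer records two, because the latter also reuses the result for the `add` at a different PC. The trace is constructed to show the mechanism and says nothing about the reuse rates measured in the chapter.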
