29.01.2015 Views

Embedded Software for SoC - Grupo de Mecatrônica EESC/USP



Enhancing Speedup in Network Processing Applications

Match (LPM) algorithm and is computed for every packet irrespective of whether IR is exploited or not [6]. Most routers employ route caches to avoid recomputing the LPM for every packet, thereby ensuring that the output port is known very early [6]. Classification of flows based on the input port involves little or no computation (since the input port through which a packet arrives is known) but uncovers a smaller percentage of reuse for some applications. In the case of routers that have a large number of ports, a many-to-one mapping between ports and the RBs to be queried by instructions is necessary to obtain good results.
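The two classification schemes discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the modulo mapping from ports to RBs, the toy route table, and all names (`rb_for_port`, `RouteCache`, `lpm_lookup`, `select_rb`, `NUM_RBS`) are hypothetical and do not come from the text, which does not specify how ports are mapped to RBs.

```python
import ipaddress
from typing import Callable, Dict

NUM_RBS = 4  # assumed number of reuse buffers (RBs) shared by all ports

# Toy route table (hypothetical): (prefix, output port).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), 0),    # default route
    (ipaddress.ip_network("10.0.0.0/8"), 1),
    (ipaddress.ip_network("10.1.0.0/16"), 6),
]


def lpm_lookup(dst: str) -> int:
    """Return the output port of the longest prefix that contains dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(net.prefixlen, port) for net, port in ROUTES if addr in net]
    return max(matches)[1]  # the default route guarantees a match


class RouteCache:
    """Destination -> output port cache placed in front of the LPM lookup."""

    def __init__(self, lookup: Callable[[str], int]) -> None:
        self._lookup = lookup
        self._cache: Dict[str, int] = {}
        self.misses = 0

    def output_port(self, dst: str) -> int:
        port = self._cache.get(dst)
        if port is None:  # miss: fall back to the full LPM computation
            self.misses += 1
            port = self._lookup(dst)
            self._cache[dst] = port
        return port


def rb_for_port(port: int, num_rbs: int = NUM_RBS) -> int:
    """Many-to-one mapping: several ports share one RB (assumed: modulo)."""
    if port < 0:
        raise ValueError("port must be non-negative")
    return port % num_rbs


def select_rb(in_port: int, dst: str, cache: RouteCache, scheme: str) -> int:
    """Pick the RB index under the input-port or output-port scheme."""
    if scheme == "input":
        return rb_for_port(in_port)  # known on arrival, no computation
    return rb_for_port(cache.output_port(dst))  # needs the route lookup
```

In this sketch the input-port scheme needs no lookup at all, while the output-port scheme pays for one LPM computation per destination and then hits the cache. With more ports than RBs, ports 1 and 5 would share RB 1 under the assumed modulo mapping.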

2.1. Proposed architecture

For single-processor, single-threaded systems, the architecture proposed in [1], with a few minor changes, is sufficient to exploit flow-based IR. However, for multiprocessor and multithreaded systems, which are generally used in designing NPUs, extra modifications are required. The essence of the problem at hand is to determine the appropriate RB to be queried by instructions operating on a packet, and to switch between RBs when necessary. The NPU is essentially a chip multiprocessor consisting of multiple RISC processors (or micro-engines, in the terminology of the Intel IXP1200 [7]) with support for hardware multithreading. It is the job of the programmer to partition tasks across threads as well as programs across micro-engines. Inter-thread and inter-processor communication is explicitly managed by the user. Figure 27-2 shows the microarchitecture of a single processing engine. Each processor has an RB array consisting of N + 1 RBs, one of which is the default RB that is queried by instructions before the flow id of a packet is computed (for the output port-based scheme only). The NPU also contains a Flow identifier table that is updated with the flow id for a given packet
