Embedded Software for SoC - Grupo de Mecatrônica EESC/USP


Chapter 27

ENHANCING SPEEDUP IN NETWORK PROCESSING APPLICATIONS BY EXPLOITING INSTRUCTION REUSE WITH FLOW AGGREGATION

G. Surendra, Subhasis Banerjee, and S. K. Nandy

Indian Institute of Science

Abstract. Instruction Reuse (IR) is a microarchitectural technique that improves the execution time of a program by removing redundant computations at run-time. Although this is the job of an optimizing compiler, compilers often do not succeed because of their limited knowledge of run-time data. In this article we concentrate on integer ALU and load instructions in packet processing applications and examine how IR can be used to obtain better performance. In addition, we attempt to answer the following questions: (1) Can IR be improved by reducing interference in the reuse buffer (RB)? (2) What characteristics of network applications can be exploited to improve IR? (3) What is the effect of IR on resource contention and memory accesses? We propose an aggregation scheme that combines the high-level concept of network traffic, i.e. “flows”, with the low-level microarchitectural feature of programs, i.e. repetition of instructions and data, together with an architecture that exploits temporal locality in incoming packet data to improve IR by reducing interference in the RB. We find that the benefits that can be achieved by exploiting IR vary widely depending on the nature of the application and input data. For the benchmarks considered, we find that IR varies between 1% and 50% while the speedup achieved varies between 1% and 24%.

Key words: network processors, instruction reuse, flows, multithreaded processors
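The reuse mechanism summarized in the abstract can be illustrated with a small software model. The sketch below is not the architecture proposed in this chapter: it assumes a direct-mapped reuse buffer indexed by program counter that stores operand values alongside the result, and the names `RBEntry`, `ReuseBuffer`, and `execute` are hypothetical. It shows how a repeated instruction with unchanged operands can skip the ALU, and how two instructions that map to the same entry interfere with each other.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RBEntry:
    pc: int       # address of the instruction that produced the result
    op1: int      # first operand value seen when the entry was filled
    op2: int      # second operand value seen when the entry was filled
    result: int   # cached result of the instruction


class ReuseBuffer:
    """Direct-mapped reuse buffer indexed by PC (assumed layout, for illustration only)."""

    def __init__(self, num_entries: int = 64) -> None:
        self.entries: List[Optional[RBEntry]] = [None] * num_entries
        self.hits = 0
        self.lookups = 0

    def execute(self, pc: int, op1: int, op2: int,
                alu: Callable[[int, int], int]) -> int:
        """Return the result of an ALU instruction, reusing a cached result when possible."""
        self.lookups += 1
        index = pc % len(self.entries)
        entry = self.entries[index]
        # Reuse only if the same instruction was last seen with the same operand values.
        if entry is not None and (entry.pc, entry.op1, entry.op2) == (pc, op1, op2):
            self.hits += 1
            return entry.result
        # Miss: compute the result and overwrite whatever occupied this entry.
        result = alu(op1, op2)
        self.entries[index] = RBEntry(pc, op1, op2, result)
        return result


def add(a: int, b: int) -> int:
    return a + b


# One instruction sees the same header field three times, then a new value.
rb = ReuseBuffer(num_entries=4)
for field in [10, 10, 10, 20]:
    rb.execute(pc=0x100, op1=field, op2=1, alu=add)
print(rb.hits, rb.lookups)  # prints "2 4"
```

In this model, two instructions whose addresses map to the same entry (for example `0x100` and `0x104` with four entries) evict each other on every access, so neither is ever reused. That is the kind of interference the abstract refers to, although the replacement policy and indexing used here are assumptions.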

1. INTRODUCTION

Network Processor Units (NPUs) are specialized programmable engines that are optimized for performing communication and packet processing functions and are capable of supporting multiple standards and Quality of Service (QoS) requirements. Increasing network speeds, along with the increasing desire to perform more computation within the network, have placed an enormous burden on the processing requirements of NPUs. This necessitates the development of new schemes to speed up packet processing tasks while keeping up with ever-increasing line rates. This aim has to be achieved while keeping power requirements within reasonable limits. In this article we investigate dynamic Instruction Reuse (IR) as a means of improving the performance of an NPU. The motivation of this article is to determine if IR is a viable option to be considered during the design of NPUs and to

A. Jerraya et al. (eds.), Embedded Software for SOC, 359–371, 2003.

© 2003 Kluwer Academic Publishers. Printed in the Netherlands.
