29.01.2015

Embedded Software for SoC - Grupo de Mecatrônica EESC/USP


Chapter 27

improve IR, thereby reducing the RoB occupancy even more. Flow-based reuse reduces the occupancy of the RoB by 1.5% to 3% over the base reuse scheme.

4. CONCLUSIONS

In this article we examined instruction reuse of integer ALU and load instructions in packet processing applications. To further enhance the utility of reuse by reducing interference, we proposed a flow aggregation scheme that exploits packet correlation and uses multiple RBs. The results indicate that exploiting instruction reuse with flow aggregation significantly improves the performance of our NPU model (24% improvement in speedup in the best case). The IR and speedup that can be achieved depend on the nature of the network traffic and on working set sizes. A direct relationship between reuse and speedup does not exist: although instruction reuse improves due to flow aggregation, speedup may not improve proportionally, because speedup depends on several other parameters, such as the criticality of the instruction being reused. Future work would be to simulate the proposed architecture, evaluate energy issues, and determine which instructions really need to be present in the RB. Interference in the RB can be reduced if critical instructions are identified and placed in the RB. Further, a detailed exploration of various mapping schemes is necessary to distribute related data evenly among RBs. Finally, we believe that additional reuse can be recovered from payload processing applications if realistic (non-anonymized) traffic is used and operand indexing is exploited.
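The interference argument can be illustrated with a small software model. The sketch below is not the simulated NPU model: it assumes a direct-mapped reuse buffer indexed by instruction address with 4-byte instructions, an integer flow identifier, and a simple modulo mapping from flows to buffers. The names `ReuseBuffer`, `FlowAggregatedBuffers`, `process_packet`, and `run`, as well as the toy per-packet instructions, are hypothetical and chosen only for illustration. It measures the reuse rate only and says nothing about speedup.

```python
class ReuseBuffer:
    """Direct-mapped reuse buffer: one entry per slot, indexed by PC (assumed layout)."""

    def __init__(self, num_entries):
        self.slots = [None] * num_entries
        self.hits = 0
        self.lookups = 0

    def execute(self, pc, operands, compute):
        """Return the stored result if PC and operands match, else compute and store."""
        self.lookups += 1
        index = (pc >> 2) % len(self.slots)  # assumes 4-byte instructions
        entry = self.slots[index]
        if entry is not None and entry[0] == pc and entry[1] == operands:
            self.hits += 1
            return entry[2]
        result = compute(*operands)
        self.slots[index] = (pc, operands, result)  # overwrite on a miss
        return result


class FlowAggregatedBuffers:
    """Several reuse buffers; each flow is steered to one of them."""

    def __init__(self, num_buffers, entries_per_buffer):
        self.buffers = [ReuseBuffer(entries_per_buffer) for _ in range(num_buffers)]

    def select(self, flow_id):
        # Hypothetical mapping scheme: modulo on an integer flow identifier.
        return self.buffers[flow_id % len(self.buffers)]

    def reuse_rate(self):
        hits = sum(b.hits for b in self.buffers)
        lookups = sum(b.lookups for b in self.buffers)
        return hits / lookups if lookups else 0.0


def process_packet(rb, dest_addr):
    """Toy per-packet work: two ALU instructions whose operands depend on the header."""
    prefix = rb.execute(0x100, (dest_addr, 0xFFFFFF00), lambda a, m: a & m)
    return rb.execute(0x104, (prefix, 8), lambda p, s: p >> s)


def run(num_buffers, total_entries, packets):
    """Process (flow_id, dest_addr) pairs with the total entry budget split evenly."""
    rbs = FlowAggregatedBuffers(num_buffers, total_entries // num_buffers)
    for flow_id, dest_addr in packets:
        process_packet(rbs.select(flow_id), dest_addr)
    return rbs.reuse_rate()


# Two interleaved flows, each repeating its own destination address.
packets = [(0, 0x0A000001), (1, 0x0A000102)] * 4
base_rate = run(1, 8, packets)  # single shared buffer
flow_rate = run(2, 8, packets)  # same total entries, split per flow
print(base_rate, flow_rate)
```

With this deliberately extreme traffic pattern, the two flows keep evicting each other's entries in the shared buffer, while the per-flow buffers retain them, so the script prints `0.0 0.75`. Real traffic would fall between these extremes, and a higher reuse rate would not necessarily translate into a proportional speedup.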

