15.01.2013 Views

U. Glaeser



8.3 Trace Caching and Trace Processors<br />

Eric Rotenberg<br />

A superscalar processor executes multiple instructions in parallel each cycle. Because there are data dependences among instructions, finding multiple independent instructions that can execute in parallel requires examining an even larger group of instructions, called the instruction window. Figure 8.16 shows a high-level view of a superscalar processor, including instruction buffers that make up the window and the decoupled fetch and execution engines. The fetch engine predicts branches, fetches and renames instructions, and dispatches them into the window. Meanwhile, each cycle, the execution engine identifies instructions in the window whose operands are available, and issues them to parallel functional units (FUs).
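The issue step described above can be illustrated with a minimal Python sketch. It is not taken from the chapter: the `Instr` record, the register names, and the oldest-first selection policy are assumptions made only for illustration, and real issue logic is implemented in hardware with more elaborate policies.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str     # label for the instruction (hypothetical)
    dest: str     # destination register
    srcs: tuple   # source registers


def select_ready(window, ready_regs, num_fus):
    """Pick up to num_fus instructions whose source operands are all available.

    The window is scanned oldest first, which is an assumed policy.
    """
    issued = []
    for instr in window:
        if len(issued) == num_fus:
            break
        if all(src in ready_regs for src in instr.srcs):
            issued.append(instr)
    return issued


# A four-entry window; i1 depends on i0, and i3 depends on i1 and i2.
window = [
    Instr("i0", "r1", ("r8", "r9")),
    Instr("i1", "r2", ("r1",)),
    Instr("i2", "r3", ("r8",)),
    Instr("i3", "r4", ("r2", "r3")),
]
ready = {"r8", "r9"}  # registers whose values are already available

issued = select_ready(window, ready, num_fus=2)
print([i.name for i in issued])  # ['i0', 'i2']
```

Here `i1` is skipped because its operand `r1` is still being produced by `i0`, so the younger but independent `i2` issues in its place.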

Peak performance is increased by adding more parallel functional units. But adding more functional units has ramifications for other parts of the processor. First, instruction fetch bandwidth must be commensurate with peak execution bandwidth. Second, the window must be correspondingly larger. A larger window enables the processor to probe deeper into the dynamic instruction stream, increasing the chance of finding enough independent instructions each cycle to keep functional units operating at peak efficiency.
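The effect of window size can be sketched with a toy cycle-count model. The model is an assumption-laden illustration, not the chapter's method: every instruction has a latency of one cycle, the window is refilled in program order with unlimited fetch bandwidth, and the function name `cycles_to_run` and the example program are hypothetical.

```python
def cycles_to_run(deps, window_size, num_fus):
    """Count cycles needed to issue every instruction.

    deps[i] lists the indices of earlier instructions that instruction i
    depends on. Assumptions: unit latency, in-order dispatch into the
    window, and unlimited fetch bandwidth.
    """
    n = len(deps)
    done = [False] * n
    window = []        # dispatched but not yet issued, oldest first
    next_fetch = 0
    remaining = n
    cycles = 0
    while remaining:
        # Dispatch: refill the window in program order.
        while len(window) < window_size and next_fetch < n:
            window.append(next_fetch)
            next_fetch += 1
        # Issue: oldest ready instructions, limited by the number of FUs.
        ready = [i for i in window if all(done[d] for d in deps[i])][:num_fus]
        for i in ready:
            window.remove(i)
        # Results become visible to consumers in the next cycle.
        for i in ready:
            done[i] = True
        remaining -= len(ready)
        cycles += 1
    return cycles


# Two independent dependence chains of 8 instructions, one after the other.
chain = 8
deps = [[] if i % chain == 0 else [i - 1] for i in range(2 * chain)]

print(cycles_to_run(deps, window_size=4, num_fus=4))   # 13
print(cycles_to_run(deps, window_size=16, num_fus=4))  # 8
```

With the small window, the second chain stays out of reach until most of the first chain has issued, so the functional units sit largely idle. The larger window exposes both chains at once and the run finishes in 8 cycles instead of 13, even though the number of functional units is unchanged.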

Next-generation, high-performance processors will need to issue 8, 12, or even 16 instructions per cycle. Unfortunately, at high issue rates, the supporting mechanisms (instruction supply and the instruction window) are difficult to scale. This section deals with the instruction fetch bottleneck and inefficient execution mechanisms, and surveys a next-generation microarchitecture, the trace processor [21,24,27,29,31], that attacks these problems. A third problem, control and data dependence bottlenecks, is also covered; however, because this aspect is more involved, the reader is referred to the trace processor literature [24,25] for details.

© 2002 by CRC Press LLC
