15.01.2013 Views

U. Glaeser


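FIGURE 8.21 Analogy between a single instruction and a single trace.
[Figure not reproduced. Block labels, left: single branch predictor, PC, simple instruction cache, rename, register file, four FUs, CDB. Block labels, right: single trace predictor, trace id, trace cache, rename, global register file, four PEs, global result bus. Each stage is marked with a bandwidth of 1.]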

single-issue, out-of-order processor on the left-hand side. The unit of operation has changed from one instruction to one trace, but the pipeline bandwidth remains 1 unit per cycle.
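The effect of changing the unit of operation can be illustrated with a minimal sketch. It counts the cycles a stage with a fixed bandwidth of 1 unit per cycle needs to pass a given number of instructions, first when the unit is one instruction and then when the unit is one trace. The function name `cycles_to_dispatch`, the trace length of 16, and the assumption that every trace is full are illustrative choices, not values taken from the text.

```python
import math


def cycles_to_dispatch(total_instructions: int,
                       instructions_per_unit: int,
                       units_per_cycle: int = 1) -> int:
    """Cycles needed to pass all instructions through a stage whose
    bandwidth is fixed in units per cycle (hypothetical helper)."""
    per_cycle = units_per_cycle * instructions_per_unit
    return math.ceil(total_instructions / per_cycle)


TRACE_LENGTH = 16  # assumed for illustration; not taken from the text

# Same bandwidth (1 unit per cycle) in both cases; only the unit differs.
per_instruction = cycles_to_dispatch(1600, 1)
per_trace = cycles_to_dispatch(1600, TRACE_LENGTH)
print(per_instruction, per_trace)
```

Real traces are often shorter than the maximum length, so the per-trace figure is an upper bound on the benefit rather than an expected value.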

In essence, grouping instructions within traces is a reprieve. Complexity (cycle time) does not necessarily increase with each additional instruction added to a trace. Additional branches are absorbed by the trace cache and trace predictor, and additional source and destination operands are absorbed by handling data flow hierarchically. Also, complexity (cycle time) does not necessarily increase with one or two additional PEs. Hardware parallelism is allowed to expand incrementally up to a point, at which time perhaps another level of hierarchy, and another reprieve, is needed.
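How a hierarchy can absorb operands is easier to see with a small example. The sketch below splits the register values of one trace into live-ins (read before any write in the trace), live-outs (the last write to each register), and local values (writes that are overwritten later in the same trace). Only live-ins and live-outs would need attention at the global level. The `Instr` type, the `classify_trace` function, the register names, and the exact classification rule are assumptions made for illustration; the text states only that data flow is handled hierarchically.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    dest: Optional[str]      # destination register, or None (e.g. store, branch)
    srcs: Tuple[str, ...]    # source registers


def classify_trace(trace: List[Instr]) -> Tuple[List[str], List[str], int]:
    """Return (live_ins, live_outs, local_write_count) for one trace."""
    live_ins: List[str] = []   # registers read before being written in the trace
    written = set()            # registers written so far in the trace
    total_writes = 0
    for ins in trace:
        for reg in ins.srcs:
            if reg not in written and reg not in live_ins:
                live_ins.append(reg)
        if ins.dest is not None:
            written.add(ins.dest)
            total_writes += 1
    # Assumed rule: the last write to each register is a live-out;
    # every earlier write to that register is local to the trace.
    live_outs = sorted(written)
    local_writes = total_writes - len(live_outs)
    return live_ins, live_outs, local_writes


trace = [
    Instr("r1", ("r2", "r3")),   # r1 = r2 op r3
    Instr("r4", ("r1",)),        # r4 = f(r1)
    Instr("r1", ("r4", "r5")),   # overwrites r1, so the first r1 value is local
    Instr(None, ("r1", "r6")),   # no destination register
]
live_ins, live_outs, local_writes = classify_trace(trace)
print(live_ins, live_outs, local_writes)
```

In this example the trace has seven source operands and three destination operands, but only four live-ins and two live-outs are visible outside the trace, which is the sense in which a longer trace need not add proportionally to global complexity.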

Perhaps the most important thing to remember about trace processors is that the whole processor contributes to parallelism, but cycle time is influenced more by an individual processing element than the whole processor.


© 2002 by CRC Press LLC

