U. Glaeser


buffer can receive up to three results per cycle and can start retirement of up to three uops per cycle. Retirement requires three cycles. Thus the overall pipeline has some 14 stages; but, because some of these stages can overlap, the effect is a minimum latency of 12 cycles per instruction.
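The in-order retirement just described can be illustrated with a toy model. This is a minimal sketch, not Intel's design: it assumes a 40-entry buffer (the P6 size quoted later in this section) and a retirement width of three, the class and field names are hypothetical, and neither the three-cycle retirement latency nor the limit of three incoming results per cycle is modeled.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class RobEntry:
    uop: str             # hypothetical uop label, for illustration only
    done: bool = False   # set once an execution unit writes back the result


class ReorderBuffer:
    """Toy reorder buffer: results may arrive out of order,
    but uops retire strictly in program order from the head."""

    def __init__(self, size=40, retire_width=3):
        self.size = size                  # 40 entries, as in the P6 core
        self.retire_width = retire_width  # up to three uops retired per cycle
        self.entries = deque()            # program order: head = oldest uop

    def allocate(self, uop):
        """Add a uop at the tail; a full buffer stalls the front end."""
        if len(self.entries) >= self.size:
            raise RuntimeError("reorder buffer full: front end must stall")
        entry = RobEntry(uop)
        self.entries.append(entry)
        return entry

    def retire_cycle(self):
        """Retire up to retire_width completed uops from the head."""
        retired = []
        while (self.entries and self.entries[0].done
               and len(retired) < self.retire_width):
            retired.append(self.entries.popleft().uop)
        return retired


rob = ReorderBuffer()
uops = [rob.allocate(f"u{i}") for i in range(5)]
for entry in uops[1:]:      # u1..u4 finish before the oldest uop u0
    entry.done = True
print(rob.retire_cycle())   # [] : u0 at the head is not done yet
uops[0].done = True
print(rob.retire_cycle())   # ['u0', 'u1', 'u2'] : limited to three per cycle
print(rob.retire_cycle())   # ['u3', 'u4']
```

The head-of-buffer check is what keeps retirement in program order: completed uops behind an unfinished one must wait, no matter how early they finished.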

The Pentium 4 is a redesign of the core microarchitecture. The translation of IA-32 instructions into uops is retained, but instead of repeatedly fetching, decoding, and translating recurring IA-32 instruction sequences, the uops are stored in a trace cache for repeated access. The trace cache can hold up to 12K uops and, in a manner somewhat similar to the PowerPC 750 branch elimination logic, stores frequently traversed sequences (i.e., “traces”) of uops in which any predicted-taken branch is followed by instructions from the predicted path. The trace cache can provide up to three uops per cycle, which are then routed through reorder-buffer allocation logic and register-renaming logic, and then into uop queues for scheduling. Up to six uops can be issued per cycle, and up to three uops can be retired per cycle. Part of the aggressiveness of the design can be seen in the increase in reorder buffer size from 40 entries in the P6 core to 126 entries in the Pentium 4. The clock rate can also be increased aggressively on the Pentium 4, since it has approximately twice as many pipeline stages as the P6 core. By cascading ALUs, two dependent addition or subtraction operations can be performed in each cycle.
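The trace cache idea can be sketched as a lookup keyed by where a trace starts and which way its branches are predicted to go. This is an illustrative model only: the key format, the LRU eviction over whole traces, and all names are assumptions, and the actual Pentium 4 trace cache organization is not described in this section. Only the 12K-uop capacity and the three-uops-per-cycle delivery come from the text above.

```python
from collections import OrderedDict


class TraceCache:
    """Toy trace cache: decoded uop sequences ("traces") keyed by start
    address plus the predicted directions of the branches inside them."""

    def __init__(self, capacity_uops=12 * 1024, fetch_width=3):
        self.capacity = capacity_uops   # total uops held, e.g. 12K
        self.fetch_width = fetch_width  # uops delivered per cycle
        self.traces = OrderedDict()     # key -> list of uops, in LRU order
        self.used = 0

    def lookup(self, start_addr, predictions):
        """Return the trace for this start address and predicted path, or None."""
        key = (start_addr, tuple(predictions))
        trace = self.traces.get(key)
        if trace is not None:
            self.traces.move_to_end(key)   # mark as most recently used
        return trace

    def insert(self, start_addr, predictions, uops):
        """Store a newly built trace, evicting least-recently-used traces."""
        if len(uops) > self.capacity:
            return                          # too long to cache at all
        key = (start_addr, tuple(predictions))
        if key in self.traces:
            self.used -= len(self.traces.pop(key))
        while self.traces and self.used + len(uops) > self.capacity:
            _, old = self.traces.popitem(last=False)   # evict the LRU trace
            self.used -= len(old)
        self.traces[key] = list(uops)
        self.used += len(uops)

    def fetch(self, trace):
        """Deliver a trace's uops in groups of at most fetch_width per cycle."""
        for i in range(0, len(trace), self.fetch_width):
            yield trace[i:i + self.fetch_width]


tc = TraceCache()
# Hypothetical trace at 0x400 whose branch ("jcc") is predicted taken, so the
# uops after it come from the branch target rather than the fall-through path.
tc.insert(0x400, [True], ["load", "add", "cmp", "jcc", "sub", "store", "mul"])
print(list(tc.fetch(tc.lookup(0x400, [True]))))
# [['load', 'add', 'cmp'], ['jcc', 'sub', 'store'], ['mul']]
print(tc.lookup(0x400, [False]))   # None: a different predicted path misses
```

Including the predicted branch directions in the key is what lets two traces share a start address yet follow different paths; a hit skips fetch and decode entirely.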

References

1. Johnson, M., Superscalar Microprocessor Design, Prentice-Hall, Englewood Cliffs, NJ, 1991.

2. Tjaden, G., and Flynn, M., Detection of parallel execution of independent instructions, IEEE Trans. Computers, C-19, 889, 1970.

3. Riseman, E., and Foster, C., The inhibition of potential parallelism by conditional jumps, IEEE Trans. Computers, C-21, 1405, 1972.

4. Nicolau, A., and Fisher, J., Measuring the parallelism available for very long instruction word architectures, IEEE Trans. Computers, C-33, 968, 1984.

5. Hinton, G., et al., The microarchitecture of the Pentium 4 processor, Intel Technology Journal, available on-line, 2001.

6. Schorr, H., Design principles for a high-performance system, in Proc. Symp. Computers and Automata, New York, 1971, 165.

7. Agerwala, T., and Cocke, J., High performance reduced instruction set processors, Technical Report RC12434, IBM Thomas Watson Research Center, 1987.

8. Tremblay, M., Greenly, D., and Normoyle, K., The design of the microarchitecture of the UltraSPARC-I, Proc. IEEE, 83, 1653, 1995.

9. Kennedy, A., et al., A G3 PowerPC superscalar low-power microprocessor, in Proc. COMPCON, San Francisco, 1997, 315.

6.2 Register Renaming Techniques¹

Dezsö Sima<br />

Introduction<br />

Register renaming (or “renaming” for short) is a widely used technique in instruction-level parallel (ILP) processors to remove false data dependencies between register operands of subsequent instructions in a straight-line code sequence.¹⁻³ We designate write-after-read (WAR) and write-after-write (WAW) dependencies as false data dependencies. If false data dependencies are removed, no related precedence requirements constrain the execution sequence of the instructions involved. Thus, on average, more instructions are available for parallel execution per cycle, which increases processor performance.
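The effect of renaming on the three dependency types can be shown on a short code sequence. This is a minimal sketch under simplifying assumptions: each instruction is written as a destination register plus a tuple of source registers, the register names are hypothetical, and the pool of physical registers (p0, p1, ...) is unbounded, whereas real processors have a finite pool and must reclaim physical registers. The dependency check is pairwise and does not filter out intervening writes.

```python
from itertools import count


def dependencies(code):
    """Return (i, j, kind) for every pair i < j that shares a register.
    Pairwise check only: intervening writes are not filtered out."""
    deps = []
    for j, (dst_j, srcs_j) in enumerate(code):
        for i in range(j):
            dst_i, srcs_i = code[i]
            if dst_i in srcs_j:
                deps.append((i, j, "RAW"))   # true dependency
            if dst_j in srcs_i:
                deps.append((i, j, "WAR"))   # false (anti) dependency
            if dst_j == dst_i:
                deps.append((i, j, "WAW"))   # false (output) dependency
    return deps


def rename(code):
    """Map each destination to a fresh physical register p0, p1, ...
    Sources are read through the current mapping; registers not yet
    written in the sequence keep their architectural names."""
    fresh = (f"p{n}" for n in count())
    mapping = {}
    renamed = []
    for dst, srcs in code:
        new_srcs = tuple(mapping.get(r, r) for r in srcs)  # rename sources first
        mapping[dst] = next(fresh)                          # then the destination
        renamed.append((mapping[dst], new_srcs))
    return renamed


code = [
    ("r1", ("r2", "r3")),   # i0: r1 <- r2 op r3
    ("r4", ("r1",)),        # i1: r4 <- op r1
    ("r1", ("r5", "r6")),   # i2: r1 <- r5 op r6
]
print(dependencies(code))
# [(0, 1, 'RAW'), (0, 2, 'WAW'), (1, 2, 'WAR')]
print(rename(code))
# [('p0', ('r2', 'r3')), ('p1', ('p0',)), ('p2', ('r5', 'r6'))]
print(dependencies(rename(code)))
# [(0, 1, 'RAW')]
```

After renaming, i2 writes its own register p2, so it no longer has to wait for i1 to read r1 or for i0 to write it; only the true RAW dependency from i0 to i1 remains.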

¹Portions of this chapter reprinted with permission from Sima, D., The design space of register renaming techniques, IEEE Micro, 20, Sept./Oct., 70, 2000. © 2000 IEEE

© 2002 by CRC Press LLC
