Intel® 64 and IA-32 Architectures Optimization Reference Manual

INTEL® 64 AND IA-32 PROCESSOR ARCHITECTURES

The Stack Pointer Tracker moves all these implicit RSP updates to logic contained in the decoders themselves. The feature provides the following benefits:

• Improves decode bandwidth, as PUSH, POP and RET are single μop instructions in Intel Core microarchitecture.
• Conserves execution bandwidth as the RSP updates do not compete for execution resources.
• Improves parallelism in the out-of-order execution engine as the implicit serial dependencies between μops are removed.
• Improves power efficiency as the RSP updates are carried out on small, dedicated hardware.

2.1.2.6 Micro-fusion

Micro-fusion fuses multiple μops from the same instruction into a single complex μop. The complex μop is dispatched in the out-of-order execution core. Micro-fusion provides the following performance advantages:

• Improves instruction bandwidth delivered from decode to retirement.
• Reduces power consumption as the complex μop represents more work in a smaller format (in terms of bit density), reducing overall “bit-toggling” in the machine for a given amount of work and virtually increasing the amount of storage in the out-of-order execution engine.

Many instructions provide register flavors and memory flavors. The flavor involving a memory operand decodes into a longer flow of μops than the register version. Micro-fusion enables software to use memory-to-register operations to express the actual program behavior without worrying about a loss of decode bandwidth.

2.1.3 Execution Core

The execution core of the Intel Core microarchitecture is superscalar and can process instructions out of order. When a dependency chain causes the machine to wait for a resource (such as a second-level data cache line), the execution core executes other instructions.
This increases the overall rate of instructions executed per cycle (IPC). The execution core contains the following three major components:

• Renamer: Moves μops from the front end to the execution core. Architectural registers are renamed to a larger set of microarchitectural registers. Renaming eliminates false dependencies known as write-after-read and write-after-write hazards.
• Reorder buffer (ROB): Holds μops in various stages of completion, buffers completed μops, updates the architectural state in order, and manages ordering of exceptions. The ROB has 96 entries to handle instructions in flight.
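The Renamer's effect on false dependencies can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions: a single rename table, a simple free list of physical registers, and no reclamation of physical registers at retirement (real hardware frees them as μops retire and stalls when none are free). The names `Uop`, `rename` and `hazards` and the physical register names `p0`, `p1`, ... are hypothetical and do not describe Intel's actual implementation. The example shows that renaming a four-μop sequence removes the write-after-read and write-after-write hazards on `eax` while preserving every read-after-write (true) dependency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uop:
    dst: str                # destination register written by this uop
    srcs: tuple = ()        # source registers read by this uop


def rename(uops, arch_regs, num_phys):
    """Give every write a fresh physical register (toy model, no reclamation)."""
    # Rename table: architectural name -> current physical register.
    table = {r: f"p{i}" for i, r in enumerate(arch_regs)}
    free = [f"p{i}" for i in range(len(arch_regs), num_phys)]
    out = []
    for u in uops:
        srcs = tuple(table[r] for r in u.srcs)  # read mappings before the write
        if not free:
            raise RuntimeError("out of physical registers (hardware would stall)")
        dst = free.pop(0)
        table[u.dst] = dst                      # later readers see the new copy
        out.append(Uop(dst, srcs))
    return out


def hazards(uops):
    """Return {(kind, i, j)} for RAW, WAR and WAW hazards from uop i to uop j."""
    last_writer = {}   # register -> index of its most recent writer
    readers = {}       # register -> indices that read it since its last write
    found = set()
    for j, u in enumerate(uops):
        for r in u.srcs:
            if r in last_writer:
                found.add(("RAW", last_writer[r], j))  # true dependency
            readers.setdefault(r, []).append(j)
        for i in readers.get(u.dst, []):
            if i != j:
                found.add(("WAR", i, j))               # false (anti) dependency
        if u.dst in last_writer:
            found.add(("WAW", last_writer[u.dst], j))  # false (output) dependency
        last_writer[u.dst] = j
        readers[u.dst] = []
    return found


# eax = load [ebx]; ecx = eax + 1; eax = edx * 2; esi = eax + ecx
program = [
    Uop("eax", ("ebx",)),
    Uop("ecx", ("eax",)),
    Uop("eax", ("edx",)),
    Uop("esi", ("eax", "ecx")),
]
renamed = rename(program, ["eax", "ebx", "ecx", "edx", "esi"], num_phys=16)

print(sorted(hazards(program)))   # includes WAR (1, 2) and WAW (0, 2)
print(sorted(hazards(renamed)))   # only the three RAW hazards remain
```

After renaming, μop 1 reads `p5` while μop 2 writes `p7`, so μop 2 may execute before μop 1 without overwriting its input; only the true data flow (0→1, 1→3, 2→3) still constrains execution order.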
