13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual



INTEL® 64 AND IA-32 PROCESSOR ARCHITECTURES

2.1.2 Front End

The front end needs to supply decoded instructions (μops) and sustain the stream to a six-issue wide out-of-order engine. The components of the front end, their functions, and the performance challenges to microarchitectural design are described in Table 2-1.

Table 2-1. Components of the Front End

Branch Prediction Unit (BPU)
  Functions:
  • Helps the instruction fetch unit fetch the most likely instruction to be executed by predicting the various branch types: conditional, indirect, direct, call, and return. Uses dedicated hardware for each type.
  Performance Challenges:
  • Enables speculative execution.
  • Improves speculative execution efficiency by reducing the amount of code in the “non-architected path” (see note 1) to be fetched into the pipeline.

Instruction Fetch Unit
  Functions:
  • Prefetches instructions that are likely to be executed
  • Caches frequently-used instructions
  • Predecodes and buffers instructions, maintaining a constant bandwidth despite irregularities in the instruction stream
  Performance Challenges:
  • Variable length instruction format causes unevenness (bubbles) in decode bandwidth.
  • Taken branches and misaligned targets cause disruptions in the overall bandwidth delivered by the fetch unit.

Instruction Queue and Decode Unit
  Functions:
  • Decodes up to four instructions, or up to five with macro-fusion
  • Stack pointer tracker algorithm for efficient procedure entry and exit
  • Implements the Macro-Fusion feature, providing higher performance and efficiency
  • The Instruction Queue is also used as a loop cache, enabling some loops to be executed with both higher bandwidth and lower power
  Performance Challenges:
  • Varying amounts of work per instruction require expansion into variable numbers of μops.
  • Prefix adds a dimension of decoding complexity.
  • Length Changing Prefix (LCP) can cause front end bubbles.

NOTES:
1. Code paths that the processor speculatively began to execute, then abandoned once it determined that execution should follow a different path.
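
The decode-width entry in Table 2-1 ("up to four instructions, or up to five with macro-fusion") can be illustrated with a small toy model in C. This is only a sketch under stated assumptions, not a description of the actual decoder: the `insn_kind` classes and the `decode_cycles` function are hypothetical names introduced here, the model assumes at most one fused pair per cycle, and it ignores fetch alignment, prefixes, and differences between individual decoders.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction classes for this toy model (not Intel terminology). */
typedef enum {
    INSN_OTHER,    /* any instruction that does not take part in fusion */
    INSN_CMP_TEST, /* CMP or TEST: may be the first half of a fused pair */
    INSN_JCC       /* conditional jump: may be the second half of a pair */
} insn_kind;

enum { DECODE_WIDTH = 4 }; /* "decodes up to four instructions" per cycle */

/*
 * Count the decode cycles needed for n instructions under the toy model.
 * Each cycle has DECODE_WIDTH slots. A CMP/TEST immediately followed by a
 * conditional jump occupies a single slot. At most one pair is fused per
 * cycle (an assumption of this sketch), so one cycle consumes at most
 * five instructions.
 */
size_t decode_cycles(const insn_kind *insns, size_t n)
{
    assert(insns != NULL || n == 0);

    size_t cycles = 0;
    size_t i = 0;
    while (i < n) {
        int slots = DECODE_WIDTH;
        int fused_this_cycle = 0;
        while (slots > 0 && i < n) {
            int can_fuse = !fused_this_cycle
                        && insns[i] == INSN_CMP_TEST
                        && i + 1 < n
                        && insns[i + 1] == INSN_JCC;
            if (can_fuse) {
                i += 2;               /* two instructions share one slot */
                fused_this_cycle = 1;
            } else {
                i += 1;               /* one instruction, one slot */
            }
            slots--;
        }
        cycles++;
    }
    return cycles;
}
```

In this model, five instructions ending in a compare followed by a conditional jump fit in one cycle, while five unrelated instructions need two. The model shows only the slot arithmetic behind the "four, or five with macro-fusion" figure; it makes no claim about real cycle counts.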
