13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


GENERAL OPTIMIZATION GUIDELINES

In these cases, Intel Core Microarchitecture provides a 2 μop flow from decoder 0, resulting in a slight loss of decode bandwidth, since the 2 μop flow must be steered to decoder 0 from the decoder with which it was aligned.

RIP addressing may be common in accessing global data. Since it will not benefit from micro-fusion, a compiler may consider accessing global data by other means of memory addressing.

3.4.2.2 Optimizing for Macro-fusion

Macro-fusion merges two instructions into a single μop. Intel Core Microarchitecture performs this hardware optimization under limited circumstances.

The first instruction of the macro-fused pair must be a CMP or TEST instruction. This instruction can be a REG-REG, REG-IMM, or micro-fused REG-MEM comparison. The second instruction (adjacent in the instruction stream) should be a conditional branch.

Since these pairs are a common ingredient in basic iterative programming sequences, macro-fusion improves performance even on un-recompiled binaries. All of the decoders can decode one macro-fused pair per cycle, together with up to three other instructions, resulting in a peak decode bandwidth of 5 instructions per cycle.

Each macro-fused instruction executes with a single dispatch. This process reduces latency, which in this case shows up as a cycle removed from the branch misprediction penalty. Software also gains all other fusion benefits: increased rename and retire bandwidth, more storage for instructions in flight, and power savings from representing more work in fewer bits.

The following list details when you can use macro-fusion:

• CMP or TEST can be fused when comparing:
    REG-REG. For example: CMP EAX,ECX; JZ label
    REG-IMM. For example: CMP EAX,0x80; JZ label
    REG-MEM. For example: CMP EAX,[ECX]; JZ label
    MEM-REG. For example: CMP [EAX],ECX; JZ label
• TEST can be fused with all conditional jumps.
• CMP can be fused with only the following conditional jumps. These conditional jumps check the carry flag (CF) or zero flag (ZF). The macro-fusion-capable conditional jumps are:
    JA or JNBE
    JAE or JNB or JNC
    JE or JZ
    JNA or JBE
    JNAE or JC or JB
    JNE or JNZ
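The pairing rules in the list above can be written down as a small predicate, which may be handy in a code generator or a static checker. The following is only a minimal sketch in C: the function names can_macro_fuse and jcc_checks_cf_or_zf are hypothetical (they are not part of any Intel tool or API), mnemonics are assumed to be upper-case strings, the caller is assumed to pass a conditional jump as the second instruction, and operand forms (REG-REG, REG-IMM, REG-MEM, MEM-REG) are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the conditional jump is one of the
 * CF/ZF-checking jumps listed above as fusible with CMP. */
static bool jcc_checks_cf_or_zf(const char *jcc)
{
    static const char *const fusible[] = {
        "JA",  "JNBE",
        "JAE", "JNB", "JNC",
        "JE",  "JZ",
        "JNA", "JBE",
        "JNAE", "JC", "JB",
        "JNE", "JNZ"
    };
    for (size_t i = 0; i < sizeof fusible / sizeof fusible[0]; i++) {
        if (strcmp(jcc, fusible[i]) == 0)
            return true;
    }
    return false;
}

/* Hypothetical predicate: can `first` macro-fuse with the adjacent
 * conditional jump `jcc` under the rules listed above?
 * Assumption: `jcc` really is a conditional jump mnemonic. */
bool can_macro_fuse(const char *first, const char *jcc)
{
    assert(first != NULL && jcc != NULL);

    if (strcmp(first, "TEST") == 0)
        return true;                      /* TEST fuses with all conditional jumps */
    if (strcmp(first, "CMP") == 0)
        return jcc_checks_cf_or_zf(jcc);  /* CMP fuses only with CF/ZF jumps */
    return false;                         /* first instruction must be CMP or TEST */
}
```

With this sketch, can_macro_fuse("CMP", "JZ") and can_macro_fuse("TEST", "JL") hold, while can_macro_fuse("CMP", "JL") does not, because JL depends on the sign and overflow flags rather than CF or ZF.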
