13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

GENERAL OPTIMIZATION GUIDELINES

The instruction MOV DX, 01234h is subject to LCP stalls in processors based on Intel Core microarchitecture, and in Intel Core Duo and Intel Core Solo processors. Instructions that contain imm16 as part of their fixed encoding but do not require LCP to change the immediate size are not subject to LCP stalls. The REX prefix (4xh) in 64-bit mode can change the size of two classes of instruction, but does not cause an LCP penalty.

If the LCP stall happens in a tight loop, it can cause significant performance degradation. When decoding is not a bottleneck, as in floating-point heavy code, isolated LCP stalls usually do not cause performance degradation.

Assembly/Compiler Coding Rule 21. (MH impact, MH generality) Favor generating code using imm8 or imm32 values instead of imm16 values.

If imm16 is needed, load equivalent imm32 into a register and use the word value in the register instead.

Double LCP Stalls

Instructions that are subject to LCP stalls and cross a 16-byte fetch line boundary can cause the LCP stall to trigger twice. The following alignment situations can cause LCP stalls to trigger twice:

• An instruction is encoded with a MODR/M and SIB byte, and the fetch line boundary crossing is between the MODR/M and the SIB bytes.
• An instruction starts at offset 13 of a fetch line and references a memory location using register and immediate byte offset addressing mode.

The first stall is for the 1st fetch line, and the 2nd stall is for the 2nd fetch line.
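To make the two alignment cases concrete, the sketch below lays out the bytes of two instructions that start at offset 13 of a 16-byte fetch line. The encodings are worked out by hand for 32-bit mode (operand-size prefix 66H, opcode 81 /0) and assume the assembler selects EDX as base and ESI as index. They are an illustration under those assumptions, not a listing taken from the manual.

```asm
; Case 1: SIB byte present. Assumed encoding: 66 81 04 32 34 12
;   offset 13: 66     operand-size prefix (the length-changing prefix)
;   offset 14: 81     opcode
;   offset 15: 04     MODR/M (last byte of the 1st fetch line)
;   offset  0: 32     SIB (first byte of the 2nd fetch line)
;   offset  1: 34 12  imm16
ADD word ptr [EDX+ESI], 01234H

; Case 2: register plus byte displacement. Assumed encoding: 66 81 42 12 34 12
;   offset 13: 66     operand-size prefix
;   offset 14: 81     opcode
;   offset 15: 42     MODR/M (last byte of the 1st fetch line)
;   offset  0: 12     disp8 (first byte of the 2nd fetch line)
;   offset  1: 34 12  imm16
ADD word ptr 012H[EDX], 01234H
```

In both layouts the fetch line boundary falls immediately after the MODR/M byte, which is the situation the two bullets describe.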
A double LCP stall causes a decode penalty of 11 cycles.

The following examples cause an LCP stall only once, regardless of where in the fetch line the first byte of the instruction falls:

    ADD DX, 01234H
    ADD word ptr [EDX], 01234H
    ADD word ptr 012345678H[EDX], 01234H
    ADD word ptr [012345678H], 01234H

The following instructions cause a double LCP stall when starting at offset 13 of a fetch line:

    ADD word ptr [EDX+ESI], 01234H
    ADD word ptr 012H[EDX], 01234H
    ADD word ptr 012345678H[EDX+ESI], 01234H

To avoid double LCP stalls, do not use instructions subject to LCP stalls that use SIB byte encoding or addressing mode with byte displacement.

False LCP Stalls

False LCP stalls have the same characteristics as LCP stalls, but occur on instructions that do not have any imm16 value.
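Returning to Coding Rule 21 and the advice on avoiding double LCP stalls: one way to apply them to the ADD examples above is sketched here. It assumes EAX is free to use as a scratch register, which is an assumption of this example and depends on the surrounding code.

```asm
; Original form: imm16 behind a 66H prefix, subject to an LCP stall
; (and to a double stall if it starts at offset 13 of a fetch line):
;   ADD word ptr [EDX+ESI], 01234H

; Sketch of the Rule 21 rewrite (EAX assumed free):
MOV EAX, 01234H              ; constant loaded as imm32, no operand-size prefix
ADD word ptr [EDX+ESI], AX   ; 66H prefix is still present, but the instruction
                             ; has no immediate whose size the prefix changes
```

The rewrite costs one extra instruction and a register, so it is most likely to pay off where decoding is the bottleneck, such as the tight loops mentioned earlier.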
