Intel® 64 and IA-32 Architectures Optimization Reference Manual

GENERAL OPTIMIZATION GUIDELINES

Using LEA in this way may avoid register usage by not tying up registers for operands of arithmetic instructions. This use may also save code space.

If the LEA instruction uses a shift by a constant amount then the latency of the sequence of µops is shorter if adds are used instead of a shift, and the LEA instruction may be replaced with an appropriate sequence of µops. This, however, increases the total number of µops, leading to a trade-off.

Assembly/Compiler Coding Rule 33. (ML impact, L generality) If an LEA instruction using the scaled index is on the critical path, a sequence with ADDs may be better. If code density and bandwidth out of the trace cache are the critical factor, then use the LEA instruction.

3.5.1.4 Using SHIFT and ROTATE

The SHIFT and ROTATE instructions have a longer latency on processors with a CPUID signature corresponding to family 15 and model encoding of 0, 1, or 2. The latency of a sequence of adds will be shorter for left shifts of three or less. Fixed and variable SHIFTs have the same latency.

The rotate by immediate and rotate by register instructions are more expensive than a shift. The rotate by 1 instruction has the same latency as a shift.

Assembly/Compiler Coding Rule 34. (ML impact, L generality) Avoid ROTATE by register or ROTATE by immediate instructions. If possible, replace with a ROTATE by 1 instruction.

3.5.1.5 Address Calculations

For computing addresses, use the addressing modes rather than general-purpose computations. Internally, memory reference instructions can have four operands:

  • Relocatable load-time constant
  • Immediate constant
  • Base register
  • Scaled index register

In the segmented model, a segment register may constitute an additional operand in the linear address calculation.
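As a rough illustration of the operands listed above, the C sketch below computes the address of one field in two ways. The type record_t and the functions field_address_manual and load_value are hypothetical names introduced only for this example; they do not come from the manual. The first function spells out the scale, base, and displacement steps as separate integer operations. The second leaves the whole calculation to a single array access, which a compiler targeting x86 may be able to encode as one memory reference of the form base + index*scale + displacement, although the instructions actually emitted depend on the compiler and its options.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type, used only for this illustration.
 * On typical ABIs it is 8 bytes, which matches one of the scale
 * factors (1, 2, 4, 8) available to an x86 scaled index. */
typedef struct {
    int32_t id;     /* displacement 0 */
    int32_t value;  /* displacement 4 */
} record_t;

/* Spelled-out form: every step is a separate integer operation. */
static uintptr_t field_address_manual(const record_t *base, size_t index)
{
    uintptr_t scaled = (uintptr_t)index * sizeof(record_t); /* index * scale  */
    uintptr_t addr = (uintptr_t)base + scaled;              /* + base         */
    return addr + offsetof(record_t, value);                /* + displacement */
}

/* Folded form: the same address arithmetic is implied by the memory
 * reference itself, so no explicit integer instructions are written. */
static int32_t load_value(const record_t *base, size_t index)
{
    return base[index].value;
}
```

Both forms refer to the same location: for an array named table, field_address_manual(table, i) yields the address of table[i].value, which is the location load_value(table, i) reads. The difference is only in how much explicit integer arithmetic precedes the access.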
In many cases, several integer instructions can be eliminated by fully using the operands of memory references.

3.5.1.6 Clearing Registers and Dependency Breaking Idioms

Code sequences that modify a partial register can experience some delay in their dependency chain, but this delay can be avoided by using dependency breaking idioms.

In processors based on Intel Core microarchitecture, a number of instructions can help clear execution dependency when software uses these instructions to clear register content to zero. The instructions include
