13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

GENERAL OPTIMIZATION GUIDELINES

Instead, software should follow these additional decoder guidelines:

• If you need to use multiple-μop, non-microsequenced instructions, try to separate them by a few single-μop instructions. The following instructions are examples of multiple-μop instructions that do not require the micro-sequencer:

    ADC/SBB
    CMOVcc
    Read-modify-write instructions

• If a series of multiple-μop instructions cannot be separated, try breaking the series into a different equivalent instruction sequence. For example, a series of read-modify-write instructions may go faster if sequenced as a series of read-modify + store instructions. This strategy could improve performance even if the new code sequence is larger than the original one.

3.5.1.1 Use of the INC and DEC Instructions

The INC and DEC instructions modify only a subset of the bits in the flag register. This creates a dependence on all previous writes of the flag register. This is especially problematic when these instructions are on the critical path because they are used to change an address for a load on which many other instructions depend.

Assembly/Compiler Coding Rule 32. (M impact, H generality) INC and DEC instructions should be replaced with ADD or SUB instructions, because ADD and SUB overwrite all flags, whereas INC and DEC do not, therefore creating false dependencies on earlier instructions that set the flags.

3.5.1.2 Integer Divide

Typically, an integer divide is preceded by a CWD or CDQ instruction. Depending on the operand size, divide instructions use DX:AX or EDX:EAX for the dividend. The CWD or CDQ instructions sign-extend AX or EAX into DX or EDX, respectively. These instructions have a denser encoding than an equivalent shift and move sequence would, but they generate the same number of micro-ops.

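To make the sign-extension step concrete, the following is a minimal Python sketch that models what CDQ leaves in EDX:EAX and compares it with simply clearing EDX. It is a model of the arithmetic only, not a measurement of either instruction; the names `cdq`, `xor_edx`, and `as_signed64` are hypothetical helpers introduced for illustration. CWD behaves analogously on the 16-bit pair DX:AX.

```python
MASK32 = 0xFFFFFFFF   # width of EAX/EDX
SIGN32 = 0x80000000   # sign bit of a 32-bit register


def cdq(eax):
    """Model CDQ: EDX receives copies of EAX's sign bit, EAX is unchanged."""
    eax &= MASK32
    edx = MASK32 if eax & SIGN32 else 0
    return edx, eax


def xor_edx(eax):
    """Model `xor edx, edx`: EDX is cleared, EAX is unchanged."""
    return 0, eax & MASK32


def as_signed64(edx, eax):
    """Interpret the register pair EDX:EAX as a signed 64-bit dividend."""
    value = (edx << 32) | eax
    return value - (1 << 64) if value & (1 << 63) else value


if __name__ == "__main__":
    # Compare both forms for a few register values (illustrative inputs).
    for eax in (0, 7, 0x7FFFFFFF, 0x80000000, 0xFFFFFFF9):
        print(hex(eax), as_signed64(*cdq(eax)), as_signed64(*xor_edx(eax)))
```

In this model the two forms produce the same register pair whenever the sign bit of EAX is clear, and they differ whenever it is set, which is why clearing EDX is only a valid substitute when the sign of the dividend is known in advance.
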
If AX or EAX is known to be positive, replace these instructions with:

    xor dx, dx

or

    xor edx, edx

3.5.1.3 Using LEA

In some cases with processors based on Intel NetBurst microarchitecture, the LEA instruction or a sequence of LEA, ADD, SUB and SHIFT instructions can replace constant multiply instructions. The LEA instruction can also be used as a multiple-operand addition instruction, for example:

    LEA ECX, [EAX + EBX + 4 + A]
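
The arithmetic behind both uses of LEA can be sketched in Python, assuming 32-bit registers and the standard base + index*scale + displacement addressing form. The function names (`lea`, `times5`, `times10`, `add3`) and the instruction sequences in the comments are illustrative assumptions rather than sequences taken from the manual, and the sketch says nothing about whether a given sequence is faster than a multiply on a particular processor.

```python
MASK32 = 0xFFFFFFFF  # results wrap to the width of a 32-bit register


def lea(base, index=0, scale=1, disp=0):
    """Model a 32-bit LEA: base + index*scale + disp, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 addressing allows scale factors 1, 2, 4, or 8 only")
    return (base + index * scale + disp) & MASK32


def times5(x):
    # Illustrative sequence: lea ecx, [eax + eax*4]
    return lea(x, x, 4)


def times10(x):
    # Illustrative sequence: lea ecx, [eax + eax*4] ; add ecx, ecx
    t = lea(x, x, 4)
    return (t + t) & MASK32


def add3(eax, ebx, a):
    # Models LEA ECX, [EAX + EBX + 4 + A], treating A as a constant
    # that is folded into the displacement together with the 4.
    return lea(eax, ebx, 1, 4 + a)


if __name__ == "__main__":
    # Compare the LEA-based results with a plain multiply, modulo 2**32.
    for x in (3, 7, 0xFFFFFFFF):
        print(x, times5(x) == (5 * x) & MASK32, times10(x) == (10 * x) & MASK32)
```

Because the scale factor is limited to 1, 2, 4, or 8, a single LEA covers only multipliers such as 2, 3, 4, 5, 8, and 9; other constants need the additional ADD, SUB, or shift steps mentioned above.
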
