13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

GENERAL OPTIMIZATION GUIDELINES

Assembly/Compiler Coding Rule 36. (M impact, MH generality) Break dependences on portions of registers between instructions by operating on 32-bit registers instead of partial registers. For moves, this can be accomplished with 32-bit moves or by using MOVZX.

On Pentium M processors, the MOVSX and MOVZX instructions both take a single μop, whether they move from a register or from memory. On Pentium 4 processors, MOVSX takes an additional μop. This is likely to cause less delay than the partial register update problem mentioned above, but the performance gain may vary. If the additional μop is a critical problem, MOVZX can sometimes be used as an alternative.

Sometimes sign-extended semantics can be maintained by zero-extending operands. For example, the C code in the following statements needs neither sign extension nor operand-size override prefixes:

    static short int a, b;
    if (a == b) {
        /* ... */
    }

Code for comparing these 16-bit operands might be:

    MOVZX EAX, WORD PTR [a]
    MOVZX EBX, WORD PTR [b]
    CMP   EAX, EBX

These circumstances tend to be common. However, the technique will not work if the compare is for greater than, less than, greater than or equal, and so on, or if the values in EAX or EBX are used in another operation that requires sign extension. Zero-extending a negative 16-bit value produces a large positive 32-bit value (for example, -1 becomes 0000FFFFH), so equality is preserved but ordering is not.

Assembly/Compiler Coding Rule 37. (M impact, M generality) Try to use zero extension or operate on 32-bit operands instead of using moves with sign extension.

The trace cache can be packed more tightly when instructions with operands that can only be represented as 32 bits are not adjacent.

Assembly/Compiler Coding Rule 38. (ML impact, L generality) Avoid placing near each other instructions that use 32-bit immediates which cannot be encoded as sign-extended 16-bit immediates.
Try to schedule μops that have no immediate immediately before or after μops with 32-bit immediates.

3.5.1.7 Compares

Use TEST when comparing a value in a register with zero. TEST essentially ANDs its operands together without writing the result to a destination register. TEST is preferred over AND because AND overwrites its destination register with the result. TEST is better than CMP ..., 0 because the instruction encoding is smaller.
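The zero-extension technique for 16-bit compares described above claims that equality survives zero extension while ordering does not. The C sketch below checks that claim by modeling the MOVZX/MOVZX/CMP sequence with a 16-to-32-bit unsigned conversion. The function names are hypothetical, and the sketch says nothing about which instructions a particular compiler will actually emit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Models MOVZX EAX,[a] / MOVZX EBX,[b] / CMP EAX,EBX for equality:
   converting through uint16_t zero-extends, so the upper 16 bits are 0. */
static bool eq_zero_extended(int16_t a, int16_t b)
{
    uint32_t ea = (uint16_t)a;
    uint32_t eb = (uint16_t)b;
    return ea == eb;
}

/* Same zero-extended values, but compared with "less than". A negative
   input such as -1 becomes 0x0000FFFF, so the result can be wrong. */
static bool lt_zero_extended(int16_t a, int16_t b)
{
    int32_t ea = (uint16_t)a;
    int32_t eb = (uint16_t)b;
    return ea < eb;
}

/* Counts inputs where zero-extended equality disagrees with true signed
   equality: every 16-bit a is checked against a few boundary b values. */
static int count_eq_mismatches(void)
{
    static const int16_t probes[] = { INT16_MIN, -1, 0, 1, INT16_MAX };
    int mismatches = 0;
    for (int32_t a = INT16_MIN; a <= INT16_MAX; ++a) {
        for (size_t i = 0; i < sizeof probes / sizeof probes[0]; ++i) {
            int16_t b = probes[i];
            if (eq_zero_extended((int16_t)a, b) != ((int16_t)a == b))
                ++mismatches;
        }
    }
    return mismatches;
}

/* Checks both claims; assert() aborts if either one fails. */
static void self_check(void)
{
    assert(count_eq_mismatches() == 0);           /* equality: safe */
    assert(lt_zero_extended(-1, 0) != (-1 < 0));  /* ordering: not safe */
}
```

Calling self_check() from a test driver exercises both cases. Equality holds because the int16_t to uint16_t conversion is one-to-one. The less-than check fails for -1 versus 0, which is exactly the greater-than/less-than restriction stated in the text.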