13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


64-BIT MODE CODING GUIDELINES

Assembly/Compiler Coding Rule 65. (H impact, M generality) Use the 32-bit versions of instructions in 64-bit mode to reduce code size unless the 64-bit version is necessary to access 64-bit data or additional registers.

9.2.2 Use Extra Registers to Reduce Register Pressure

64-bit mode makes 8 additional 64-bit general purpose registers and 8 additional XMM registers available to applications. To access the additional registers, a single-byte REX prefix is necessary. Using 8 additional registers can prevent the compiler from needing to spill values onto the stack.

Note that the potential increase in code size, due to the REX prefix, can increase cache misses. This can work against the benefit of using extra registers to access the data. When eight registers are sufficient for an algorithm, don't use the registers that require a REX prefix. This keeps the code size smaller.

Assembly/Compiler Coding Rule 66. (M impact, MH generality) When they are needed to reduce register pressure, use the 8 extra general purpose registers for integer code and 8 extra XMM registers for floating-point or SIMD code.

9.2.3 Use 64-Bit by 64-Bit Multiplies To Produce 128-Bit Results Only When Necessary

Integer multiplies of 64-bit by 64-bit operands that produce a 128-bit result cost more than multiplies that produce a 64-bit result. The upper 64 bits of a result take longer to compute than the lower 64 bits.

If the compiler can determine at compile time that the result of a multiply will not exceed 64 bits, then the compiler should generate the multiply instruction that produces a 64-bit result. If the compiler or assembly programmer cannot determine that the result will not exceed 64 bits, then a multiply that produces a 128-bit result is necessary.

Assembly/Compiler Coding Rule 67. (ML impact, M generality) Prefer 64-bit by 64-bit integer multiplies that produce 64-bit results over multiplies that produce 128-bit results.

9.2.4 Sign Extension to Full 64-Bits

When in 64-bit mode, the architecture is optimized to sign-extend to 64 bits in a single μop. In 64-bit mode, when the destination is 32 bits, the upper 32 bits must be zeroed.

Zeroing the upper 32 bits requires an extra μop and is less optimal than sign extending to 64 bits. While sign extending to 64 bits makes the instruction one byte longer, it reduces the number of μops that the trace cache has to store, improving performance.
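
The code-size trade-off behind Rule 65, Section 9.2.2, and the "one byte longer" remark in Section 9.2.4 comes down to when a REX prefix has to be emitted. The following Python sketch is an illustration only and is not taken from the manual. It models the encoding of register-to-register instructions, and the helper names `rex_prefix` and `encode_reg_reg` are hypothetical. It assumes register-direct operands (no memory operand, no SIB byte, no legacy prefixes) and ignores the byte-register case in which an otherwise empty REX prefix is required to reach SPL, BPL, SIL, or DIL.

```python
# Register numbers as used in ModRM encoding (RAX=0 ... RDI=7, R8=8 ... R15=15).
RAX, RCX, RDX, RBX = 0, 1, 2, 3
R8, R9 = 8, 9


def rex_prefix(w: bool, reg: int, rm: int) -> bytes:
    """Return the REX prefix for a register-direct form, or b"" if none is needed.

    REX has the bit layout 0100WRXB:
      W selects a 64-bit operand size,
      R extends the ModRM.reg field to reach registers 8..15,
      B extends the ModRM.rm field to reach registers 8..15.
    X (SIB index extension) is always 0 here because no SIB byte is modeled.
    """
    rex = 0x40 | (int(w) << 3) | ((reg >> 3) << 2) | (rm >> 3)
    return bytes([rex]) if rex != 0x40 else b""


def encode_reg_reg(opcode: bytes, reg: int, rm: int, w: bool = False) -> bytes:
    """Encode [REX] + opcode + ModRM for a register-direct (mod=11) instruction.

    Which operand goes in `reg` and which in `rm` depends on the opcode;
    for ADD r/m, r (opcode 01) `reg` is the source and `rm` the destination.
    """
    if not (0 <= reg <= 15 and 0 <= rm <= 15):
        raise ValueError("register numbers must be in 0..15")
    modrm = 0xC0 | ((reg & 7) << 3) | (rm & 7)
    return rex_prefix(w, reg, rm) + opcode + bytes([modrm])


ADD = b"\x01"          # ADD r/m, r
MOVSX_R8 = b"\x0f\xbe"  # MOVSX r, r/m8 (reg is the destination)

if __name__ == "__main__":
    cases = {
        "add eax, ebx": encode_reg_reg(ADD, reg=RBX, rm=RAX),
        "add rax, rbx": encode_reg_reg(ADD, reg=RBX, rm=RAX, w=True),
        "add r8d, r9d": encode_reg_reg(ADD, reg=R9, rm=R8),
        "movsx eax, cl": encode_reg_reg(MOVSX_R8, reg=RAX, rm=RCX),
        "movsx rax, cl": encode_reg_reg(MOVSX_R8, reg=RAX, rm=RCX, w=True),
    }
    for text, code in cases.items():
        print(f"{text:14s} {len(code)} bytes  {code.hex(' ')}")
```

In this model the 32-bit form using only the original eight registers needs no prefix, while either a 64-bit operand size or a register numbered 8 to 15 adds exactly one byte per instruction. That byte is the cost that Rule 65 and Section 9.2.2 weigh against the benefit of wider data or fewer spills.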
