Intel® 64 and IA-32 Architectures Optimization Reference Manual

64-BIT MODE CODING GUIDELINES

…instruction is not the optimal implementation if a 64-bit result is desired. Use the extended registers.

For example, the following code sequence loads the 32-bit values sign-extended into the 64-bit registers and performs a multiply:

    movsxd rax, DWORD PTR [x]
    movsxd rcx, DWORD PTR [y]
    imul   rax, rcx

The 64-bit version above is more efficient than the following 32-bit version:

    mov  eax, DWORD PTR [x]
    mov  ecx, DWORD PTR [y]
    imul ecx

In the 32-bit case above, EAX is required to be a source, and the result ends up in the EDX:EAX pair instead of in a single 64-bit register.

Assembly/Compiler Coding Rule 69. (ML impact, M generality) Use the 64-bit versions of multiply for 32-bit integer multiplies that require a 64-bit result.

To add two 64-bit numbers in 32-bit legacy mode, the add instruction followed by the adc (add with carry) instruction is used. For example, to add two 64-bit variables (X and Y), the following four instructions could be used:

    mov eax, DWORD PTR [X]
    mov edx, DWORD PTR [X+4]
    add eax, DWORD PTR [Y]
    adc edx, DWORD PTR [Y+4]

The result ends up in the register pair EDX:EAX.

In 64-bit mode, the above sequence can be reduced to the following:

    mov rax, QWORD PTR [X]
    add rax, QWORD PTR [Y]

The result is stored in RAX; one register is required instead of two.

Assembly/Compiler Coding Rule 70. (ML impact, M generality) Use the 64-bit versions of add for 64-bit adds.

9.3.2 CVTSI2SS and CVTSI2SD

The CVTSI2SS and CVTSI2SD instructions convert a signed integer in a general-purpose register or memory location to a single-precision or double-precision floating-point value. The signed integer can be either 32 bits or 64 bits.

In processors based on Intel NetBurst microarchitecture, the 32-bit version executes from the trace cache; the 64-bit version results in a microcode flow from the microcode ROM and takes longer to execute. In most cases, the 32-bit versions of CVTSI2SS and CVTSI2SD are sufficient.