Intel® 64 and IA-32 Architectures Optimization Reference Manual

GENERAL OPTIMIZATION GUIDELINES

    XOR REG, REG
    SUB REG, REG
    XORPS/PD XMMREG, XMMREG
    PXOR XMMREG, XMMREG
    SUBPS/PD XMMREG, XMMREG
    PSUBB/W/D/Q XMMREG, XMMREG

In Intel Core Solo and Intel Core Duo processors, the XOR, SUB, XORPS, or PXOR instructions can be used to break execution dependencies when they are used to zero the destination register.

The Pentium 4 processor provides special support for XOR, SUB, and PXOR operations when both operands are the same register. This support recognizes that clearing a register does not depend on the register's old value. The XORPS and XORPD instructions do not have this special support and cannot be used to break dependence chains.

Assembly/Compiler Coding Rule 35. (M impact, ML generality) Use dependency-breaking-idiom instructions to set a register to 0, or to break a false dependence chain resulting from re-use of registers. In contexts where the condition codes must be preserved, move 0 into the register instead. This requires more code space than using XOR and SUB, but avoids setting the condition codes.

Example 3-16 shows PXOR used as a dependency-breaking idiom on an XMM register when negating the elements of an array, as computed by this loop:

    int a[4096], b[4096], c[4096];
    for (int i = 0; i < 4096; i++)
        c[i] = -(a[i] + b[i]);

Example 3-16. Clearing Register to Break Dependency While Negating Array Elements

Negation (-x = (x XOR (-1)) - (-1)) without breaking dependency:

        lea     eax, a
        lea     ecx, b
        lea     edi, c
        xor     edx, edx
        movdqa  xmm7, allone          ; xmm7 = -1 in each dword
    lp:
        movdqa  xmm0, [eax + edx]
        paddd   xmm0, [ecx + edx]     ; a[i] + b[i]
        pxor    xmm0, xmm7            ; ~x
        psubd   xmm0, xmm7            ; ~x - (-1) = -x
        movdqa  [edi + edx], xmm0
        add     edx, 16
        cmp     edx, 16384            ; 4096 ints x 4 bytes
        jl      lp

Negation (-x = 0 - x) using PXOR reg, reg to break dependency:

        lea     eax, a
        lea     ecx, b
        lea     edi, c
        xor     edx, edx
    lp:
        movdqa  xmm0, [eax + edx]
        paddd   xmm0, [ecx + edx]     ; a[i] + b[i]
        pxor    xmm7, xmm7            ; zero xmm7; no dependency on its old value
        psubd   xmm7, xmm0            ; 0 - x = -x
        movdqa  [edi + edx], xmm7
        add     edx, 16
        cmp     edx, 16384            ; 4096 ints x 4 bytes
        jl      lp
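
The two negation listings in Example 3-16 can be approximated in C with SSE2 intrinsics, which makes it easy to check that both formulations compute c[i] = -(a[i] + b[i]). This is a minimal sketch under stated assumptions, not Intel's code: the function names negate_sum_allone, negate_sum_zero, and verify_negation and the test data are hypothetical; it assumes an x86 target with SSE2 and a C11 compiler (for _Alignas). Because the compiler performs register allocation and scheduling, intrinsics do not guarantee that the generated code reuses or zeroes registers exactly as the hand-written assembly does.

```c
#include <emmintrin.h> /* SSE2 intrinsics: PADDD, PXOR, PSUBD, MOVDQA */

#define N 4096

/* Aligned loads/stores (MOVDQA) require 16-byte aligned buffers. */
static _Alignas(16) int a[N], b[N], c_allone[N], c_zero[N];

/* Hypothetical helper mirroring the first listing:
 * -x = (x XOR -1) - (-1), with the all-ones constant kept across iterations.
 * Assumes n is a multiple of 4 and all pointers are 16-byte aligned. */
static void negate_sum_allone(const int *x, const int *y, int *out, int n)
{
    const __m128i allone = _mm_set1_epi32(-1); /* -1 in each dword */
    for (int i = 0; i < n; i += 4) {
        __m128i v = _mm_add_epi32(_mm_load_si128((const __m128i *)(x + i)),
                                  _mm_load_si128((const __m128i *)(y + i)));
        v = _mm_xor_si128(v, allone); /* ~v */
        v = _mm_sub_epi32(v, allone); /* ~v - (-1) == -v */
        _mm_store_si128((__m128i *)(out + i), v);
    }
}

/* Hypothetical helper mirroring the second listing: -x = 0 - x, with a fresh
 * zero each iteration. Compilers usually emit PXOR reg, reg for
 * _mm_setzero_si128, but the exact code is up to the compiler.
 * Same preconditions as negate_sum_allone. */
static void negate_sum_zero(const int *x, const int *y, int *out, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128i v = _mm_add_epi32(_mm_load_si128((const __m128i *)(x + i)),
                                  _mm_load_si128((const __m128i *)(y + i)));
        __m128i zero = _mm_setzero_si128();
        _mm_store_si128((__m128i *)(out + i), _mm_sub_epi32(zero, v));
    }
}

/* Fills the inputs with small values (so the scalar reference cannot
 * overflow), runs both variants, and returns how many elements disagree
 * with the scalar loop c[i] = -(a[i] + b[i]). */
int verify_negation(void)
{
    for (int i = 0; i < N; i++) {
        a[i] = i - 2048;
        b[i] = 3 * i;
    }
    negate_sum_allone(a, b, c_allone, N);
    negate_sum_zero(a, b, c_zero, N);

    int mismatches = 0;
    for (int i = 0; i < N; i++) {
        int expected = -(a[i] + b[i]);
        if (c_allone[i] != expected || c_zero[i] != expected)
            mismatches++;
    }
    return mismatches;
}
```

Calling verify_negation() from a test driver should return 0, since both formulations are exact in 32-bit two's-complement arithmetic. Whether the zero-based form is actually faster depends on the processor and on the code the compiler emits, so it is worth inspecting the generated assembly and measuring before relying on it.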