Intel® 64 and IA-32 Architectures Optimization Reference Manual

OPTIMIZING FOR SIMD FLOATING-POINT APPLICATIONS

… Intel Core microarchitecture. In Intel Core Duo and Intel Core Solo processors, software should use scalar SSE2 instructions to implement double-precision complex multiplication. This is because the data path between SIMD execution units is 128 bits in Intel Core microarchitecture but only 64 bits in previous microarchitectures. On those earlier processors, packed 128-bit operations therefore have no width advantage over scalar code.

Example 6-13 shows two equivalent implementations of double-precision complex multiplication of two complex numbers. One uses vector SSE2 instructions and the other uses SSE3 instructions. In both, [eax] holds the first number (x, y) and [eax+16] holds the second (z, w). The product is stored at [ecx]. Register comments list the high element first. Example 6-14 shows the equivalent scalar SSE2 implementation.

Example 6-13. Double-Precision Complex Multiplication of Two Pairs

SSE2 Vector Implementation

    movapd   xmm0, [eax]        ; y x
    movapd   xmm1, [eax+16]     ; w z
    unpcklpd xmm1, xmm1         ; z z
    movapd   xmm2, [eax+16]     ; w z
    unpckhpd xmm2, xmm2         ; w w
    mulpd    xmm1, xmm0         ; z*y z*x
    mulpd    xmm2, xmm0         ; w*y w*x
    xorpd    xmm2, xmm7         ; -w*y +w*x  (xmm7 holds a sign mask for the high element)
    shufpd   xmm2, xmm2, 1      ; w*x -w*y
    addpd    xmm2, xmm1         ; z*y+w*x z*x-w*y
    movapd   [ecx], xmm2

SSE3 Vector Implementation

    movapd   xmm0, [eax]        ; y x
    movapd   xmm1, [eax+16]     ; w z
    movapd   xmm2, xmm1         ; w z
    unpcklpd xmm1, xmm1         ; z z
    unpckhpd xmm2, xmm2         ; w w
    mulpd    xmm1, xmm0         ; z*y z*x
    mulpd    xmm2, xmm0         ; w*y w*x
    shufpd   xmm2, xmm2, 1      ; w*x w*y
    addsubpd xmm1, xmm2         ; w*x+z*y z*x-w*y
    movapd   [ecx], xmm1

Example 6-14. Double-Precision Complex Multiplication Using Scalar SSE2

    movsd xmm0, [eax]           ; x
    movsd xmm5, [eax+8]         ; y
    movsd xmm1, [eax+16]        ; z
    movsd xmm2, [eax+24]        ; w
    movsd xmm3, xmm1            ; z
    movsd xmm4, xmm2            ; w
    mulsd xmm1, xmm0            ; z*x
    mulsd xmm2, xmm0            ; w*x
    mulsd xmm3, xmm5            ; z*y
    mulsd xmm4, xmm5            ; w*y
    subsd xmm1, xmm4            ; z*x - w*y  (real part)
    addsd xmm3, xmm2            ; z*y + w*x  (imaginary part)
    movsd [ecx], xmm1
    movsd [ecx+8], xmm3
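
The sketch below restates the three listings in C using SSE2/SSE3 intrinsics, which correspond closely to the instructions above. It makes the following assumptions:

- An x86-64 target whose CPU supports SSE3.
- A GCC, Clang, or MSVC compiler.
- The function names cmul_sse2, cmul_sse3, cmul_scalar and cmul_check, and the TARGET_SSE3 macro, are hypothetical.

The SSE2 listing expects a sign mask in xmm7 but does not load it. Here that mask is built with _mm_set_pd(-0.0, 0.0), which sets only the sign bit of the high element.

```c
/* Sketch: C intrinsics counterparts of Examples 6-13 and 6-14.
   Assumes an x86-64 target with SSE3; all cmul_* names are hypothetical.
   Layout as in the listings: a = {x, y}, b = {z, w}, index 0 = low lane.
   Result: out[0] = z*x - w*y (real), out[1] = z*y + w*x (imaginary). */
#include <assert.h>
#include <pmmintrin.h> /* SSE3 intrinsics; also provides the SSE2 ones */

#if defined(__GNUC__)
/* Lets GCC/Clang emit addsubpd without a global -msse3 flag. */
#define TARGET_SSE3 __attribute__((target("sse3")))
#else
#define TARGET_SSE3
#endif

/* Mirrors the SSE2 vector listing: unpack, multiply, negate, swap, add. */
static void cmul_sse2(const double a[2], const double b[2], double out[2])
{
    const __m128d sign_hi = _mm_set_pd(-0.0, 0.0); /* role of xmm7 */
    __m128d xy = _mm_loadu_pd(a);                  /* y x */
    __m128d wz = _mm_loadu_pd(b);                  /* w z */
    __m128d p1 = _mm_mul_pd(_mm_unpacklo_pd(wz, wz), xy); /* z*y z*x */
    __m128d p2 = _mm_mul_pd(_mm_unpackhi_pd(wz, wz), xy); /* w*y w*x */
    p2 = _mm_xor_pd(p2, sign_hi);                  /* -w*y w*x */
    p2 = _mm_shuffle_pd(p2, p2, 1);                /* w*x -w*y */
    _mm_storeu_pd(out, _mm_add_pd(p1, p2));        /* z*y+w*x z*x-w*y */
}

/* Mirrors the SSE3 vector listing: addsubpd replaces the xor + add. */
static TARGET_SSE3 void cmul_sse3(const double a[2], const double b[2],
                                  double out[2])
{
    __m128d xy = _mm_loadu_pd(a);                  /* y x */
    __m128d wz = _mm_loadu_pd(b);                  /* w z */
    __m128d p1 = _mm_mul_pd(_mm_unpacklo_pd(wz, wz), xy); /* z*y z*x */
    __m128d p2 = _mm_mul_pd(_mm_unpackhi_pd(wz, wz), xy); /* w*y w*x */
    p2 = _mm_shuffle_pd(p2, p2, 1);                /* w*x w*y */
    /* addsubpd: low lane subtracts, high lane adds. */
    _mm_storeu_pd(out, _mm_addsub_pd(p1, p2));     /* z*y+w*x z*x-w*y */
}

/* Mirrors the scalar SSE2 listing; x86-64 compilers typically emit
   mulsd/subsd/addsd for this. */
static void cmul_scalar(const double a[2], const double b[2], double out[2])
{
    double x = a[0], y = a[1], z = b[0], w = b[1];
    out[0] = z * x - w * y;
    out[1] = z * y + w * x;
}

/* Usage check: runs all three versions and asserts they agree. With
   small-integer or short binary-fraction inputs every product and sum is
   exact, so the results must match bit for bit. */
static void cmul_check(const double a[2], const double b[2], double out[2])
{
    double v2[2], v3[2];
    cmul_scalar(a, b, out);
    cmul_sse2(a, b, v2);
    cmul_sse3(a, b, v3);
    assert(out[0] == v2[0] && out[1] == v2[1]);
    assert(out[0] == v3[0] && out[1] == v3[1]);
}
```

The listings use movapd, which requires 16-byte-aligned memory operands. This sketch uses the unaligned _mm_loadu_pd and _mm_storeu_pd instead, so callers need not align their arrays. With aligned data, _mm_load_pd and _mm_store_pd correspond to movapd. The compiler allocates its own registers, so the emitted code will follow the same dataflow but will not match the listings instruction for instruction.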