13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


OPTIMIZING FOR SIMD FLOATING-POINT APPLICATIONS

Example 6-11. Multiplication of Two Pairs of Single-precision Complex Numbers (Contd.)

    mulps    xmm2, xmm1         ; temporary results, b1c1, b1d1, b0c0,
                                ; b0d0
    addsubps xmm0, xmm2         ; b1c1+a1d1, a1c1-b1d1, b0c0+a0d0,
                                ; a0c0-b0d0

Example 6-12. Division of Two Pairs of Single-precision Complex Numbers

    ; Division of (ak + i bk) / (ck + i dk)
    movshdup xmm0, Src1         ; load imaginary parts into the
                                ; destination, b1, b1, b0, b0
    movaps   xmm1, Src2         ; load the 2nd pair of complex values,
                                ; i.e. d1, c1, d0, c0
    mulps    xmm0, xmm1         ; temporary results, b1d1, b1c1, b0d0,
                                ; b0c0
    shufps   xmm1, xmm1, 0B1h   ; reorder the real and imaginary
                                ; parts, c1, d1, c0, d0
    movsldup xmm2, Src1         ; load the real parts into the
                                ; destination, a1, a1, a0, a0
    mulps    xmm2, xmm1         ; temp results, a1c1, a1d1, a0c0, a0d0
    addsubps xmm0, xmm2         ; a1c1+b1d1, b1c1-a1d1, a0c0+b0d0,
                                ; b0c0-a0d0
    mulps    xmm1, xmm1         ; c1c1, d1d1, c0c0, d0d0
    movaps   xmm2, xmm1         ; c1c1, d1d1, c0c0, d0d0
    shufps   xmm2, xmm2, 0B1h   ; d1d1, c1c1, d0d0, c0c0
    addps    xmm2, xmm1         ; c1c1+d1d1, c1c1+d1d1, c0c0+d0d0,
                                ; c0c0+d0d0
    divps    xmm0, xmm2
    shufps   xmm0, xmm0, 0B1h   ; (b1c1-a1d1)/(c1c1+d1d1),
                                ; (a1c1+b1d1)/(c1c1+d1d1),
                                ; (b0c0-a0d0)/(c0c0+d0d0),
                                ; (a0c0+b0d0)/(c0c0+d0d0)

In both examples, the complex numbers are stored in arrays of structures. MOVSLDUP, MOVSHDUP and the asymmetric ADDSUBPS allow performing complex arithmetic on two pairs of single-precision complex numbers simultaneously and without any unnecessary swizzling between data elements.

Due to microarchitectural differences, software should implement multiplication of complex double-precision numbers using SSE3 instructions on processors based on
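The lane comments in Example 6-12 can be checked against the textbook formula (a + ib)/(c + id) = ((ac + bd) + i(bc - ad))/(c² + d²). The following is a minimal sketch in plain C that emulates each instruction on a four-float array rather than using SSE3 intrinsics. The type `vec4` and all function names are hypothetical helpers introduced for illustration and are not part of the manual. Lanes are listed lowest first in the code comments, which is the reverse of the order used in the assembly comments.

```c
#include <assert.h>

/* Four packed single-precision lanes; f[0] is the lowest lane.
   Two complex numbers are stored as {re0, im0, re1, im1}. */
typedef struct { float f[4]; } vec4;

/* MOVSHDUP: duplicate the odd (imaginary) lanes. */
static vec4 movshdup(vec4 s) {
    vec4 r = {{ s.f[1], s.f[1], s.f[3], s.f[3] }};
    return r;
}

/* MOVSLDUP: duplicate the even (real) lanes. */
static vec4 movsldup(vec4 s) {
    vec4 r = {{ s.f[0], s.f[0], s.f[2], s.f[2] }};
    return r;
}

/* SHUFPS x, x, 0B1h: swap the two lanes inside each pair. */
static vec4 swap_pairs(vec4 s) {
    vec4 r = {{ s.f[1], s.f[0], s.f[3], s.f[2] }};
    return r;
}

static vec4 mulps(vec4 a, vec4 b) {
    vec4 r = {{ a.f[0] * b.f[0], a.f[1] * b.f[1], a.f[2] * b.f[2], a.f[3] * b.f[3] }};
    return r;
}

static vec4 addps(vec4 a, vec4 b) {
    vec4 r = {{ a.f[0] + b.f[0], a.f[1] + b.f[1], a.f[2] + b.f[2], a.f[3] + b.f[3] }};
    return r;
}

static vec4 divps(vec4 a, vec4 b) {
    vec4 r = {{ a.f[0] / b.f[0], a.f[1] / b.f[1], a.f[2] / b.f[2], a.f[3] / b.f[3] }};
    return r;
}

/* ADDSUBPS: subtract in even lanes, add in odd lanes. */
static vec4 addsubps(vec4 a, vec4 b) {
    vec4 r = {{ a.f[0] - b.f[0], a.f[1] + b.f[1], a.f[2] - b.f[2], a.f[3] + b.f[3] }};
    return r;
}

/* Step-by-step emulation of the division sequence.
   src1 holds (a0, b0, a1, b1); src2 holds (c0, d0, c1, d1). */
vec4 complex_div_lanes(vec4 src1, vec4 src2) {
    vec4 x0 = movshdup(src1);    /* b0, b0, b1, b1 */
    vec4 x1 = src2;              /* c0, d0, c1, d1 */
    vec4 x2;
    x0 = mulps(x0, x1);          /* b0c0, b0d0, b1c1, b1d1 */
    x1 = swap_pairs(x1);         /* d0, c0, d1, c1 */
    x2 = movsldup(src1);         /* a0, a0, a1, a1 */
    x2 = mulps(x2, x1);          /* a0d0, a0c0, a1d1, a1c1 */
    x0 = addsubps(x0, x2);       /* b0c0-a0d0, b0d0+a0c0, b1c1-a1d1, b1d1+a1c1 */
    x1 = mulps(x1, x1);          /* d0d0, c0c0, d1d1, c1c1 */
    x2 = swap_pairs(x1);         /* c0c0, d0d0, c1c1, d1d1 */
    x2 = addps(x2, x1);          /* c0c0+d0d0 twice, then c1c1+d1d1 twice */
    x0 = divps(x0, x2);          /* im0, re0, im1, re1 of the quotients */
    return swap_pairs(x0);       /* re0, im0, re1, im1 */
}

/* Scalar reference: (a + ib) / (c + id). */
void complex_div_ref(float a, float b, float c, float d, float *re, float *im) {
    float den = c * c + d * d;
    *re = (a * c + b * d) / den;
    *im = (b * c - a * d) / den;
}

/* Inputs chosen so every intermediate value is exact in binary floating point. */
void check_division_example(void) {
    vec4 s1 = {{ 1.0f, 2.0f, 3.0f, 4.0f }};   /* 1+2i and 3+4i */
    vec4 s2 = {{ 1.0f, 1.0f, 0.0f, 2.0f }};   /* 1+1i and 0+2i */
    vec4 q = complex_div_lanes(s1, s2);
    assert(q.f[0] == 1.5f && q.f[1] == 0.5f);    /* (1+2i)/(1+1i) = 1.5+0.5i */
    assert(q.f[2] == 2.0f && q.f[3] == -1.5f);   /* (3+4i)/(0+2i) = 2-1.5i */
}
```

Calling `check_division_example()` from a `main` function exercises the sequence on one pair of inputs. The final swap is needed because ADDSUBPS leaves the imaginary numerator in the even lane and the real numerator in the odd lane, which is the opposite of the array-of-structures layout.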
