13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual



OPTIMIZING FOR SIMD FLOATING-POINT APPLICATIONS

[Figure 6-6 diagram: inputs X1, X0 and Y1, Y0 feed two ADD operations, producing Y0 + Y1 and X0 + X1]

Figure 6-6. Horizontal Arithmetic Operation of the SSE3 Instruction HADDPD

6.6.1.1 SSE3 and Complex Arithmetics

The flexibility of SSE3 in dealing with AOS-type data structures can be demonstrated by the multiplication and division of complex numbers. For example, a complex number can be stored in a structure consisting of its real and imaginary parts. This naturally leads to the use of an array of structures. Example 6-11 demonstrates using SSE3 instructions to multiply single-precision complex numbers. Example 6-12 demonstrates using SSE3 instructions to divide complex numbers.

Example 6-11. Multiplication of Two Pairs of Single-precision Complex Numbers

    // Multiplication of (ak + i bk) * (ck + i dk)
    // a + i b can be stored as a data structure
    movsldup xmm0, Src1        ; load real parts into the destination,
                               ; a1, a1, a0, a0
    movaps   xmm1, src2        ; load the 2nd pair of complex values,
                               ; i.e. d1, c1, d0, c0
    mulps    xmm0, xmm1        ; temporary results, a1d1, a1c1, a0d0,
                               ; a0c0
    shufps   xmm1, xmm1, 0b1h  ; reorder the real and imaginary
                               ; parts, c1, d1, c0, d0
    movshdup xmm2, Src1        ; load the imaginary parts into the
                               ; destination, b1, b1, b0, b0
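The listing above stops after the imaginary parts are loaded. As a rough illustration of how the data moves through the registers, the following is a minimal scalar sketch in portable C that models each instruction as a function on four floats. It is not the manual's code and uses no intrinsics: the type `vec4`, the helper `swap_pairs` (standing in for `shufps` with immediate 0B1h) and the function `complex_mul_pairs` are hypothetical names, and the final multiply and `addsubps` steps are an assumption about how the computation can be completed, based on the identity (a + ib)(c + id) = (ac - bd) + i(ad + bc). Lane 0 is the lowest element, so the comments list lanes in the reverse order of the assembly comments.

```c
/* One XMM register modelled as four floats; lane[0] is the lowest element.
 * A pair of complex numbers is laid out as {re0, im0, re1, im1}. */
typedef struct { float lane[4]; } vec4;

/* MOVSLDUP model: duplicate the even (real) lanes -> {s0, s0, s2, s2}. */
static vec4 movsldup(vec4 s) {
    vec4 r = {{ s.lane[0], s.lane[0], s.lane[2], s.lane[2] }};
    return r;
}

/* MOVSHDUP model: duplicate the odd (imaginary) lanes -> {s1, s1, s3, s3}. */
static vec4 movshdup(vec4 s) {
    vec4 r = {{ s.lane[1], s.lane[1], s.lane[3], s.lane[3] }};
    return r;
}

/* Model of SHUFPS x, x, 0B1h: swap the two lanes inside each pair. */
static vec4 swap_pairs(vec4 s) {
    vec4 r = {{ s.lane[1], s.lane[0], s.lane[3], s.lane[2] }};
    return r;
}

/* MULPS model: lane-wise multiply. */
static vec4 mulps(vec4 a, vec4 b) {
    vec4 r;
    for (int i = 0; i < 4; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

/* ADDSUBPS model: subtract in even lanes, add in odd lanes. */
static vec4 addsubps(vec4 a, vec4 b) {
    vec4 r = {{ a.lane[0] - b.lane[0], a.lane[1] + b.lane[1],
                a.lane[2] - b.lane[2], a.lane[3] + b.lane[3] }};
    return r;
}

/* Multiply two pairs of complex numbers: src1 = {a0, b0, a1, b1},
 * src2 = {c0, d0, c1, d1}. */
vec4 complex_mul_pairs(vec4 src1, vec4 src2) {
    vec4 x0 = movsldup(src1);   /* a0,   a0,   a1,   a1   */
    vec4 x1 = src2;             /* c0,   d0,   c1,   d1   */
    x0 = mulps(x0, x1);         /* a0c0, a0d0, a1c1, a1d1 */
    x1 = swap_pairs(x1);        /* d0,   c0,   d1,   c1   */
    vec4 x2 = movshdup(src1);   /* b0,   b0,   b1,   b1   */
    /* Assumed completion, not shown in the listing above: */
    x2 = mulps(x2, x1);         /* b0d0, b0c0, b1d1, b1c1 */
    return addsubps(x0, x2);    /* a0c0-b0d0, a0d0+b0c0, a1c1-b1d1, a1d1+b1c1 */
}
```

For instance, passing {1, 2, 3, 4} and {5, 6, 7, 8} multiplies (1 + 2i) by (5 + 6i) and (3 + 4i) by (7 + 8i) in one call. The model is meant only for checking the lane bookkeeping; it says nothing about performance.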
