13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

OPTIMIZING FOR SIMD FLOATING-POINT APPLICATIONS

6.6.1 SIMD Floating-point Programming Using SSE3

SSE3 enhances SSE and SSE2 with nine instructions targeted for SIMD floating-point programming. In contrast to many SSE/SSE2 instructions, which offer homogeneous arithmetic operations on parallel data elements and favor the vertical computation model, SSE3 offers instructions that perform asymmetric arithmetic operations and arithmetic operations on horizontal data elements.

ADDSUBPS and ADDSUBPD are two instructions with asymmetric arithmetic processing capability (see Figure 6-5). HADDPS, HADDPD, HSUBPS and HSUBPD offer horizontal arithmetic processing capability (see Figure 6-6). In addition, MOVSLDUP, MOVSHDUP and MOVDDUP load data from memory (or an XMM register) and replicate data elements at once.

[Figure 6-5. Asymmetric Arithmetic Operation of the SSE3 Instruction. Inputs: X1, X0 and Y1, Y0; ADDSUB result: X1 + Y1, X0 - Y0.]
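The lane-level behavior described above can be illustrated with a scalar reference model in portable C. This is a sketch only, not the hardware instructions or the compiler intrinsics: the types `vec4f` and `vec2d` and the `ref_*` function names are hypothetical and introduced here for illustration. Element 0 is assumed to be the lowest lane of the XMM register, so the ADDSUBPD model matches Figure 6-5 (X1 + Y1 in the high lane, X0 - Y0 in the low lane).

```c
/* Hypothetical models of a 128-bit XMM register; index 0 is the lowest lane. */
typedef struct { float  f[4]; } vec4f;
typedef struct { double d[2]; } vec2d;

/* ADDSUBPD: subtract in the low lane, add in the high lane (asymmetric). */
vec2d ref_addsubpd(vec2d x, vec2d y) {
    vec2d r;
    r.d[0] = x.d[0] - y.d[0];
    r.d[1] = x.d[1] + y.d[1];
    return r;
}

/* ADDSUBPS: subtract in even lanes, add in odd lanes. */
vec4f ref_addsubps(vec4f x, vec4f y) {
    vec4f r;
    for (int i = 0; i < 4; i++)
        r.f[i] = (i % 2 == 0) ? x.f[i] - y.f[i] : x.f[i] + y.f[i];
    return r;
}

/* HADDPS: sums of adjacent pairs; the first operand fills the low half. */
vec4f ref_haddps(vec4f x, vec4f y) {
    vec4f r;
    r.f[0] = x.f[0] + x.f[1];
    r.f[1] = x.f[2] + x.f[3];
    r.f[2] = y.f[0] + y.f[1];
    r.f[3] = y.f[2] + y.f[3];
    return r;
}

/* HSUBPS: differences of adjacent pairs (lower element minus upper element). */
vec4f ref_hsubps(vec4f x, vec4f y) {
    vec4f r;
    r.f[0] = x.f[0] - x.f[1];
    r.f[1] = x.f[2] - x.f[3];
    r.f[2] = y.f[0] - y.f[1];
    r.f[3] = y.f[2] - y.f[3];
    return r;
}

/* MOVSLDUP: duplicate the even-indexed elements into each pair of lanes. */
vec4f ref_movsldup(vec4f x) {
    vec4f r = {{ x.f[0], x.f[0], x.f[2], x.f[2] }};
    return r;
}

/* MOVSHDUP: duplicate the odd-indexed elements into each pair of lanes. */
vec4f ref_movshdup(vec4f x) {
    vec4f r = {{ x.f[1], x.f[1], x.f[3], x.f[3] }};
    return r;
}

/* MOVDDUP: duplicate the low double into both lanes. */
vec2d ref_movddup(vec2d x) {
    vec2d r = {{ x.d[0], x.d[0] }};
    return r;
}
```

In real code these operations are normally reached through the SSE3 intrinsics declared in `<pmmintrin.h>` (for example `_mm_addsub_ps`, `_mm_hadd_ps`, `_mm_moveldup_ps`, `_mm_movehdup_ps` and `_mm_movedup_pd`), compiled with SSE3 enabled. The scalar model is only meant to make the lane ordering explicit.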
