13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


OPTIMIZING FOR SIMD FLOATING-POINT APPLICATIONS

for the graphics cards to process, use either SSE or MMX technology code. Using MMX instructions allows you to conserve XMM registers for other computational tasks.

Example 6-8 illustrates how to use MMX technology code for copying or shuffling.

Example 6-8. Using MMX Technology Code for Copying or Shuffling

    movq      mm0, [Uarray+ebx]      ; mm0= u1 u2
    movq      mm1, [Varray+ebx]      ; mm1= v1 v2
    movq      mm2, mm0               ; mm2= u1 u2
    punpckhdq mm0, mm1               ; mm0= u1 v1
    punpckldq mm2, mm1               ; mm2= u2 v2
    movq      [Coords+edx], mm0      ; store u1 v1
    movq      [Coords+8+edx], mm2    ; store u2 v2
    movq      mm4, [Uarray+8+ebx]    ; mm4= u3 u4
    movq      mm5, [Varray+8+ebx]    ; mm5= v3 v4
    movq      mm6, mm4               ; mm6= u3 u4
    punpckhdq mm4, mm5               ; mm4= u3 v3
    punpckldq mm6, mm5               ; mm6= u4 v4
    movq      [Coords+16+edx], mm4   ; store u3 v3
    movq      [Coords+24+edx], mm6   ; store u4 v4

6.5.1.5 Horizontal ADD Using SSE

Although vertical computations generally make better use of SIMD performance than horizontal computations, in some cases code must use a horizontal operation.

MOVLHPS/MOVHLPS and shuffle can be used to sum data horizontally. For example, starting with four 128-bit registers, to sum up each register horizontally while having the final results in one register, use MOVLHPS/MOVHLPS to align the upper and lower parts of each register. This allows you to use a vertical add. With the resulting partial horizontal summation, full summation follows easily.

Figure 6-4 presents a horizontal add using MOVHLPS/MOVLHPS. Example 6-9 and Example 6-10 provide the code for this operation.
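The manual's own listings for this operation are not reproduced on this page. As a rough illustration only, the MOVLHPS/MOVHLPS technique described above might be sketched in C with SSE intrinsics as follows. The function name `hsum4` and its array-based interface are hypothetical and chosen for this sketch; `_mm_movelh_ps`, `_mm_movehl_ps`, and `_mm_shuffle_ps` are the standard intrinsics for MOVLHPS, MOVHLPS, and SHUFPS. The instruction ordering is not claimed to match Example 6-9 or Example 6-10.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: horizontally sums four 4-float vectors a, b, c, d
 * and writes { sum(a), sum(b), sum(c), sum(d) } to out[0..3].
 * Unaligned loads and stores are used so callers need not align buffers. */
static void hsum4(const float *a, const float *b,
                  const float *c, const float *d, float *out)
{
    __m128 va = _mm_loadu_ps(a);   /* a0 a1 a2 a3 */
    __m128 vb = _mm_loadu_ps(b);   /* b0 b1 b2 b3 */
    __m128 vc = _mm_loadu_ps(c);   /* c0 c1 c2 c3 */
    __m128 vd = _mm_loadu_ps(d);   /* d0 d1 d2 d3 */

    /* Align the lower and upper halves of each pair (MOVLHPS / MOVHLPS). */
    __m128 ab_lo = _mm_movelh_ps(va, vb);   /* a0 a1 b0 b1 */
    __m128 ab_hi = _mm_movehl_ps(vb, va);   /* a2 a3 b2 b3 */
    __m128 cd_lo = _mm_movelh_ps(vc, vd);   /* c0 c1 d0 d1 */
    __m128 cd_hi = _mm_movehl_ps(vd, vc);   /* c2 c3 d2 d3 */

    /* Vertical adds give the partial horizontal sums. */
    __m128 ab = _mm_add_ps(ab_lo, ab_hi);   /* a0+a2 a1+a3 b0+b2 b1+b3 */
    __m128 cd = _mm_add_ps(cd_lo, cd_hi);   /* c0+c2 c1+c3 d0+d2 d1+d3 */

    /* Shuffle even and odd partial sums into matching lanes (SHUFPS). */
    __m128 even = _mm_shuffle_ps(ab, cd, _MM_SHUFFLE(2, 0, 2, 0));
    __m128 odd  = _mm_shuffle_ps(ab, cd, _MM_SHUFFLE(3, 1, 3, 1));

    /* One more vertical add completes the summation. */
    _mm_storeu_ps(out, _mm_add_ps(even, odd));
}
```

With four input arrays of four floats each, a call such as `hsum4(a, b, c, d, out)` leaves the sum of `a` in `out[0]`, the sum of `b` in `out[1]`, and so on. The sketch needs three vertical adds and no horizontal-add instruction, which is the point of aligning the halves first.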
