13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

OPTIMIZING FOR SIMD FLOATING-POINT APPLICATIONS

Example 6-9. Horizontal Add Using MOVHLPS/MOVLHPS (Contd.)

    // START HORIZONTAL ADD
    movaps  xmm5, xmm0          // xmm5= A1,A2,A3,A4
    movlhps xmm5, xmm1          // xmm5= A1,A2,B1,B2
    movhlps xmm1, xmm0          // xmm1= A3,A4,B3,B4
    addps   xmm5, xmm1          // xmm5= A1+A3,A2+A4,B1+B3,B2+B4
    movaps  xmm4, xmm2
    movlhps xmm2, xmm3          // xmm2= C1,C2,D1,D2
    movhlps xmm3, xmm4          // xmm3= C3,C4,D3,D4
    addps   xmm3, xmm2          // xmm3= C1+C3,C2+C4,D1+D3,D2+D4
    movaps  xmm6, xmm5          // xmm6= A1+A3,A2+A4,B1+B3,B2+B4
    shufps  xmm6, xmm3, 0x88    // xmm6= A1+A3,B1+B3,C1+C3,D1+D3
    shufps  xmm5, xmm3, 0xDD    // xmm5= A2+A4,B2+B4,C2+C4,D2+D4
    addps   xmm6, xmm5          // xmm6= A,B,C,D (row sums, A in the lowest element)
    // END HORIZONTAL ADD
    movaps  [edx], xmm6
    }
    }

Example 6-10. Horizontal Add Using Intrinsics with MOVHLPS/MOVLHPS

    void horiz_add_intrin(Vertex_soa *in, float *out)
    {
        __m128 v, v2, v3, v4;
        __m128 tmm0,tmm1,tmm2,tmm3,tmm4,tmm5,tmm6;  // Temporary variables
        tmm0 = _mm_load_ps(in->x);          // tmm0 = A1 A2 A3 A4
        tmm1 = _mm_load_ps(in->y);          // tmm1 = B1 B2 B3 B4
        tmm2 = _mm_load_ps(in->z);          // tmm2 = C1 C2 C3 C4
        tmm3 = _mm_load_ps(in->w);          // tmm3 = D1 D2 D3 D4
        tmm5 = tmm0;                        // tmm5 = A1 A2 A3 A4
        tmm5 = _mm_movelh_ps(tmm5, tmm1);   // tmm5 = A1 A2 B1 B2
        tmm1 = _mm_movehl_ps(tmm1, tmm0);   // tmm1 = A3 A4 B3 B4
        tmm5 = _mm_add_ps(tmm5, tmm1);      // tmm5 = A1+A3 A2+A4 B1+B3 B2+B4
        tmm4 = tmm2;
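The listing above breaks off at the page boundary. As a rough illustration of the whole data flow traced by the Example 6-9 comments (fold the upper half of each row onto its lower half, then separate the even and odd partial sums and add them), here is a minimal, self-contained C sketch. It is not the manual's listing: the function name `horiz_add_sketch`, the `Vertex_soa` layout (four contiguous 4-float arrays), and the use of unaligned loads and stores in place of `_mm_load_ps` are assumptions made for illustration.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_movelh_ps, _mm_movehl_ps, _mm_shuffle_ps */

/* Assumed layout (hypothetical, for this sketch only):
 * four rows of four floats stored back to back. */
typedef struct {
    float x[4];  /* A1 A2 A3 A4 */
    float y[4];  /* B1 B2 B3 B4 */
    float z[4];  /* C1 C2 C3 C4 */
    float w[4];  /* D1 D2 D3 D4 */
} Vertex_soa;

/* Writes the four row sums to out[0..3]:
 * out[0] = A1+A2+A3+A4, out[1] = B1+..+B4, out[2] = C1+..+C4, out[3] = D1+..+D4.
 * Unaligned loads/stores are used so the sketch works on any buffer. */
void horiz_add_sketch(const Vertex_soa *in, float *out)
{
    __m128 a = _mm_loadu_ps(in->x);
    __m128 b = _mm_loadu_ps(in->y);
    __m128 c = _mm_loadu_ps(in->z);
    __m128 d = _mm_loadu_ps(in->w);

    /* Step 1: add the upper half of each row onto its lower half,
     * packing two rows into one register. */
    __m128 ab = _mm_add_ps(_mm_movelh_ps(a, b),   /* A1 A2 B1 B2 */
                           _mm_movehl_ps(b, a));  /* A3 A4 B3 B4 */
    /* ab = A1+A3 A2+A4 B1+B3 B2+B4 */
    __m128 cd = _mm_add_ps(_mm_movelh_ps(c, d),   /* C1 C2 D1 D2 */
                           _mm_movehl_ps(d, c));  /* C3 C4 D3 D4 */
    /* cd = C1+C3 C2+C4 D1+D3 D2+D4 */

    /* Step 2: gather elements 0 and 2 (mask 0x88) and elements 1 and 3
     * (mask 0xDD) of both registers, then add the two results. */
    __m128 even = _mm_shuffle_ps(ab, cd, 0x88);   /* A1+A3 B1+B3 C1+C3 D1+D3 */
    __m128 odd  = _mm_shuffle_ps(ab, cd, 0xDD);   /* A2+A4 B2+B4 C2+C4 D2+D4 */

    _mm_storeu_ps(out, _mm_add_ps(even, odd));
}
```

For example, with `x = {1, 2, 3, 4}` and `y = {10, 20, 30, 40}`, the call stores 10 in `out[0]` and 100 in `out[1]`. Because `_mm_shuffle_ps` returns a new value rather than overwriting its first operand, the intrinsics form does not need the explicit register copies that the assembly version uses.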
