13.07.2015

Intel® 64 and IA-32 Architectures Optimization Reference Manual


OPTIMIZING FOR SIMD FLOATING-POINT APPLICATIONS

Example 6-5. Deswizzling Single-Precision SIMD Data (Contd.)

        unpcklps xmm5, xmm4         // xmm5= z1 w1 z2 w2
        unpckhps xmm0, xmm4         // xmm0= z3 w3 z4 w4
        movlps [edx+8], xmm5        // v1 = x1 y1 z1 w1
        movhps [edx+24], xmm5       // v2 = x2 y2 z2 w2
        movlps [edx+40], xmm0       // v3 = x3 y3 z3 w3
        movhps [edx+56], xmm0       // v4 = x4 y4 z4 w4
        // DESWIZZLING ENDS HERE
      }
    }

You may have to swizzle data in the registers, but not in memory. This occurs when two different functions need to process the data in different layouts. In lighting, for example, data comes as RRRR GGGG BBBB AAAA, and you must deswizzle it into RGBA before converting it into integers. In this case, use the MOVLHPS/MOVHLPS instructions to do the first part of the deswizzle, followed by shuffle (SHUFPS) instructions; see Example 6-6 and Example 6-7.

Example 6-6. Deswizzling Data Using the movlhps and shuffle Instructions

    void deswizzle_rgb(Vertex_soa *in, Vertex_aos *out)
    {
      //---deswizzle rgb---
      // assume: xmm1=rrrr, xmm2=gggg, xmm3=bbbb, xmm4=aaaa
      __asm {
        mov ecx, in                 // load structure addresses
        mov edx, out
        movaps xmm1, [ecx]          // load r4 r3 r2 r1 => xmm1
        movaps xmm2, [ecx+16]       // load g4 g3 g2 g1 => xmm2
        movaps xmm3, [ecx+32]       // load b4 b3 b2 b1 => xmm3
        movaps xmm4, [ecx+48]       // load a4 a3 a2 a1 => xmm4
        // Start deswizzling here
        movaps xmm7, xmm4           // xmm7= a4 a3 a2 a1
        movhlps xmm7, xmm3          // xmm7= a4 a3 b4 b3
        movaps xmm6, xmm2           // xmm6= g4 g3 g2 g1
        movlhps xmm3, xmm4          // xmm3= a2 a1 b2 b1
        movhlps xmm2, xmm1          // xmm2= g4 g3 r4 r3
        movlhps xmm1, xmm6          // xmm1= g2 g1 r2 r1
        movaps xmm6, xmm2           // xmm6= g4 g3 r4 r3
        movaps xmm5, xmm1           // xmm5= g2 g1 r2 r1
        shufps xmm2, xmm7, 0xDD     // xmm2= a4 b4 g4 r4 =>v4
        shufps xmm1, xmm3, 0x88     // xmm1= a1 b1 g1 r1 =>v1
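For readers who prefer compiler intrinsics to inline assembly, the same two-stage deswizzle (MOVHLPS/MOVLHPS, then SHUFPS) can be sketched with the SSE intrinsics from `<xmmintrin.h>`: `_mm_movehl_ps`, `_mm_movelh_ps` and `_mm_shuffle_ps`. This is a minimal sketch, not the manual's code. Assumptions: the function name `deswizzle_rgba_sketch` and the flat 16-float input/output layout are hypothetical stand-ins for `Vertex_soa`/`Vertex_aos`; unaligned loads and stores are used so the buffers need no 16-byte alignment; and the sketch also computes v2 and v3, which the listing above does not reach on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_movehl_ps, _mm_shuffle_ps, ... */

/* Hypothetical flat layouts (stand-ins for Vertex_soa / Vertex_aos):
   in  = r1 r2 r3 r4  g1 g2 g3 g4  b1 b2 b3 b4  a1 a2 a3 a4   (SoA)
   out = r1 g1 b1 a1  r2 g2 b2 a2  r3 g3 b3 a3  r4 g4 b4 a4   (AoS) */
void deswizzle_rgba_sketch(const float in[16], float out[16])
{
    assert(in != NULL && out != NULL);

    /* Register comments list lanes from high to low, as in Example 6-6. */
    __m128 r = _mm_loadu_ps(in + 0);    /* r4 r3 r2 r1 */
    __m128 g = _mm_loadu_ps(in + 4);    /* g4 g3 g2 g1 */
    __m128 b = _mm_loadu_ps(in + 8);    /* b4 b3 b2 b1 */
    __m128 a = _mm_loadu_ps(in + 12);   /* a4 a3 a2 a1 */

    /* First stage: MOVHLPS / MOVLHPS pair up the high and low halves. */
    __m128 ab_hi = _mm_movehl_ps(a, b); /* a4 a3 b4 b3 */
    __m128 ab_lo = _mm_movelh_ps(b, a); /* a2 a1 b2 b1 */
    __m128 rg_hi = _mm_movehl_ps(g, r); /* g4 g3 r4 r3 */
    __m128 rg_lo = _mm_movelh_ps(r, g); /* g2 g1 r2 r1 */

    /* Second stage: SHUFPS with 0x88 takes lanes 0 and 2 of each operand,
       0xDD takes lanes 1 and 3. */
    __m128 v1 = _mm_shuffle_ps(rg_lo, ab_lo, 0x88); /* a1 b1 g1 r1 */
    __m128 v2 = _mm_shuffle_ps(rg_lo, ab_lo, 0xDD); /* a2 b2 g2 r2 */
    __m128 v3 = _mm_shuffle_ps(rg_hi, ab_hi, 0x88); /* a3 b3 g3 r3 */
    __m128 v4 = _mm_shuffle_ps(rg_hi, ab_hi, 0xDD); /* a4 b4 g4 r4 */

    /* Lowest lane is stored first, so each vertex lands as r g b a. */
    _mm_storeu_ps(out + 0, v1);
    _mm_storeu_ps(out + 4, v2);
    _mm_storeu_ps(out + 8, v3);
    _mm_storeu_ps(out + 12, v4);
}
```

Named temporaries replace the in-place register juggling of the assembly version (the extra MOVAPS copies into xmm5 and xmm6); the compiler's register allocator inserts whatever copies it actually needs.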
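To see why the immediates 0x88 and 0xDD in Example 6-6 select the right lanes, the following portable C sketch models SHUFPS on plain float arrays, with index 0 as the lowest lane. The function name `shufps_model` is hypothetical; it is a reading aid for the instruction's imm8 encoding, not an Intel API. Each 2-bit field of imm8 picks one lane: the two low fields index the destination operand and the two high fields index the source operand. So 0x88 (binary 10 00 10 00) picks lanes 0 and 2 of each operand, and 0xDD (binary 11 01 11 01) picks lanes 1 and 3.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of SHUFPS dst, src, imm8. Arrays are in lane order
   (index 0 = lowest lane), so the register written "g4 g3 r4 r3" in the
   listing is the array {r3, r4, g3, g4}. */
void shufps_model(float dst[4], const float src[4], uint8_t imm8)
{
    assert(dst != NULL && src != NULL);

    float res[4];
    res[0] = dst[imm8 & 3];         /* bits 1:0 pick result lane 0 from dst */
    res[1] = dst[(imm8 >> 2) & 3];  /* bits 3:2 pick result lane 1 from dst */
    res[2] = src[(imm8 >> 4) & 3];  /* bits 5:4 pick result lane 2 from src */
    res[3] = src[(imm8 >> 6) & 3];  /* bits 7:6 pick result lane 3 from src */

    /* SHUFPS overwrites its destination; the temporary makes dst == src safe. */
    memcpy(dst, res, sizeof res);
}
```

Applied to xmm2 = g4 g3 r4 r3 and xmm7 = a4 a3 b4 b3, as in Example 6-6, the 0xDD case yields a4 b4 g4 r4, that is, v4.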