Intel® 64 and IA-32 Architectures Optimization Reference Manual

OPTIMIZING FOR SIMD FLOATING-POINT APPLICATIONS

Example 6-3. Swizzling Data

    typedef struct _VERTEX_AOS {
        float x, y, z, color;
    } Vertex_aos;                       // AoS structure declaration

    typedef struct _VERTEX_SOA {
        float x[4];
        float y[4];
        float z[4];
        float color[4];
    } Vertex_soa;                       // SoA structure declaration

    void swizzle_asm (Vertex_aos *in, Vertex_soa *out)
    {
        // in mem: x1y1z1w1-x2y2z2w2-x3y3z3w3-x4y4z4w4-
        // SWIZZLE XYZW --> XXXX
        // Register contents are listed from the highest element to the lowest.
        asm {
            mov     ecx, in             // get structure addresses
            mov     edx, out
            movlps  xmm7, [ecx]         // xmm7 = -- -- y1 x1
            movhps  xmm7, [ecx+16]      // xmm7 = y2 x2 y1 x1
            movlps  xmm0, [ecx+32]      // xmm0 = -- -- y3 x3
            movhps  xmm0, [ecx+48]      // xmm0 = y4 x4 y3 x3
            movaps  xmm6, xmm7          // xmm6 = y2 x2 y1 x1
            shufps  xmm7, xmm0, 0x88    // xmm7 = x4 x3 x2 x1 => X
            shufps  xmm6, xmm0, 0xDD    // xmm6 = y4 y3 y2 y1 => Y
            movlps  xmm2, [ecx+8]       // xmm2 = -- -- w1 z1
            movhps  xmm2, [ecx+24]      // xmm2 = w2 z2 w1 z1
            movlps  xmm1, [ecx+40]      // xmm1 = -- -- w3 z3
            movhps  xmm1, [ecx+56]      // xmm1 = w4 z4 w3 z3
            movaps  xmm0, xmm2          // xmm0 = w2 z2 w1 z1
            shufps  xmm2, xmm1, 0x88    // xmm2 = z4 z3 z2 z1 => Z
            shufps  xmm0, xmm1, 0xDD    // xmm0 = w4 w3 w2 w1 => W
            movaps  [edx], xmm7         // store X
            movaps  [edx+16], xmm6      // store Y
            movaps  [edx+32], xmm2      // store Z
            movaps  [edx+48], xmm0      // store W
            // SWIZZLE XYZ -> XXX
        }
    }

Example 6-4 shows the same data-swizzling algorithm encoded using the Intel C++ Compiler's intrinsics for SSE.
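The SHUFPS selectors 0x88 and 0xDD in Example 6-3 are easier to check against a scalar model. The following is a minimal sketch in portable C, not part of the manual. The helper names `shufps_model` and `swizzle_model` are hypothetical, and each four-element array stands in for one XMM register with index 0 as the lowest element. The model follows the documented SHUFPS behavior: the two low result elements are selected from the destination operand and the two high result elements from the source operand, each by a 2-bit field of the immediate.

```c
#include <assert.h>

typedef struct { float x, y, z, color; } Vertex_aos;   /* AoS, as in Example 6-3 */
typedef struct { float x[4]; float y[4]; float z[4]; float color[4]; } Vertex_soa;

/* Hypothetical scalar model of "shufps dst, src, imm8".
   Index 0 is the lowest element of the register. */
static void shufps_model(float dst[4], const float src[4], unsigned imm8)
{
    float r[4];
    int i;
    assert(imm8 <= 0xFFu);
    r[0] = dst[imm8 & 3u];          /* bits 1:0 select from dst */
    r[1] = dst[(imm8 >> 2) & 3u];   /* bits 3:2 select from dst */
    r[2] = src[(imm8 >> 4) & 3u];   /* bits 5:4 select from src */
    r[3] = src[(imm8 >> 6) & 3u];   /* bits 7:6 select from src */
    for (i = 0; i < 4; ++i)
        dst[i] = r[i];
}

/* Hypothetical scalar counterpart of swizzle_asm for four vertices. */
static void swizzle_model(const Vertex_aos in[4], Vertex_soa *out)
{
    /* movlps/movhps pairs: two 64-bit halves gathered into one register */
    float lo_xy[4] = { in[0].x, in[0].y, in[1].x, in[1].y };
    float hi_xy[4] = { in[2].x, in[2].y, in[3].x, in[3].y };
    float lo_zw[4] = { in[0].z, in[0].color, in[1].z, in[1].color };
    float hi_zw[4] = { in[2].z, in[2].color, in[3].z, in[3].color };
    int i;

    /* movaps copies: each output starts as a copy of the low pair register */
    for (i = 0; i < 4; ++i) {
        out->x[i] = lo_xy[i];
        out->y[i] = lo_xy[i];
        out->z[i] = lo_zw[i];
        out->color[i] = lo_zw[i];
    }

    shufps_model(out->x, hi_xy, 0x88);      /* elements 0 and 2: x1 x2 x3 x4 */
    shufps_model(out->y, hi_xy, 0xDD);      /* elements 1 and 3: y1 y2 y3 y4 */
    shufps_model(out->z, hi_zw, 0x88);      /* elements 0 and 2: z1 z2 z3 z4 */
    shufps_model(out->color, hi_zw, 0xDD);  /* elements 1 and 3: w1 w2 w3 w4 */
}
```

Calling `swizzle_model` on four vertices fills `out->x` with the four x components in vertex order, and likewise for `y`, `z`, and `color`. Selector 0x88 picks the even-indexed elements of both operands and 0xDD picks the odd-indexed ones, which is why one pair of loaded registers yields two output rows.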
