Intel® 64 and IA-32 Architectures Optimization Reference Manual

CODING FOR SIMD ARCHITECTURES

Example 4-14. AoS and SoA Code Samples (Contd.)

    ; In the AoS model, the vertices are stored in the xyz format
    movaps  xmm0, Array        ; xmm0 = DC, x0, y0, z0
    movaps  xmm1, Fixed        ; xmm1 = DC, xF, yF, zF
    mulps   xmm0, xmm1         ; xmm0 = DC, x0*xF, y0*yF, z0*zF
    movhlps xmm1, xmm0         ; xmm1 = DC, DC, DC, x0*xF
    addps   xmm1, xmm0         ; xmm1 = DC, DC, DC,
                               ;        x0*xF+z0*zF
    movaps  xmm2, xmm0         ; copy the products so y0*yF can be extracted
    shufps  xmm2, xmm2, 55h    ; xmm2 = DC, DC, DC, y0*yF
    addps   xmm2, xmm1         ; xmm2 = DC, DC, DC,
                               ;        x0*xF+y0*yF+z0*zF

    ; SoA code
    ; X = x0,x1,x2,x3
    ; Y = y0,y1,y2,y3
    ; Z = z0,z1,z2,z3
    ; A = xF,xF,xF,xF
    ; B = yF,yF,yF,yF
    ; C = zF,zF,zF,zF
    movaps  xmm0, X            ; xmm0 = x0,x1,x2,x3
    movaps  xmm1, Y            ; xmm1 = y0,y1,y2,y3
    movaps  xmm2, Z            ; xmm2 = z0,z1,z2,z3
    mulps   xmm0, A            ; xmm0 = x0*xF, x1*xF, x2*xF, x3*xF
    mulps   xmm1, B            ; xmm1 = y0*yF, y1*yF, y2*yF, y3*yF
    mulps   xmm2, C            ; xmm2 = z0*zF, z1*zF, z2*zF, z3*zF
    addps   xmm0, xmm1
    addps   xmm0, xmm2         ; xmm0 = (x0*xF+y0*yF+z0*zF), ...

Performing SIMD operations on the original AoS format can require more calculations, and some operations do not take advantage of all available SIMD elements. This option is therefore generally less efficient.

The recommended way to compute on data in AoS format is to swizzle each set of elements to SoA format before processing it with SIMD technologies. Swizzling can be done either dynamically during program execution or statically when the data structures are generated. See Chapter 5 and Chapter 6 for examples. Performing the swizzle dynamically is usually better than using AoS, but it can be somewhat inefficient because it adds extra instructions during computation. Performing the swizzle statically, when the data structures are being laid out, is best because there is no runtime overhead.

As mentioned earlier, the SoA arrangement allows more efficient use of the parallelism of SIMD technologies because the data is ready for computation in a more optimal vertical manner: multiplying components x0, x1, x2, x3 by xF, xF, xF, xF using
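The dynamic swizzle described above can be illustrated with a minimal C sketch. It is not taken from the manual: the type names `Vertex` and `VertexSoA`, the block size `N`, and the functions `swizzle_aos_to_soa` and `dot_soa` are hypothetical and chosen only for illustration. The sketch uses plain scalar loops and assumes a block of four vertices, matching the four single-precision elements of an XMM register; whether a compiler vectorizes the loops depends on the compiler and its options.

```c
#include <stddef.h>

#define N 4  /* assumed block size: four floats fit in one XMM register */

/* Hypothetical AoS element: one vertex with its x, y, z stored together. */
typedef struct {
    float x, y, z;
} Vertex;

/* Hypothetical SoA block: each component in its own contiguous array. */
typedef struct {
    float x[N];
    float y[N];
    float z[N];
} VertexSoA;

/* Dynamic swizzle: copy N AoS vertices into one SoA block. */
static void swizzle_aos_to_soa(const Vertex *aos, VertexSoA *soa)
{
    for (size_t i = 0; i < N; i++) {
        soa->x[i] = aos[i].x;
        soa->y[i] = aos[i].y;
        soa->z[i] = aos[i].z;
    }
}

/* Vertical computation on SoA data: N dot products against the fixed
 * vector (xF, yF, zF), one result per vertex, with no unused elements. */
static void dot_soa(const VertexSoA *v, float xF, float yF, float zF,
                    float *out)
{
    for (size_t i = 0; i < N; i++) {
        out[i] = v->x[i] * xF + v->y[i] * yF + v->z[i] * zF;
    }
}
```

In this sketch the copy in `swizzle_aos_to_soa` is the extra runtime work that a dynamic swizzle adds. Storing the data as `VertexSoA` from the start corresponds to the static swizzle and removes that step.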
