
Intel® 64 and IA-32 Architectures Optimization Reference Manual



OPTIMIZING FOR SIMD FLOATING-POINT APPLICATIONS

[Figure 6-3. Dot Product Operation: (X1 X2 X3 X4) x Fx + (Y1 Y2 Y3 Y4) x Fy + (Z1 Z2 Z3 Z4) x Fz + (W1 W2 W3 W4) x Fw = (R1 R2 R3 R4)]

Figure 6-3 shows the dot product operation. If the data were organized as AoS and only SSE were used, computing one result would take seven instructions, so four results would require 28 instructions.

Example 6-1. Pseudocode for Horizontal (xyz, AoS) Computation

    mulps     ; x*x', y*y', z*z'
    movaps    ; reg->reg move, since next steps overwrite
    shufps    ; get b,a,d,c from a,b,c,d
    addps     ; get a+b,a+b,c+d,c+d
    movaps    ; reg->reg move
    shufps    ; get c+d,c+d,a+b,a+b from prior addps
    addps     ; get a+b+c+d,a+b+c+d,a+b+c+d,a+b+c+d
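The horizontal sequence in Example 6-1 and the vertical operation in Figure 6-3 can be sketched with SSE intrinsics in C. This is a minimal illustration, not the manual's code: the function names `dot_aos`, `dot_aos_scalar`, and `dot4_soa` are hypothetical, and the compiler is assumed to emit the `movaps` register copies itself where it needs them. The horizontal version sums all four lanes, so for xyz data the fourth lane of each input is assumed to be zero.

```c
#include <xmmintrin.h>  /* SSE intrinsics (x86 / x86-64 only) */

/* Horizontal (AoS) dot product. Each argument holds ONE vector laid out
   as {a, b, c, d}. The sum of the four products ends up in every lane. */
static __m128 dot_aos(__m128 u, __m128 v)
{
    /* mulps: lane-wise products a, b, c, d */
    __m128 prod = _mm_mul_ps(u, v);
    /* shufps: get b,a,d,c from a,b,c,d */
    __m128 swap = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(2, 3, 0, 1));
    /* addps: get a+b,a+b,c+d,c+d */
    __m128 pair = _mm_add_ps(prod, swap);
    /* shufps: get c+d,c+d,a+b,a+b from the prior addps */
    __m128 rot = _mm_shuffle_ps(pair, pair, _MM_SHUFFLE(1, 0, 3, 2));
    /* addps: get a+b+c+d in all four lanes */
    return _mm_add_ps(pair, rot);
}

/* Convenience wrapper (hypothetical): unaligned loads, scalar result. */
static float dot_aos_scalar(const float u[4], const float v[4])
{
    __m128 r = dot_aos(_mm_loadu_ps(u), _mm_loadu_ps(v));
    return _mm_cvtss_f32(r);  /* extract lane 0 */
}

/* Vertical (SoA) dot product as drawn in Figure 6-3. Each of x, y, z, w
   holds one component of FOUR different vectors; fx..fw hold the matching
   factors. Four results R1..R4 come out of four multiplies and three adds. */
static __m128 dot4_soa(__m128 x, __m128 y, __m128 z, __m128 w,
                       __m128 fx, __m128 fy, __m128 fz, __m128 fw)
{
    __m128 r = _mm_mul_ps(x, fx);
    r = _mm_add_ps(r, _mm_mul_ps(y, fy));
    r = _mm_add_ps(r, _mm_mul_ps(z, fz));
    r = _mm_add_ps(r, _mm_mul_ps(w, fw));
    return r;
}
```

For example, calling `dot_aos_scalar` on `{1, 2, 3, 4}` and `{5, 6, 7, 8}` yields 70. In the vertical form no shuffles are needed because the lanes never have to be combined with each other, which is the contrast the figure and the example are drawing.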
