13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

OPTIMIZING FOR SIMD FLOATING-POINT APPLICATIONS

• It may become difficult to hide long latency operations. For instance, another common function in 3D graphics is normalization, which requires the computation of a reciprocal square root (that is, 1/sqrt). Both the division and the square root are long latency operations. With vertical computation (SoA), each of the 4 computation slots in a SIMD operation produces a unique result, so the net latency per slot is L/4, where L is the overall latency of the operation. With horizontal computation, however, the four computation slots together produce a single result, so producing four separate results requires a net latency per slot of L.

To utilize all four computation slots, the vertex data can be reorganized to allow computation on each component of four separate vertices, that is, processing multiple vectors simultaneously. This can also be referred to as an SoA form of representing vertices data, shown in Table 6-1.

Table 6-1. SoA Form of Representing Vertices Data

Vx array    X1    X2    X3    X4    .....    Xn
Vy array    Y1    Y2    Y3    Y4    .....    Yn
Vz array    Z1    Z2    Z3    Z4    .....    Zn
Vw array    W1    W2    W3    W4    .....    Wn

Organizing data in this manner yields a unique result for each computational slot for each arithmetic operation.

Vertical computation takes advantage of the inherent parallelism in 3D geometry processing of vertices. It assigns the computation of four vertices to the four compute slots of the Pentium III processor, thereby eliminating the disadvantages of the horizontal approach described earlier (using SSE alone). The dot product operation works on the SoA representation of vertices data. A schematic representation of the dot product operation is shown in Figure 6-3.
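The vertical (SoA) dot product described above can be sketched in C with SSE intrinsics. This is a minimal illustration under stated assumptions, not the manual's own listing: the `VerticesSoA` struct and the `dot4_soa` function are hypothetical names introduced here, the vertex count is assumed to be a multiple of 4, and each vertex is assumed to be dotted with one fixed 4-component vector. A scalar fallback is included for compilers that do not target SSE.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical SoA container: one array per component, as in Table 6-1. */
typedef struct {
    const float *x, *y, *z, *w;
    size_t n; /* number of vertices; assumed to be a multiple of 4 */
} VerticesSoA;

/* For every vertex i: out[i] = x[i]*c[0] + y[i]*c[1] + z[i]*c[2] + w[i]*c[3].
 * On the SSE path each iteration handles four vertices, so every one of the
 * four computation slots produces a distinct result. */
void dot4_soa(const VerticesSoA *v, const float c[4], float *out)
{
    assert(v->n % 4 == 0); /* simplifying assumption of this sketch */
#if HAVE_SSE
    /* Broadcast each component of the fixed vector to all four slots. */
    const __m128 cx = _mm_set1_ps(c[0]);
    const __m128 cy = _mm_set1_ps(c[1]);
    const __m128 cz = _mm_set1_ps(c[2]);
    const __m128 cw = _mm_set1_ps(c[3]);
    for (size_t i = 0; i < v->n; i += 4) {
        /* Unaligned loads keep the sketch free of alignment requirements. */
        __m128 acc = _mm_mul_ps(_mm_loadu_ps(v->x + i), cx);
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(v->y + i), cy));
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(v->z + i), cz));
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(v->w + i), cw));
        _mm_storeu_ps(out + i, acc);
    }
#else
    /* Scalar fallback with the same evaluation order. */
    for (size_t i = 0; i < v->n; ++i) {
        out[i] = v->x[i] * c[0] + v->y[i] * c[1]
               + v->z[i] * c[2] + v->w[i] * c[3];
    }
#endif
}
```

Because the four slots hold four different vertices, no horizontal add or shuffle is needed inside the loop, which is the property that gives the L/4 net latency per result discussed above.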
