13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


CODING FOR SIMD ARCHITECTURES

4 SIMD execution slots to produce 4 unique results. In contrast, computing directly on AoS data can lead to horizontal operations that consume SIMD execution slots but produce only a single scalar result (as shown by the many “don’t-care” (DC) slots in Example 4-14).

Use of the SoA format for data structures can lead to more efficient use of caches and bandwidth. When the elements of the structure are not accessed with equal frequency, such as when elements x, y, z are accessed ten times more often than the other entries, then SoA saves memory and prevents fetching unnecessary data items a, b, and c.

Example 4-15. Hybrid SoA Data Structure

    NumOfGroups = NumOfVertices/SIMDwidth

    typedef struct{
        float x[SIMDwidth];
        float y[SIMDwidth];
        float z[SIMDwidth];
    } VerticesCoordList;

    typedef struct{
        int a[SIMDwidth];
        int b[SIMDwidth];
        int c[SIMDwidth];
        . . .
    } VerticesColorList;

    VerticesCoordList VerticesCoord[NumOfGroups];
    VerticesColorList VerticesColor[NumOfGroups];

Note that SoA can have the disadvantage of requiring more independent memory stream references. A computation that uses arrays X, Y, and Z (see Example 4-13) would require three separate data streams. This can require more prefetches and additional address generation calculations, and can have a greater impact on DRAM page access efficiency.

There is an alternative: a hybrid SoA approach blends the two alternatives (see Example 4-15). In this case, only 2 separate address streams are generated and referenced: one contains XXXX, YYYY, ZZZZ, XXXX,... and the other AAAA, BBBB, CCCC, AAAA, DDDD,... .
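To make the two address streams concrete, the following is a minimal sketch (not taken from the manual) that repacks an AoS vertex array into the hybrid layout of Example 4-15. The names VertexAoS, aos_to_hybrid_soa, and SIMD_WIDTH are hypothetical. The sketch assumes a SIMD width of 4 and a vertex count that is a multiple of that width, and it leaves out the elided fields of VerticesColorList.

```c
#include <assert.h>
#include <stddef.h>

#define SIMD_WIDTH 4  /* assumption: 4 single-precision lanes per SIMD register */

/* Hypothetical AoS vertex: coordinates and color fields interleaved per vertex. */
typedef struct {
    float x, y, z;
    int a, b, c;
} VertexAoS;

/* Hybrid SoA groups, mirroring Example 4-15 (elided fields omitted). */
typedef struct {
    float x[SIMD_WIDTH];
    float y[SIMD_WIDTH];
    float z[SIMD_WIDTH];
} VerticesCoordList;

typedef struct {
    int a[SIMD_WIDTH];
    int b[SIMD_WIDTH];
    int c[SIMD_WIDTH];
} VerticesColorList;

/* Repack num_vertices AoS vertices into hybrid SoA groups.
   coord and color must each hold num_vertices / SIMD_WIDTH groups. */
void aos_to_hybrid_soa(const VertexAoS *in, size_t num_vertices,
                       VerticesCoordList *coord, VerticesColorList *color)
{
    /* Assumption of this sketch: no partial trailing group. */
    assert(num_vertices % SIMD_WIDTH == 0);

    for (size_t i = 0; i < num_vertices; ++i) {
        size_t g = i / SIMD_WIDTH;     /* group index */
        size_t lane = i % SIMD_WIDTH;  /* position within the group */

        /* Stream 1: coordinates only. */
        coord[g].x[lane] = in[i].x;
        coord[g].y[lane] = in[i].y;
        coord[g].z[lane] = in[i].z;

        /* Stream 2: color fields only. */
        color[g].a[lane] = in[i].a;
        color[g].b[lane] = in[i].b;
        color[g].c[lane] = in[i].c;
    }
}
```

After repacking, walking coord[0], coord[1], ... reads memory in the order XXXX, YYYY, ZZZZ, XXXX, ..., while the color groups form the second, independent stream.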
The hybrid approach prevents fetching unnecessary data, assuming the variables X, Y, Z are always used together, whereas the variables A, B, C would also be used together, but not at the same time as X, Y, Z.

The hybrid SoA approach ensures:
  • Data is organized to enable more efficient vertical SIMD computation
  • Simpler/less address generation than AoS
  • Fewer streams, which reduces DRAM page misses
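The vertical computation named in the first bullet can be sketched as below. This is an illustrative example, not code from the manual: it computes the squared length of every vertex while reading only the coordinate stream. The function name squared_lengths and the SIMD width of 4 are assumptions. The sketch uses SSE intrinsics when the compiler defines __SSE__ and falls back to a scalar lane loop otherwise.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadu_ps, _mm_mul_ps, ... */
#endif

#define SIMD_WIDTH 4  /* assumption: 4 single-precision lanes per SIMD register */

/* Coordinate group from Example 4-15. */
typedef struct {
    float x[SIMD_WIDTH];
    float y[SIMD_WIDTH];
    float z[SIMD_WIDTH];
} VerticesCoordList;

/* For each vertex, write x*x + y*y + z*z to out.
   out must hold num_groups * SIMD_WIDTH floats. */
void squared_lengths(const VerticesCoordList *coord, size_t num_groups,
                     float *out)
{
    assert(coord != NULL && out != NULL);

    for (size_t g = 0; g < num_groups; ++g) {
#if defined(__SSE__)
        /* Unaligned loads: the struct is not guaranteed 16-byte alignment. */
        __m128 x = _mm_loadu_ps(coord[g].x);
        __m128 y = _mm_loadu_ps(coord[g].y);
        __m128 z = _mm_loadu_ps(coord[g].z);

        /* Vertical operations: each lane yields one vertex's result. */
        __m128 r = _mm_add_ps(_mm_add_ps(_mm_mul_ps(x, x), _mm_mul_ps(y, y)),
                              _mm_mul_ps(z, z));
        _mm_storeu_ps(out + g * SIMD_WIDTH, r);
#else
        /* Scalar fallback with the same per-lane arithmetic. */
        for (size_t lane = 0; lane < SIMD_WIDTH; ++lane) {
            float x = coord[g].x[lane];
            float y = coord[g].y[lane];
            float z = coord[g].z[lane];
            out[g * SIMD_WIDTH + lane] = x * x + y * y + z * z;
        }
#endif
    }
}
```

Each group costs three loads from consecutive addresses and produces SIMD_WIDTH results with no horizontal operations. The color stream is never touched, which matches the assumption that A, B, C are not used at the same time as X, Y, Z.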
