
Intel® 64 and IA-32 Architectures Optimization Reference Manual


CODING FOR SIMD ARCHITECTURES

If the data in such a class is going to be used with the Streaming SIMD Extensions or Streaming SIMD Extensions 2, it is preferable to use a union to make this explicit. In C++, an anonymous union can be used to make this more convenient:

    #include <xmmintrin.h>  // declares the __m128 type

    class my_m128 {
        union {
            __m128 m;    // the data as one 128-bit SIMD value
            float f[4];  // the same 16 bytes as four floats
        };
    };

Because the union is anonymous, the names m and f can be used as immediate member names of my_m128. Note that __declspec(align) has no effect when applied to a class, struct, or union member in either C or C++.

Alignment by Using __m64 or double Data

In some cases, the compiler aligns routines with __m64 or double data to 16 bytes by default. The command-line switch -Qsfalign16 limits the compiler so that it only performs this alignment on routines that contain 128-bit data. The default behavior is to use -Qsfalign8, which instructs the compiler to align routines with 8- or 16-byte data types to 16 bytes.

For more, see the Intel C++ Compiler documentation.

4.5 IMPROVING MEMORY UTILIZATION

Memory performance can be improved by rearranging data and algorithms for SSE, SSE2, and MMX technology intrinsics. Methods for improving memory performance involve working with the following:

• Data structure layout
• Strip-mining for vectorization and memory utilization
• Loop-blocking

Using the cacheability instructions, prefetch and streaming store, also greatly enhances memory utilization. See also: Chapter 9, “Optimizing Cache Usage.”

4.5.1 Data Structure Layout

For certain algorithms, like 3D transformations and lighting, there are two basic ways to arrange vertex data. The traditional method is the array of structures (AoS) arrangement, with a structure for each vertex (Example 4-12). However, this method does not take full advantage of SIMD technology capabilities.
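To make the AoS limitation concrete, the following is a minimal sketch in C (it is not the manual's Example 4-12). It contrasts an AoS vertex array with a structure-of-arrays (SoA) layout, in which all x coordinates are contiguous, then all y, then all z. With SoA, four consecutive x values fill one 128-bit __m128 register, so each SSE iteration processes four vertices with purely vertical operations. With AoS, a 16-byte load would mix x, y, and z of different vertices. The "lighting" step is reduced to a dot product with a fixed vector. All names (Vertex, VertexListSoA, aos_to_soa, dot_aos, dot_soa_sse, NUM_VERTICES) are hypothetical, and the sketch assumes an x86 compiler with SSE support, which is baseline on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_mul_ps, ... */

/* Hypothetical size for this sketch; a multiple of 4 keeps the SIMD loop simple. */
#define NUM_VERTICES 8
static_assert(NUM_VERTICES % 4 == 0, "SIMD loop handles 4 vertices per iteration");

/* Array of structures (AoS): one structure per vertex (hypothetical type). */
typedef struct {
    float x, y, z;
} Vertex;

/* Structure of arrays (SoA): one array per coordinate (hypothetical type). */
typedef struct {
    float x[NUM_VERTICES];
    float y[NUM_VERTICES];
    float z[NUM_VERTICES];
} VertexListSoA;

/* Rearranges AoS vertex data into the SoA layout. */
void aos_to_soa(const Vertex *v, VertexListSoA *s)
{
    for (size_t i = 0; i < NUM_VERTICES; ++i) {
        s->x[i] = v[i].x;
        s->y[i] = v[i].y;
        s->z[i] = v[i].z;
    }
}

/* Scalar baseline over AoS data: one vertex per iteration. */
void dot_aos(const Vertex *v, float lx, float ly, float lz, float *out)
{
    for (size_t i = 0; i < NUM_VERTICES; ++i)
        out[i] = v[i].x * lx + v[i].y * ly + v[i].z * lz;
}

/* SSE version over SoA data: four vertices per iteration. */
void dot_soa_sse(const VertexListSoA *s, float lx, float ly, float lz, float *out)
{
    const __m128 vlx = _mm_set1_ps(lx);  /* broadcast each component to all 4 lanes */
    const __m128 vly = _mm_set1_ps(ly);
    const __m128 vlz = _mm_set1_ps(lz);
    for (size_t i = 0; i < NUM_VERTICES; i += 4) {
        /* Unaligned loads, so the sketch does not depend on 16-byte alignment. */
        __m128 acc = _mm_mul_ps(_mm_loadu_ps(&s->x[i]), vlx);
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(&s->y[i]), vly));
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(&s->z[i]), vlz));
        _mm_storeu_ps(&out[i], acc);
    }
}
```

The unaligned _mm_loadu_ps and _mm_storeu_ps keep the sketch independent of how the arrays are aligned. If the SoA arrays are known to be 16-byte aligned, _mm_load_ps and _mm_store_ps could be used instead, which is where the alignment techniques discussed above come into play.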
