13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

CODING FOR SIMD ARCHITECTURES

4.4.3 Data Alignment for MMX Technology

Many compilers enable alignment of variables using controls. This aligns variable bit lengths to the appropriate boundaries. If some of the variables are not appropriately aligned as specified, you can align them using the C algorithm in Example 4-11.

Example 4-11. C Algorithm for 64-bit Data Alignment

    #include <stdint.h>  /* uintptr_t */
    #include <stdlib.h>  /* malloc */

    /* Make newp a pointer to a 64-bit aligned array of NUM_ELEMENTS 64-bit elements. */
    double *p, *newp;
    p = (double*)malloc(sizeof(double)*(NUM_ELEMENTS+1));
    /* Round the address up as an integer; C does not allow & on a pointer. */
    newp = (double*)(((uintptr_t)p + 7) & ~(uintptr_t)0x7);

The algorithm in Example 4-11 aligns an array of 64-bit elements on a 64-bit boundary. The constant of 7 is derived from one less than the number of bytes in a 64-bit element, or 8-1. Aligning data in this manner avoids the significant performance penalties that can occur when an access crosses a cache line boundary.

Another way to improve data alignment is to copy the data into locations that are aligned on 64-bit boundaries. When the data is accessed frequently, this can provide a significant performance improvement.

4.4.4 Data Alignment for 128-bit data

Data must be 16-byte aligned when loading to and storing from the 128-bit XMM registers used by SSE/SSE2/SSE3/SSSE3. This must be done to avoid severe performance penalties and, at worst, execution faults.

There are MOVE instructions (and intrinsics) that allow unaligned data to be copied into and out of XMM registers, but such operations are much slower than aligned accesses. If data is not 16-byte-aligned and the programmer or the compiler does not detect this and uses the aligned instructions, a fault occurs. So keep data 16-byte-aligned.
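
As a rough illustration of the aligned and unaligned accesses described above, the following C sketch rounds an address up to a 16-byte boundary and reads four floats with the SSE intrinsics _mm_load_ps (aligned) and _mm_loadu_ps (unaligned). The helper names align_up, sum_lanes, sum4_aligned, and sum4_unaligned are hypothetical and do not come from the manual. The sketch assumes an x86 compiler that provides xmmintrin.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <xmmintrin.h> /* SSE intrinsics: _mm_load_ps, _mm_loadu_ps, _mm_storeu_ps */

/* Hypothetical helper: round addr up to the next multiple of alignment.
   alignment must be a power of two (16 for XMM data). */
static uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    return (addr + (alignment - 1)) & ~(alignment - 1);
}

/* Hypothetical helper: add the four single-precision lanes of an XMM value. */
static float sum_lanes(__m128 v)
{
    float lanes[4];
    _mm_storeu_ps(lanes, v); /* unaligned store, so lanes needs no special alignment */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

/* Aligned load. Precondition: src is 16-byte aligned.
   The aligned load may fault on a misaligned address, so the
   precondition is checked here in debug builds. */
static float sum4_aligned(const float *src)
{
    assert(((uintptr_t)src & 0xF) == 0);
    return sum_lanes(_mm_load_ps(src));
}

/* Unaligned load: accepts any address, typically at a performance cost. */
static float sum4_unaligned(const float *src)
{
    return sum_lanes(_mm_loadu_ps(src));
}
```

To use the aligned path on heap data, one option is to allocate 15 extra bytes, pass the returned address through align_up with an alignment of 16, and keep the original pointer for the later call to free.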

Such alignment also works for MMX technology code, even though MMX technology only requires 8-byte alignment.

The following describes alignment techniques for the Pentium 4 processor as implemented with the Intel C++ Compiler.

4.4.4.1 Compiler-Supported Alignment

The Intel C++ Compiler provides the following methods to ensure that the data is aligned.

Alignment by F32vec4 or __m128 Data Types

When the compiler detects F32vec4 or __m128 data declarations or parameters, it forces alignment of the object to a 16-byte boundary for both global and local data, as well as parameters. If the declaration is within a function, the compiler also aligns
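
The 16-byte alignment of __m128 objects described in this section can be checked with a short C sketch like the one below. It is an illustration under assumptions, not the Intel C++ Compiler's documented behavior: it assumes a C11 compiler that provides xmmintrin.h and gives __m128 a 16-byte alignment requirement. The names global_vec, is_aligned_16, and m128_objects_aligned are hypothetical. F32vec4 is left out because it is a C++ class.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdint.h>    /* uintptr_t */
#include <xmmintrin.h> /* __m128, _mm_set1_ps */

/* Compile-time check of the assumption that __m128 requires 16-byte alignment. */
static_assert(_Alignof(__m128) == 16, "__m128 is expected to be 16-byte aligned");

/* Global __m128 object; the compiler is expected to place it on a 16-byte boundary. */
static __m128 global_vec;

/* Hypothetical helper: return 1 if p is on a 16-byte boundary, else 0. */
static int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 0xF) == 0;
}

/* Return 1 if both a global and a local __m128 object are 16-byte aligned. */
static int m128_objects_aligned(void)
{
    __m128 local_vec = _mm_set1_ps(1.0f); /* local (stack) __m128 object */
    global_vec = local_vec;
    return is_aligned_16(&global_vec) && is_aligned_16(&local_vec);
}
```

A runtime check such as m128_objects_aligned is mainly useful when porting code between compilers, since it confirms the alignment assumption before aligned load and store instructions are relied upon.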
