13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual



GENERAL OPTIMIZATION GUIDELINES

3.6.3 Alignment

Alignment of data concerns all kinds of variables:

• Dynamically allocated variables
• Members of a data structure
• Global or local variables
• Parameters passed on the stack

Misaligned data access can incur significant performance penalties. This is particularly true for cache line splits. The size of a cache line is 64 bytes in the Pentium 4 and other recent Intel processors, including processors based on Intel Core microarchitecture.

An access to data that straddles a 64-byte boundary leads to two memory accesses and requires several µops to be executed (instead of one). Accesses that span 64-byte boundaries are likely to incur a large performance penalty, and the cost of each stall is generally greater on machines with longer pipelines.

Double-precision floating-point operands that are eight-byte aligned have better performance than operands that are not eight-byte aligned, since they are less likely to incur penalties for cache and MOB splits. A floating-point operation on a memory operand requires that the operand be loaded from memory. This incurs an additional µop, which can have a minor negative impact on front end bandwidth. Additionally, memory operands may cause a data cache miss, causing a penalty.

Assembly/Compiler Coding Rule 45. (H impact, H generality) Align data on natural operand size address boundaries. If the data will be accessed with vector instruction loads and stores, align the data on 16-byte boundaries.

For best performance, align data as follows:

• Align 8-bit data at any address.
• Align 16-bit data to be contained within an aligned 4-byte word.
• Align 32-bit data so that its base address is a multiple of four.
• Align 64-bit data so that its base address is a multiple of eight.
• Align 80-bit data so that its base address is a multiple of sixteen.
• Align 128-bit data so that its base address is a multiple of sixteen.

A 64-byte or greater data structure or array should be aligned so that its base address is a multiple of 64. Sorting data in decreasing size order is one heuristic for assisting with natural alignment. As long as 16-byte boundaries (and cache lines) are never crossed, natural alignment is not strictly necessary (though it is an easy way to enforce this).

Example 3-28 shows the type of code that can cause a cache line split. The code loads the addresses of two DWORD arrays. 029E70FEH is not a 4-byte-aligned address, so a 4-byte access at this address will get 2 bytes from the cache line this address is contained in, and 2 bytes from the cache line that starts at 029E7100H. On processors with 64-byte cache lines, a similar cache line split will occur every 8 iterations.
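
The address arithmetic behind a cache line split can be checked with a few lines of C. The sketch below is illustrative only and is not taken from the manual: the helper names (line_start, splits_cache_line, count_splits) are hypothetical, and the 8-byte advance per loop iteration is an assumption chosen because it is consistent with a split recurring every 8 iterations on 64-byte lines.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 64u /* cache line size stated in the text */

/* The mask arithmetic below relies on the line size being a power of two. */
static_assert((CACHE_LINE & (CACHE_LINE - 1u)) == 0u,
              "cache line size must be a power of two");

/* Hypothetical helper: start address of the cache line containing addr. */
static uint64_t line_start(uint64_t addr)
{
    return addr & ~(uint64_t)(CACHE_LINE - 1u);
}

/* Hypothetical helper: start address of the cache line after addr's line. */
static uint64_t next_line_start(uint64_t addr)
{
    return line_start(addr) + CACHE_LINE;
}

/* Returns 1 if an access of size bytes at addr touches two cache lines,
   that is, if its first and last bytes lie in different lines. */
static int splits_cache_line(uint64_t addr, unsigned size)
{
    return line_start(addr) != line_start(addr + size - 1u);
}

/* Counts split accesses for one size-byte load per iteration at
   base, base + stride, base + 2 * stride, ...
   The stride is an assumption supplied by the caller. */
static unsigned count_splits(uint64_t base, unsigned size,
                             unsigned stride, unsigned iters)
{
    unsigned splits = 0;
    for (unsigned i = 0; i < iters; i++)
        splits += (unsigned)splits_cache_line(base + (uint64_t)i * stride, size);
    return splits;
}
```

With these helpers, splits_cache_line(0x029E70FE, 4) reports a split, because the address sits at offset 62 of its 64-byte line and the line that follows begins at 029E7100H. Under the assumed 8-byte stride, count_splits(0x029E70FE, 4, 8, 64) finds 8 splits in 64 iterations, one every 8 iterations. Moving the base to a 4-byte-aligned address removes the splits entirely, which is the point of Rule 45.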
