13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

GENERAL OPTIMIZATION GUIDELINES

concurrent streams. See Chapter 9, “Optimizing Cache Usage,” for more information on the hardware prefetcher.

On Intel Core 2 Duo, Intel Core Duo, Intel Core Solo, Pentium 4, Intel Xeon and Pentium M processors, memory coherence is maintained on 64-byte cache lines (rather than 32-byte cache lines, as in earlier processors). This can increase the opportunity for false sharing.

User/Source Coding Rule 7. (M impact, L generality) Beware of false sharing within a cache line (64 bytes) and within a sector of 128 bytes on processors based on Intel NetBurst microarchitecture.

3.6.6 Stack Alignment

The easiest way to avoid stack alignment problems is to keep the stack aligned at all times. For example, a language that supports 8-bit, 16-bit, 32-bit, and 64-bit data quantities but never uses 80-bit data quantities can require the stack to always be aligned on a 64-bit boundary.

Assembly/Compiler Coding Rule 54. (H impact, M generality) If 64-bit data is ever passed as a parameter or allocated on the stack, make sure that the stack is aligned to an 8-byte boundary.

Doing this will require using a general purpose register (such as EBP) as a frame pointer. The trade-off is between causing unaligned 64-bit references (if the stack is not aligned) and causing extra general purpose register spills (if the stack is aligned). Note that a performance penalty is caused only when an unaligned access splits a cache line. This means that one out of eight spatially consecutive unaligned accesses is always penalized.

A routine that makes frequent use of 64-bit data can avoid stack misalignment by placing the code described in Example 3-37 in the function prologue and epilogue.

Example 3-37. Dynamic Stack Alignment

    prologue:
        subl esp, 4            ; Save frame ptr
        movl [esp], ebp
        movl ebp, esp          ; New frame pointer
        andl ebp, 0xFFFFFFF8   ; Clear low 3 bits: aligned to 64 bits
        movl [ebp], esp        ; Save old stack ptr
        subl esp, FRAMESIZE    ; Allocate space
        ; ... callee saves, etc.
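The alignment arithmetic behind the prologue, and the "one out of eight" claim above, can be checked with a small C sketch. It is not taken from the manual: the function names are hypothetical, and it assumes 32-bit addresses and a 64-byte cache line. Clearing the low three bits of an address (mask 0xFFFFFFF8) gives 8-byte alignment, whereas mask 0xFFFFFFFC clears only two bits and gives 4-byte alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache line size in bytes (64, as stated in the text). */
#define CACHE_LINE 64u

/* Round addr down to a multiple of align (align must be a power of two).
   With align = 8 this is equivalent to AND-ing with 0xFFFFFFF8. */
uint32_t align_down(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1u)) == 0);
    return addr & ~(align - 1u);
}

/* Nonzero if an access of `size` bytes at addr crosses a cache-line boundary. */
int splits_cache_line(uint32_t addr, uint32_t size)
{
    return (addr / CACHE_LINE) != ((addr + size - 1u) / CACHE_LINE);
}

/* Count line splits among `count` consecutive 8-byte accesses from base. */
unsigned count_splits(uint32_t base, unsigned count)
{
    unsigned splits = 0;
    for (unsigned i = 0; i < count; i++)
        splits += (unsigned)splits_cache_line(base + 8u * i, 8u);
    return splits;
}
```

Under these assumptions, `count_splits(0x1000, 8)` is 0 because every 64-bit access stays inside one line, while `count_splits(0x1004, 8)` is 1: with the base only 4-byte aligned, the eighth access (at 0x103C) straddles the boundary at 0x1040.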
