13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

GENERAL OPTIMIZATION GUIDELINES

Example 3-37. Dynamic Stack Alignment (Contd.)

    epilogue:
        ; ... callee restores, etc.
        movl esp, [ebp]     ; Restore stack ptr
        movl ebp, [esp]     ; Restore frame ptr
        addl esp, 4
        ret

If for some reason it is not possible to align the stack for 64-bits, the routine should access the parameter and save it into a register or known aligned storage, thus incurring the penalty only once.

3.6.7 Capacity Limits and Aliasing in Caches

There are cases in which addresses with a given stride will compete for some resource in the memory hierarchy.

Typically, caches are implemented to have multiple ways of set associativity, with each way consisting of multiple sets of cache lines (or sectors in some cases). Multiple memory references that compete for the same set of each way in a cache can cause a capacity issue. There are aliasing conditions that apply to specific microarchitectures. Note that first-level cache lines are 64 bytes. Thus, the least significant 6 bits are not considered in alias comparisons. For processors based on Intel NetBurst microarchitecture, data is loaded into the second-level cache in a sector of 128 bytes, so the least significant 7 bits are not considered in alias comparisons.

3.6.7.1 Capacity Limits in Set-Associative Caches

Capacity limits may be reached if the number of outstanding memory references that are mapped to the same set in each way of a given cache exceeds the number of ways of that cache. The conditions that apply to the first-level data cache and second-level cache are listed below:

• L1 Set Conflicts: Multiple references map to the same first-level cache set. The conflicting condition is a stride determined by the size of the cache in bytes, divided by the number of ways. These competing memory references can cause excessive cache misses only if the number of outstanding memory references exceeds the number of ways in the working set:
    - On Pentium 4 and Intel Xeon processors with a CPUID signature of family encoding 15, model encoding of 0, 1, or 2; there will be an excess of first-level cache misses for more than 4 simultaneous competing memory references to addresses with 2-KByte modulus.
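
The L1 set-conflict condition described above can be illustrated with a small C sketch. It is not taken from the manual: the `cache_geom` type and the helpers `conflict_stride`, `same_set`, and `competing_refs` are hypothetical names introduced here. The sketch computes the conflict stride as cache size divided by number of ways, ignores the offset bits within a line when comparing addresses, and counts how many references compete for one set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache description; field names are illustrative only. */
typedef struct {
    size_t size_bytes;  /* total cache capacity in bytes */
    size_t ways;        /* set associativity */
    size_t line_bytes;  /* 64 for first-level lines, 128 for a NetBurst L2 sector */
} cache_geom;

/* Conflict stride: size of the cache in bytes divided by the number of ways. */
static size_t conflict_stride(const cache_geom *c)
{
    assert(c->ways > 0 && c->line_bytes > 0);
    return c->size_bytes / c->ways;
}

/* Two addresses compete for the same set when they agree modulo the
 * conflict stride once the offset bits within a line are dropped
 * (the low 6 bits for 64-byte lines, the low 7 bits for 128-byte sectors). */
static int same_set(const cache_geom *c, uintptr_t a, uintptr_t b)
{
    size_t stride = conflict_stride(c);
    return ((a % stride) / c->line_bytes) == ((b % stride) / c->line_bytes);
}

/* Count the addresses that map to the same set as addrs[0], including
 * addrs[0] itself. A count above c->ways indicates that the references
 * could exceed the set's capacity if they are outstanding together. */
static size_t competing_refs(const cache_geom *c, const uintptr_t *addrs, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (same_set(c, addrs[0], addrs[i])) {
            count++;
        }
    }
    return count;
}
```

As a usage example, a geometry of `{ 8 * 1024, 4, 64 }` (an assumed 8-KByte, 4-way cache with 64-byte lines, chosen because it yields the 2-KByte modulus and 4 ways quoted above) gives a conflict stride of 2048 bytes. Five addresses spaced 2048 bytes apart then all land in one set, one more than the four ways available.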
