13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

GENERAL OPTIMIZATION GUIDELINES

  - On Pentium 4 and Intel Xeon processors with a CPUID signature of family encoding 15, model encoding 3, there will be an excess of first-level cache misses for more than 8 simultaneous competing references to addresses that are apart by 2-KByte modulus.
  - On Intel Core 2 Duo, Intel Core Duo, Intel Core Solo, and Pentium M processors, there will be an excess of first-level cache misses for more than 8 simultaneous references to addresses that are apart by 4-KByte modulus.

• L2 Set Conflicts - Multiple references map to the same second-level cache set. The conflicting condition is also determined by the size of the cache or the number of ways:
  - On Pentium 4 and Intel Xeon processors, there will be an excess of second-level cache misses for more than 8 simultaneous competing references. The stride sizes that can cause capacity issues are 32 KBytes, 64 KBytes, or 128 KBytes, depending on the size of the second-level cache.
  - On Pentium M processors, the stride sizes that can cause capacity issues are 128 KBytes or 256 KBytes, depending on the size of the second-level cache. On Intel Core 2 Duo, Intel Core Duo, and Intel Core Solo processors, a stride size of 256 KBytes can cause capacity issues if the number of simultaneous accesses exceeds the way associativity of the L2 cache.

3.6.7.2 Aliasing Cases in Processors Based on Intel NetBurst Microarchitecture

Aliasing conditions that are specific to processors based on Intel NetBurst microarchitecture are:

• 16 KBytes for code - There can only be one of these in the trace cache at a time. If two traces whose starting addresses are 16 KBytes apart are in the same working set, the symptom will be a high trace cache miss rate. Solve this by offsetting one of the addresses by one or more bytes.
• Data conflict - There can only be one instance of the data in the first-level cache at a time. If a reference (load or store) occurs and its linear address matches a data conflict condition with another reference (load or store) that is under way, then the second reference cannot begin until the first one is kicked out of the cache.
  - On Pentium 4 and Intel Xeon processors with a CPUID signature of family encoding 15, model encoding of 0, 1, or 2, the data conflict condition applies to addresses having identical values in bits 15:6 (this is also referred to as a “64-KByte aliasing conflict”). If you avoid this kind of aliasing, you can speed up programs by a factor of three if they load frequently from preceding stores with aliased addresses and little other instruction-level parallelism is available. The gain is smaller when loads alias with other loads, which causes thrashing in the first-level cache.
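
The address conditions listed above can be checked mechanically when laying out hot data. The following C sketch is illustrative only and is not taken from the manual. The function names `aliases_64k`, `same_residue`, and `exceeds_ways` are hypothetical, a 64-byte cache line is assumed from the bits 15:6 field, and the congruence test is a simplified reading of "apart by 2-KByte (or 4-KByte) modulus".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mask selecting address bits 15:6 (0xFFC0), the field named in the text. */
#define ALIAS_64K_MASK ((uintptr_t)0xFFC0)

/* Nonzero when a and b lie on different 64-byte lines (assumed line size)
 * yet have identical values in bits 15:6. */
int aliases_64k(uintptr_t a, uintptr_t b)
{
    int same_field = ((a ^ b) & ALIAS_64K_MASK) == 0;
    int same_line  = (a >> 6) == (b >> 6);
    return same_field && !same_line;
}

/* Nonzero when a and b are congruent modulo `modulus`.
 * `modulus` must be a power of two, for example 2048 or 4096. */
int same_residue(uintptr_t a, uintptr_t b, uintptr_t modulus)
{
    assert(modulus != 0 && (modulus & (modulus - 1)) == 0);
    return ((a ^ b) & (modulus - 1)) == 0;
}

/* Nonzero when more than `ways` of the n addresses share one residue
 * modulo `modulus`. The text uses 8 as the threshold for the first-level
 * cache cases. Quadratic scan, intended for short lists of hot addresses. */
int exceeds_ways(const uintptr_t *addrs, size_t n,
                 uintptr_t modulus, size_t ways)
{
    for (size_t i = 0; i < n; i++) {
        size_t group = 0;
        for (size_t j = 0; j < n; j++)
            group += (size_t)same_residue(addrs[i], addrs[j], modulus);
        if (group > ways)
            return 1;
    }
    return 0;
}
```

For instance, `aliases_64k(0x10000, 0x20000)` reports a conflict because the two addresses differ only above bit 15, while moving the second address to `0x20040` changes bits 15:6 and clears the condition. Whether such an offset actually helps a given workload should be confirmed by measurement on the target processor.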
