8.6.3 Eliminate 64-KByte Aliased Data Accesses

The 64-KByte aliasing condition is discussed in Chapter 3. Memory accesses that satisfy the 64-KByte aliasing condition can cause excessive evictions of the first-level data cache. Eliminating 64-KByte aliased data accesses originating from each thread helps improve frequency scaling in general. Furthermore, it enables the first-level data cache to perform efficiently when HT Technology is fully utilized by software applications.

User/Source Coding Rule 33. (H impact, H generality) Minimize data access patterns that are offset by multiples of 64 KBytes in each thread.

The presence of 64-KByte aliased data accesses can be detected using Pentium 4 processor performance monitoring events. Appendix B includes an updated list of Pentium 4 processor performance metrics. These metrics are based on events accessed using the Intel VTune Performance Analyzer.

Performance penalties associated with 64-KByte aliasing apply mainly to current processor implementations of HT Technology or the Intel NetBurst microarchitecture. The next section discusses memory optimization techniques that are applicable to multithreaded applications running on processors supporting HT Technology.

8.6.4 Preventing Excessive Evictions in First-Level Data Cache

Cached data in the first-level data cache are indexed to linear addresses but tagged with physical addresses. Data in the second-level and third-level caches are tagged and indexed to physical addresses. Although two logical processors in the same physical processor package execute in separate linear address spaces, they can reference data at the same linear address in the two address spaces while that address maps to different physical addresses. When such competing accesses occur simultaneously, they can cause repeated evictions and allocations of cache lines in the first-level data cache. Preventing unnecessary evictions in the first-level data cache by two competing threads improves the temporal locality of the first-level data cache.

Multithreaded applications need to prevent unnecessary evictions in the first-level data cache when:

• Multiple threads within an application access private data on their stacks; some data access patterns can cause excessive evictions of cache lines. Within the same software process, multiple threads have their respective stacks, and these stacks are located at different linear addresses. Frequently, the linear addresses of these stacks are spaced apart by some fixed distance, which increases the likelihood of a cache line being used by multiple threads (see the stack-offset sketch below).

• Two instances of the same application run concurrently and execute in lock step (for example, corresponding data in each instance are accessed more or less synchronously). Accessing data on the stack (and sometimes on the heap) by these two processes can also cause excessive evictions of cache lines because of address conflicts.
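The per-thread stack conflict described in the first bullet can be mitigated by introducing a small, thread-specific stack offset when each thread starts. The following C sketch is illustrative only: it assumes a POSIX-style alloca(), and the 1-KByte step size and the thread_entry wrapper are hypothetical names, not reference code from this manual.

#include <alloca.h>                 /* with the Microsoft compiler, use <malloc.h> and _alloca() */

#define STACK_OFFSET_STEP 1024      /* assumed stagger: 1 KByte per thread */

/* Hypothetical wrapper around a thread's work function: consume a
   thread-specific amount of stack before doing real work, so the stacks of
   different threads are no longer spaced apart by an exact multiple of
   64 KBytes. The alloca() allocation stays live until this function returns,
   so every frame created by work() inherits the offset. */
void thread_entry(int tid, void (*work)(void *), void *arg)
{
    if (tid > 0) {
        volatile char *pad = (volatile char *)alloca((size_t)tid * STACK_OFFSET_STEP);
        pad[0] = 0;                 /* touch the pad so it is not optimized away */
    }
    work(arg);
}

Each thread passes its own ordinal tid, so no two stacks end up with the same offset.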

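In the same spirit as User/Source Coding Rule 33, heap-allocated working buffers that are accessed by concurrent threads can be staggered so that their start addresses are not offset by an exact multiple of 64 KBytes. This is a minimal sketch under assumed names (thread_buffer, alloc_thread_buffer) and an assumed 128-byte per-thread pad; it is not taken from this manual's example code.

#include <stdlib.h>

#define ALIAS_STRIDE (64 * 1024)    /* 64-KByte aliasing granularity */
#define PAD_BYTES    128            /* assumed stagger: two 64-byte cache lines per thread */

struct thread_buffer {
    void *base;                     /* pointer returned by malloc(), kept for free() */
    void *data;                     /* staggered start address handed to the thread  */
};

/* Allocate 'bytes' of working storage for thread 'tid', offsetting each
   thread's start address so that buffers of different threads do not begin
   at addresses that differ by an exact multiple of 64 KBytes. */
int alloc_thread_buffer(struct thread_buffer *buf, int tid, size_t bytes)
{
    size_t offset = ((size_t)tid * PAD_BYTES) % ALIAS_STRIDE;

    buf->base = malloc(bytes + ALIAS_STRIDE);   /* over-allocate to make room for the offset */
    if (buf->base == NULL)
        return -1;
    buf->data = (char *)buf->base + offset;
    return 0;
}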