13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

MULTICORE AND HYPER-THREADING TECHNOLOGY

8.5.3 Avoid Excessive Software Prefetches

Pentium 4 and Intel Xeon processors have an automatic hardware prefetcher. It can bring data and instructions into the unified second-level cache based on prior reference patterns. In most situations, the hardware prefetcher is likely to reduce system memory latency without explicit intervention from software prefetches. It is also preferable to adjust data access patterns in the code to take advantage of the characteristics of the automatic hardware prefetcher to improve locality or mask memory latency. Processors based on Intel Core microarchitecture also provide several advanced hardware prefetching mechanisms. Data access patterns that can take advantage of earlier generations of hardware prefetch mechanisms can generally take advantage of more recent hardware prefetch implementations.

Using software prefetch instructions excessively or indiscriminately will inevitably cause performance penalties, because doing so wastes the command and data bandwidth of the system bus.

Using software prefetches delays the hardware prefetcher from starting to fetch data needed by the processor core. It also consumes critical execution resources and can result in stalled execution. In some cases, it may be fruitful to evaluate the reduction or removal of software prefetches to migrate towards more effective use of hardware prefetch mechanisms. The guidelines for using software prefetch instructions are described in Chapter 3. The techniques for using the automatic hardware prefetcher are discussed in Chapter 9.

User/Source Coding Rule 27. (M impact, L generality) Avoid excessive use of software prefetch instructions and allow the automatic hardware prefetcher to work. Excessive or inappropriate use of software prefetches can significantly and unnecessarily increase bus utilization.

8.5.4 Improve Effective Latency of Cache Misses

System memory access latency due to cache misses is affected by bus traffic. This is because bus read requests must be arbitrated along with other requests for bus transactions. Reducing the number of outstanding bus transactions helps improve effective memory access latency.

One technique to improve effective latency of memory read transactions is to use multiple overlapping bus reads to reduce the latency of sparse reads. In situations where there is little locality of data or when memory reads need to be arbitrated with other bus transactions, the effective latency of scattered memory reads can be improved by issuing multiple memory reads back-to-back to overlap multiple outstanding memory read transactions. The average latency of back-to-back bus reads is likely to be lower than the average latency of scattered reads interspersed with other bus transactions. This is because only the first memory read needs to wait for the full delay of a cache miss.
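The back-to-back read technique of Section 8.5.4 can be illustrated with a minimal C sketch. It is not taken from the manual: the names sum_interleaved, sum_batched, process, and BATCH are hypothetical, and the batch size of 4 is an arbitrary assumption. The first function interleaves each scattered read with the work that consumes it. The second issues a group of independent reads first, so that their cache misses have a chance to overlap, and only then consumes the values. Whether this helps on a given processor depends on the compiler and on out-of-order execution, so it should be measured.

```c
#include <stddef.h>
#include <stdint.h>

#define BATCH 4 /* hypothetical batch size; an assumption, tune by measurement */

/* Stand-in for whatever work consumes each loaded value (hypothetical). */
static uint64_t process(uint32_t v)
{
    return (uint64_t)v * 3u + 1u;
}

/* Baseline: each scattered read is immediately followed by the work
   that depends on it, so reads are interspersed with other activity. */
uint64_t sum_interleaved(const uint32_t *table, const size_t *idx, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += process(table[idx[i]]);
    }
    return acc;
}

/* Batched: issue BATCH independent reads back-to-back, then consume them.
   The loads do not depend on one another, so their misses can be
   outstanding at the same time. */
uint64_t sum_batched(const uint32_t *table, const size_t *idx, size_t n)
{
    uint64_t acc = 0;
    size_t i = 0;

    for (; i + BATCH <= n; i += BATCH) {
        uint32_t v0 = table[idx[i]];
        uint32_t v1 = table[idx[i + 1]];
        uint32_t v2 = table[idx[i + 2]];
        uint32_t v3 = table[idx[i + 3]];

        acc += process(v0);
        acc += process(v1);
        acc += process(v2);
        acc += process(v3);
    }

    /* Handle the remaining elements that do not fill a whole batch. */
    for (; i < n; i++) {
        acc += process(table[idx[i]]);
    }
    return acc;
}
```

Both functions return the same result; only the ordering of the memory reads relative to the dependent work differs.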
