13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

CHAPTER 9
OPTIMIZING CACHE USAGE

Over the past decade, processor speed has increased. Memory access speed has increased at a slower pace. The resulting disparity has made it important to tune applications in one of two ways: either (a) so that a majority of data accesses are fulfilled from processor caches, or (b) so that memory latency is effectively masked and peak memory bandwidth is utilized as much as possible.

Hardware prefetching mechanisms are enhancements in microarchitecture to facilitate the latter aspect, and will be most effective when combined with software tuning. The performance of most applications can be considerably improved if the data required can be fetched from the processor caches or if memory traffic can take advantage of hardware prefetching effectively.

Standard techniques to bring data into the processor before it is needed involve additional programming which can be difficult to implement and may require special steps to prevent performance degradation. Streaming SIMD Extensions addressed this issue by providing various prefetch instructions.

Streaming SIMD Extensions introduced the various non-temporal store instructions. SSE2 extends this support to new data types and also introduces non-temporal store support for the 32-bit integer registers.

This chapter focuses on:

• Hardware Prefetch Mechanism, Software Prefetch and Cacheability Instructions: Discusses microarchitectural features and instructions that allow you to affect data caching in an application.
• Memory Optimization Using Hardware Prefetching, Software Prefetch and Cacheability Instructions: Discusses techniques for implementing memory optimizations using the above instructions.
• Using deterministic cache parameters to manage cache hierarchy.

NOTE
In a number of cases presented, the prefetching and cache utilization described are specific to current implementations of Intel NetBurst microarchitecture but are largely applicable to future processors.

9.1 GENERAL PREFETCH CODING GUIDELINES

The following guidelines will help you to reduce memory traffic and utilize peak memory system bandwidth more effectively when large amounts of data movement must originate from the memory system:
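
As background for these guidelines, here is a minimal sketch of the two instruction families named in the chapter introduction, software prefetch and non-temporal stores, written in C with compiler intrinsics. It is an illustration under stated assumptions, not code taken from the manual: the function name scale_stream, the prefetch distance, and the 64-byte cache line size are hypothetical choices, and suitable values depend on the processor and workload. _mm_prefetch, _mm_stream_si32 and _mm_sfence are the standard SSE/SSE2 intrinsics; when SSE2 is not available the sketch falls back to ordinary stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 header; also provides SSE's _mm_prefetch and _mm_sfence */
#endif

/* Assumption: 64-byte cache lines, i.e. 16 int32_t elements per line. */
#define ELEMS_PER_LINE 16
/* Hypothetical prefetch distance, in elements; needs tuning in practice. */
#define PREFETCH_AHEAD 64

/* Write dst[i] = src[i] * k for i in [0, n).
 * src is read once, so it is prefetched with the non-temporal hint;
 * dst is written once, so it is stored with a non-temporal store
 * of a 32-bit integer register (the SSE2 addition). */
void scale_stream(int32_t *dst, const int32_t *src, size_t n, int32_t k)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < n; i++) {
#if defined(__SSE2__)
        /* Issue one prefetch per cache line, a fixed distance ahead. */
        if (i % ELEMS_PER_LINE == 0 && i + PREFETCH_AHEAD < n)
            _mm_prefetch((const char *)&src[i + PREFETCH_AHEAD], _MM_HINT_NTA);

        /* Non-temporal store: hints that dst should not displace cached data. */
        _mm_stream_si32((int *)&dst[i], src[i] * k);
#else
        dst[i] = src[i] * k; /* portable fallback without cacheability hints */
#endif
    }

#if defined(__SSE2__)
    /* Order the weakly-ordered streaming stores before later stores. */
    _mm_sfence();
#endif
}
```

Whether either hint helps is workload-dependent: on data that is reused soon after being written, non-temporal stores can hurt rather than help, so a change like this should be measured against the plain loop.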
