13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual



OPTIMIZING CACHE USAGE

9.6.1 Software-Controlled Prefetch

The software-controlled prefetch is enabled using the four PREFETCH instructions introduced with Streaming SIMD Extensions instructions. These instructions are hints to bring a cache line of data into various levels and modes in the cache hierarchy. The software-controlled prefetch is not intended for prefetching code. Using it to prefetch code can incur significant penalties on a multiprocessor system when code is shared.

Software prefetching has the following characteristics:
  • Can handle irregular access patterns which do not trigger the hardware prefetcher.
  • Can use less bus bandwidth than hardware prefetching; see below.
  • Software prefetches must be added to new code, and do not benefit existing applications.

9.6.2 Hardware Prefetch

Automatic hardware prefetch can bring cache lines into the unified last-level cache based on prior data misses. It will attempt to prefetch two cache lines ahead of the prefetch stream. Characteristics of the hardware prefetcher are:
  • It requires some regularity in the data access patterns.
      — If a data access pattern has constant stride, hardware prefetching is effective if the access stride is less than half of the trigger distance of the hardware prefetcher (see Table 2-6).
      — If the access stride is not constant, the automatic hardware prefetcher can mask memory latency if the strides of two successive cache misses are less than the trigger threshold distance (small-stride memory traffic).
      — The automatic hardware prefetcher is most effective if the strides of two successive cache misses remain less than the trigger threshold distance and close to 64 bytes.
  • There is a start-up penalty before the prefetcher triggers, and there may be fetches after an array finishes. For short arrays, overhead can reduce effectiveness.
      — The hardware prefetcher requires a couple of misses before it starts operating.
      — Hardware prefetching generates a request for data beyond the end of an array, which is not utilized. This behavior wastes bus bandwidth. In addition, this behavior results in a start-up penalty when fetching the beginning of the next array. Software prefetching may recognize and handle these cases.
  • It will not prefetch across a 4-KByte page boundary. A program has to initiate demand loads for the new page before the hardware prefetcher starts prefetching from the new page.
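
As an illustration of the irregular-access case described above, the following is a minimal sketch, not taken from the manual, of a software prefetch inside a gather loop. It assumes an x86 compiler that provides the SSE intrinsic _mm_prefetch in <xmmintrin.h>; the function name gather_sum and the constant PREFETCH_AHEAD are hypothetical, and the distance of 8 iterations is an untuned placeholder. The bounds check keeps the loop from requesting data beyond the end of the array, which is one of the cases the text says software prefetching can handle.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 (SSE, x86 only) */

/* Hypothetical tuning knob: how many iterations ahead to prefetch.
   The value 8 is an assumption and should be measured per workload. */
#define PREFETCH_AHEAD 8

/* Sums data[idx[i]] for i in [0, n). The index array makes the access
   pattern irregular, so the hardware prefetcher is unlikely to follow it. */
double gather_sum(const double *data, const size_t *idx, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Hint only: ask for the cache line that will be needed
           PREFETCH_AHEAD iterations from now. The check avoids reading
           idx[] past its end and avoids prefetching beyond the array. */
        if (i + PREFETCH_AHEAD < n) {
            _mm_prefetch((const char *)&data[idx[i + PREFETCH_AHEAD]],
                         _MM_HINT_T0);
        }
        sum += data[idx[i]];
    }
    return sum;
}
```

Because the prefetch is only a hint, removing it does not change the value returned; it can only change how long the loop takes. _MM_HINT_T0 is one of four hints (_MM_HINT_T0, _MM_HINT_T1, _MM_HINT_T2, _MM_HINT_NTA); which one suits a given loop, and what distance to use, depends on the data set and the processor.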
