Intel® 64 and IA-32 Architectures Optimization Reference Manual

• Data cache unit (DCU) prefetcher — This prefetcher, also known as the streaming prefetcher, is triggered by an ascending access to very recently loaded data. The processor assumes that this access is part of a streaming algorithm and automatically fetches the next line.
• Instruction pointer (IP)-based strided prefetcher — This prefetcher keeps track of individual load instructions. If a load instruction is detected to have a regular stride, then a prefetch is sent to the next address, which is the sum of the current address and the stride. This prefetcher can prefetch forward or backward and can detect strides of up to half of a 4-KByte page, or 2 KBytes.

Data prefetching works on loads only when the following conditions are met:
• The load is from the writeback memory type.
• The prefetch request is within the 4-KByte page boundary.
• No fence or lock is in progress in the pipeline.
• Not many other load misses are in progress.
• The bus is not very busy.
• There is not a continuous stream of stores.

DCU prefetching has the following effects:
• It improves performance if data in large structures is arranged sequentially in the order used in the program.
• It may cause slight performance degradation due to bandwidth issues if access patterns are sparse instead of local.
• On rare occasions, if the algorithm's working set is tuned to occupy most of the cache and unneeded prefetches evict lines required by the program, the hardware prefetcher may cause severe performance degradation due to the limited capacity of the L1 cache.

In contrast to hardware prefetchers, which rely on hardware to anticipate data traffic, software prefetch instructions rely on the programmer to anticipate cache miss traffic. Software prefetch instructions act as hints to bring a cache line of data into the desired level of the cache hierarchy. Software-controlled prefetch is intended for prefetching data, but not for prefetching code.

2.1.4.3 Data Prefetch Logic

Data prefetch logic (DPL) prefetches data to the second-level (L2) cache based on past request patterns of the DCU from the L2. The DPL maintains two independent arrays to store addresses from the DCU: one for upstreams (12 entries) and one for downstreams (4 entries). The DPL tracks accesses to one 4-KByte page in each entry. If an accessed page is not in any of these arrays, then an array entry is allocated.

The DPL monitors DCU reads for incremental sequences of requests, known as streams. Once the DPL detects the second access of a stream, it prefetches the next cache line. For example, when the DCU requests the cache lines A and A+1, the DPL assumes the DCU will need cache line A+2 in the near future. If the DCU then reads
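The sketch below is not taken from this manual; it is a minimal illustration of the software prefetch hints discussed above, using the _mm_prefetch intrinsic from a C program. The prefetch distance of eight cache lines and the assumption of 64-byte lines are illustrative tuning choices, not values prescribed here.

#include <xmmintrin.h>      /* _mm_prefetch, _MM_HINT_T0 */
#include <stddef.h>

/* Illustrative streaming loop: sums a large float array while hinting
   that lines needed shortly should be brought into the cache hierarchy. */
#define PREFETCH_DISTANCE 8    /* cache lines ahead of the current access (tuning choice) */
#define FLOATS_PER_LINE   16   /* assumes a 64-byte cache line */

float sum_array(const float *data, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        size_t ahead = i + (size_t)PREFETCH_DISTANCE * FLOATS_PER_LINE;
        if (ahead < n)
            _mm_prefetch((const char *)&data[ahead], _MM_HINT_T0);
        sum += data[i];
    }
    return sum;
}

For a purely sequential scan such as this one, the hardware prefetchers described above normally cover the access pattern on their own; software hints tend to be most valuable when accesses are irregular or when a stride would cross the 4-KByte page boundary beyond which the hardware prefetchers do not issue requests.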
