13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

GENERAL OPTIMIZATION GUIDELINES

3.7.1 Hardware Instruction Fetching and Software Prefetching

In processors based on Intel NetBurst microarchitecture, the hardware instruction fetcher reads instructions, 32 bytes at a time, into the 64-byte instruction streaming buffers. Instruction fetching for Intel Core microarchitecture is discussed in Section 2.1.2.

Software prefetching requires a programmer to use PREFETCH hint instructions and to anticipate a suitable timing and location of cache misses.

In Intel Core microarchitecture, software PREFETCH instructions can prefetch beyond page boundaries and can perform one-to-four page walks. Software PREFETCH instructions issued on fill buffer allocations retire after the page walk completes and the DCU miss is detected. Software PREFETCH instructions can trigger all hardware prefetchers in the same manner as regular loads do.

Software PREFETCH operations work the same way as load-from-memory operations, with the following exceptions:

• Software PREFETCH instructions retire after virtual-to-physical address translation is completed.
• If an exception, such as a page fault, is required to prefetch the data, then the software prefetch instruction retires without prefetching data.

3.7.2 Software and Hardware Prefetching in Prior Microarchitectures

Pentium 4 and Intel Xeon processors based on Intel NetBurst microarchitecture introduced hardware prefetching in addition to software prefetching. The hardware prefetcher operates transparently to fetch data and instruction streams from memory without requiring programmer intervention.
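
To make the software side concrete, the following is a minimal sketch of issuing a prefetch hint ahead of a load. It is not taken from the manual. It assumes a GCC- or Clang-compatible compiler, whose `__builtin_prefetch` builtin typically lowers to a PREFETCH-family instruction on x86 targets. The function name `sum_with_prefetch` and the constant `PREFETCH_DISTANCE` are hypothetical, and the distance value is a placeholder, not a recommendation.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead of the current
 * index to request. The right value depends on the hardware and on
 * the work done per element; 16 is only a placeholder. */
#define PREFETCH_DISTANCE 16

/* Sum an int array while issuing a software prefetch hint for an
 * element that will be read a few iterations later. */
long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Only hint at addresses inside the array, so no
         * out-of-bounds pointer is ever formed. */
        if (i + PREFETCH_DISTANCE < n) {
            /* Second argument 0 = prefetch for reading;
             * third argument 3 = high temporal locality. */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

The hint never changes the result, only (possibly) the timing. A linear scan like this one is also a pattern that hardware prefetchers tend to handle on their own, so the hint may add nothing here; the loop is kept simple only to show where the hint is placed relative to the load it anticipates.
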
Subsequent microarchitectures continue to improve and add features to the hardware prefetching mechanisms. Earlier implementations of hardware prefetching mechanisms focus on prefetching data and instructions from memory to L2; more recent implementations provide additional features to prefetch data from L2 to L1.

In Intel NetBurst microarchitecture, the hardware prefetcher can track 8 independent streams.

The Pentium M processor also provides a hardware prefetcher for data. It can track 12 separate streams in the forward direction and 4 streams in the backward direction. The processor’s PREFETCHNTA instruction also fetches 64 bytes into the first-level data cache without polluting the second-level cache.

Intel Core Solo and Intel Core Duo processors provide more advanced hardware prefetchers for data than Pentium M processors. Key differences are summarized in Table 2-6.

Although the hardware prefetcher operates transparently (requiring no intervention by the programmer), it operates most efficiently if the programmer specifically tailors data access patterns to suit its characteristics (it favors small-stride cache
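
As a rough illustration of what a small-stride access pattern looks like in code, the sketch below walks the same row-major matrix in two orders. It is not from the manual; the function names `sum_unit_stride` and `sum_large_stride` are hypothetical, and no timing figures are claimed. Both functions return the same sum. They differ only in the distance between consecutive memory accesses.

```c
#include <stddef.h>

/* Visit elements in storage order: consecutive accesses are
 * sizeof(int) bytes apart (unit stride). */
long sum_unit_stride(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Visit elements column by column: consecutive accesses are
 * cols * sizeof(int) bytes apart (large stride for wide rows). */
long sum_large_stride(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

The first loop order is the kind of small-stride pattern the text says the hardware prefetcher favors. Whether the second order is measurably slower depends on the matrix dimensions and the processor, so that should be measured, not assumed.
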
