Intel® 64 and IA-32 Architectures Optimization Reference Manual

OPTIMIZING CACHE USAGE

The non-temporal instruction is:
• PREFETCHNTA — Fetch the data into the second-level cache, minimizing cache pollution.

Temporal instructions are:
• PREFETCHT0 — Fetch the data into all cache levels; that is, to the second-level cache for the Pentium 4 processor.
• PREFETCHT1 — This instruction is identical to PREFETCHT0.
• PREFETCHT2 — This instruction is identical to PREFETCHT0.

9.4.3 Prefetch and Load Instructions

The Pentium 4 processor has a decoupled execution and memory architecture that allows instructions to be executed independently of memory accesses (if there are no data and resource dependencies). Programs or compilers can use dummy load instructions to imitate PREFETCH functionality, but preloading is not completely equivalent to using PREFETCH instructions.

Currently, PREFETCH provides greater performance than preloading because PREFETCH:
• Has no destination register; it only updates cache lines.
• Does not stall normal instruction retirement.
• Does not affect the functional behavior of the program.
• Has no cache line split accesses.
• Does not cause exceptions except when the LOCK prefix is used. The LOCK prefix is not a valid prefix for use with PREFETCH.
• Does not complete its own execution if that would cause a fault.

The advantages of PREFETCH over preloading instructions are currently processor-specific. This may change in the future.

There are cases where a PREFETCH will not perform the data prefetch. These include:
• PREFETCH causes a DTLB (Data Translation Lookaside Buffer) miss. This applies to Pentium 4 processors with a CPUID signature corresponding to family 15, model 0, 1, or 2. PREFETCH resolves DTLB misses and fetches data on Pentium 4 processors with a CPUID signature corresponding to family 15, model 3.
• An access to the specified address causes a fault or exception.
• The memory subsystem runs out of request buffers between the first-level cache and the second-level cache.
• PREFETCH targets an uncacheable memory region (for example, USWC and UC).
• The LOCK prefix is used. This causes an invalid opcode exception.
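
The difference between a software prefetch and a dummy-load preload can be sketched in C. This is a minimal illustration, assuming an x86 compiler such as GCC or Clang that provides the _mm_prefetch intrinsic and the _MM_HINT_NTA constant in xmmintrin.h. The names sum_with_prefetch, sum_with_preload, and PREFETCH_DISTANCE are hypothetical, and the distance of 64 elements is an arbitrary placeholder, not a tuned value.

```c
#include <stddef.h>    /* size_t, NULL */
#include <xmmintrin.h> /* _mm_prefetch, _MM_HINT_NTA */

/* Hypothetical tuning value: how many elements ahead to touch.
   The best distance depends on the processor and the loop body. */
#define PREFETCH_DISTANCE 64

/* Sum an array that is read once, hinting non-temporal access.
   _MM_HINT_NTA requests PREFETCHNTA; the prefetch has no destination
   register and does not change the result of the loop. */
long sum_with_prefetch(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* The bounds check keeps the pointer arithmetic valid C;
           the PREFETCH instruction itself would not fault. */
        if (i + PREFETCH_DISTANCE < n)
            _mm_prefetch((const char *)&a[i + PREFETCH_DISTANCE], _MM_HINT_NTA);
        sum += a[i];
    }
    return sum;
}

/* The same loop using a dummy load ("preloading") instead of PREFETCH.
   The load writes a destination register and, unlike PREFETCH, could
   fault, so the bounds check is required for correctness here. */
long sum_with_preload(const int *a, size_t n)
{
    volatile int sink = 0; /* volatile keeps the dummy load from being removed */
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            sink = a[i + PREFETCH_DISTANCE];
        sum += a[i];
    }
    return sum;
}
```

Both functions compute the same sum; the prefetch does not alter program behavior, while the preload performs a real load that must be kept away from addresses that could fault. Which version runs faster depends on the processor, as the text notes.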
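
Whether a DTLB miss suppresses the prefetch depends on the CPUID signature. As a minimal sketch, the C function below decodes the family and model from the EAX value returned by CPUID leaf 1, using Intel's rule of adding the extended family when the family field is 0xF and prepending the extended model for families 6 and 15. It then reports whether the signature falls in the family 15, model 0 to 2 group listed above. The function names are hypothetical, and the function takes the raw EAX value so that it does not depend on any particular way of executing CPUID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Display family from CPUID leaf 1 EAX: the extended family (bits 27:20)
   is added only when the base family field (bits 11:8) is 0xF. */
static unsigned cpuid_family(uint32_t eax)
{
    unsigned family = (eax >> 8) & 0xF;
    if (family == 0xF)
        family += (eax >> 20) & 0xFF;
    return family;
}

/* Display model from CPUID leaf 1 EAX: the extended model (bits 19:16)
   becomes the high nibble for families 6 and 15. */
static unsigned cpuid_model(uint32_t eax)
{
    unsigned family = (eax >> 8) & 0xF;
    unsigned model = (eax >> 4) & 0xF;
    if (family == 0x6 || family == 0xF)
        model |= ((eax >> 16) & 0xF) << 4;
    return model;
}

/* True for family 15, models 0-2, where a PREFETCH that misses the DTLB
   is dropped. A false result only means the signature is not in that
   group; it makes no further claim about other processors. */
bool prefetch_dropped_on_dtlb_miss(uint32_t cpuid1_eax)
{
    return cpuid_family(cpuid1_eax) == 15 && cpuid_model(cpuid1_eax) <= 2;
}
```

For example, the encoded signature 0x00000F29 (family 15, model 2) is reported as affected, while 0x00000F34 (family 15, model 3) is not. On GCC or Clang, the EAX value could be obtained with __get_cpuid(1, &eax, &ebx, &ecx, &edx) from cpuid.h.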