
Intel® 64 and IA-32 Architectures Optimization Reference Manual



OPTIMIZING CACHE USAGE

PREFETCH loads either non-temporal data or temporal data in the specified cache level. This data access type and the cache level are specified as a hint. Depending on the implementation, the instruction fetches 32 or more aligned bytes (including the specified address byte) into the instruction-specified cache levels.

PREFETCH is implementation-specific; applications need to be tuned to each implementation to maximize performance.

NOTE
Using the PREFETCH instruction is recommended only if data does not fit in cache.

PREFETCH provides a hint to the hardware; it does not generate exceptions or faults except for a few special cases (see Section 9.4.3, “Prefetch and Load Instructions”). However, excessive use of PREFETCH instructions may waste memory bandwidth and result in a performance penalty due to resource constraints.

Nevertheless, PREFETCH can lessen the overhead of memory transactions by preventing cache pollution and by using caches and memory efficiently. This is particularly important for applications that share critical system resources, such as the memory bus. See an example in Section 9.7.2.1, “Video Encoder.”

PREFETCH is mainly designed to improve application performance by hiding memory latency in the background. If segments of an application access data in a predictable manner (for example, using arrays with known strides), they are good candidates for using PREFETCH to improve performance.

Use the PREFETCH instructions in:
• Predictable memory access patterns
• Time-consuming innermost loops
• Locations where the execution pipeline may stall if data is not available

9.4.2 Prefetch Instructions – Pentium® 4 Processor Implementation

Streaming SIMD Extensions include four PREFETCH instruction variants, one non-temporal and three temporal. They correspond to two types of operations, temporal and non-temporal.

NOTE
At the time of PREFETCH, if data is already found in a cache level that is closer to the processor than the cache level specified by the instruction, no data movement occurs.
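The guidance above (a predictable access pattern in an innermost loop) can be illustrated with a minimal C sketch. It is not taken from the manual. It assumes a compiler that provides the `_mm_prefetch` intrinsic from `<xmmintrin.h>`; the function name `sum_with_prefetch`, the 64-byte line size behind `FLOATS_PER_LINE`, and the prefetch distance `LINES_AHEAD` are hypothetical values chosen for illustration and would need tuning for each implementation.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>  /* _mm_prefetch and the _MM_HINT_* constants */
/* Temporal prefetch (T0 hint). */
#define PREFETCH_T0(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
/* No-op fallback so the sketch still builds on targets without SSE. */
#define PREFETCH_T0(p) ((void)(p))
#endif

/* Assumption: 64-byte cache lines, i.e. 16 floats per line. */
#define FLOATS_PER_LINE 16
/* Assumption: prefetch 4 lines ahead; a tuning parameter, not a rule. */
#define LINES_AHEAD 4

/* Sums an array sequentially, issuing one prefetch per cache line
 * rather than one per element, to avoid excessive PREFETCH use. */
float sum_with_prefetch(const float *data, size_t n)
{
    float sum = 0.0f;

    for (size_t i = 0; i < n; i += FLOATS_PER_LINE) {
        size_t ahead = i + (size_t)LINES_AHEAD * FLOATS_PER_LINE;
        if (ahead < n) {
            /* Hint only: it does not change the result of the loop. */
            PREFETCH_T0(&data[ahead]);
        }

        /* Process the elements of the current line. */
        size_t end = (i + FLOATS_PER_LINE < n) ? i + FLOATS_PER_LINE : n;
        for (size_t j = i; j < end; ++j) {
            sum += data[j];
        }
    }
    return sum;
}
```

The intrinsic's hint argument selects among the four variants described in this section: `_MM_HINT_NTA` for the non-temporal form and `_MM_HINT_T0`, `_MM_HINT_T1`, and `_MM_HINT_T2` for the three temporal forms. Whether a software prefetch helps at all for a simple sequential scan like this one depends on the implementation, so any such change should be measured before it is kept.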
