13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

OPTIMIZING CACHE USAGE

9.5 CACHEABILITY CONTROL

This section covers the mechanics of cacheability control instructions.

9.5.1 The Non-temporal Store Instructions

This section describes the behavior of streaming stores and reiterates some of the information presented in the previous section.

In Streaming SIMD Extensions, the MOVNTPS, MOVNTPD, MOVNTQ, MOVNTDQ, MOVNTI, MASKMOVQ and MASKMOVDQU instructions are streaming, non-temporal stores. With regard to memory characteristics and ordering, they are similar to the Write-Combining (WC) memory type:

• Write combining: Successive writes to the same cache line are combined.
• Write collapsing: Successive writes to the same byte(s) result in only the last write being visible.
• Weakly ordered: No ordering is preserved between WC stores or between WC stores and other loads or stores.
• Uncacheable and not write-allocating: Stored data is written around the cache and will not generate a read-for-ownership bus request for the corresponding cache line.

9.5.1.1 Fencing

Because streaming stores are weakly ordered, a fencing operation is required to ensure that the stored data is flushed from the processor to memory. Failure to use an appropriate fence may result in data being “trapped” within the processor and will prevent visibility of this data by other processors or system agents.

WC stores require software to ensure coherence of data by performing the fencing operation.
See Section 9.5.4, “FENCE Instructions.”

9.5.1.2 Streaming Non-temporal Stores

Streaming stores can improve performance by:

• Increasing store bandwidth if the 64 bytes that fit within a cache line are written consecutively (since they do not require read-for-ownership bus requests and 64 bytes are combined into a single bus write transaction).
• Reducing disturbance of frequently used cached (temporal) data (since they write around the processor caches).

Streaming stores allow cross-aliasing of memory types for a given memory region. For instance, a region may be mapped as write-back (WB) using page attribute tables (PAT) or memory type range registers (MTRRs) and yet is written using a streaming store.
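The two points above (write each 64-byte line consecutively, then fence) can be illustrated with a minimal sketch. It assumes an x86 target with SSE2 and a compiler that provides the standard intrinsics `_mm_stream_si128` (MOVNTDQ) and `_mm_sfence` (SFENCE). The helper names `stream_fill_u32` and `stream_fill_demo`, the buffer size, and the fill value are hypothetical and chosen only for illustration; they do not come from the manual.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: __m128i, _mm_set1_epi32, _mm_stream_si128 */
#include <xmmintrin.h>  /* SSE:  _mm_sfence */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: fill `count` 32-bit words at `dst` with `value`
 * using non-temporal (streaming) stores.
 * Assumptions: `dst` is 64-byte aligned and `count` is a multiple of 16,
 * so every loop iteration writes exactly one full 64-byte line. */
static void stream_fill_u32(uint32_t *dst, size_t count, uint32_t value)
{
    assert(((uintptr_t)dst % 64) == 0);
    assert(count % 16 == 0);

    const __m128i v = _mm_set1_epi32((int)value);

    for (size_t i = 0; i < count; i += 16) {
        __m128i *line = (__m128i *)(dst + i);
        /* Four consecutive 16-byte MOVNTDQ stores cover one 64-byte line,
         * which gives the hardware the chance to combine them. */
        _mm_stream_si128(line + 0, v);
        _mm_stream_si128(line + 1, v);
        _mm_stream_si128(line + 2, v);
        _mm_stream_si128(line + 3, v);
    }

    /* Streaming stores are weakly ordered: fence before other agents
     * are expected to observe the data. */
    _mm_sfence();
}

/* Hypothetical usage: returns 0 if the buffer reads back as written. */
static int stream_fill_demo(void)
{
    enum { WORDS = 64 };  /* 256 bytes = four 64-byte lines */
    uint32_t *buf = aligned_alloc(64, WORDS * sizeof *buf);
    if (buf == NULL)
        return -1;

    stream_fill_u32(buf, WORDS, 0x12345678u);

    int ok = 1;
    for (size_t i = 0; i < WORDS; ++i)
        if (buf[i] != 0x12345678u)
            ok = 0;

    free(buf);
    return ok ? 0 : 1;
}
```

Placing the fence once after the loop, rather than after every store, keeps the stores free to combine. Whether streaming stores actually help depends on the access pattern and buffer size, so a sketch like this should be measured against ordinary stores before being adopted.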
