13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


OPTIMIZING CACHE USAGE

For processors which implement non-temporal stores by updating data in-place that already resides in the cache hierarchy, the destination region should also be mapped as WC. Otherwise, if mapped as WB or WT, there is a potential for speculative processor reads to bring the data into the caches. In such a case, non-temporal stores would then update in place and data would not be flushed from the processor by a subsequent fencing operation.

The memory type visible on the bus in the presence of memory type aliasing is implementation-specific. As one example, the memory type written to the bus may reflect the memory type for the first store to the line, as seen in program order. Other alternatives are possible. This behavior should be considered reserved and dependence on the behavior of any particular implementation risks future incompatibility.

9.5.2 Streaming Store Usage Models

The two primary usage domains for streaming store are coherent requests and non-coherent requests.

9.5.2.1 Coherent Requests

Coherent requests are normal loads and stores to system memory, which may also hit cache lines present in another processor in a multiprocessor environment. With coherent requests, a streaming store can be used in the same way as a regular store that has been mapped with a WC memory type (PAT or MTRR). An SFENCE instruction must be used within a producer-consumer usage model in order to ensure coherency and visibility of data between processors.

Within a single-processor system, the CPU can also re-read the same memory location and be assured of coherence (that is, a single, consistent view of this memory location). The same is true for a multiprocessor (MP) system, assuming an accepted MP software producer-consumer synchronization policy is employed.

9.5.2.2 Non-coherent Requests

Non-coherent requests arise from an I/O device, such as an AGP graphics card, that reads or writes system memory using non-coherent requests, which are not reflected on the processor bus and thus will not query the processor's caches. An SFENCE instruction must be used within a producer-consumer usage model in order to ensure coherency and visibility of data between processors. In this case, if the processor is writing data to the I/O device, a streaming store can be used with a processor with any behavior of Case 1 (Section 9.5.1.3) only if the region has also been mapped with a WC memory type (PAT, MTRR).
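
The producer side of the producer-consumer model described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not code from the manual: the names mailbox_t, mailbox_init, mailbox_publish and mailbox_consume are hypothetical, the 256-byte payload size is arbitrary, and the "ready" flag is just one possible synchronization policy. The streaming stores use the SSE2 intrinsic _mm_stream_si128 and the fence is _mm_sfence, both from <emmintrin.h>. The memcpy fallback exists only so the sketch still builds on non-SSE2 targets and does not demonstrate streaming stores. Mapping the region as WC is an operating-system task and is outside the scope of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/* Hypothetical shared mailbox: a payload area plus a "ready" flag. */
typedef struct {
    _Alignas(64) uint8_t payload[256];  /* 16-byte alignment is required by
                                           _mm_stream_si128; 64 is assumed
                                           here to match a cache line */
    atomic_int ready;
} mailbox_t;

static void mailbox_init(mailbox_t *mb)
{
    memset(mb->payload, 0, sizeof mb->payload);
    atomic_init(&mb->ready, 0);
}

/* Producer: write the payload with streaming (non-temporal) stores,
 * issue SFENCE, and only then publish the flag. */
static void mailbox_publish(mailbox_t *mb, const uint8_t *src, size_t len)
{
    /* Assumption of this sketch: len is a multiple of 16 and fits. */
    assert(len % 16 == 0 && len <= sizeof mb->payload);

#if defined(__SSE2__)
    for (size_t i = 0; i < len; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_stream_si128((__m128i *)(mb->payload + i), v);
    }
    /* Streaming stores are weakly ordered; the fence makes them globally
     * visible before the flag store that follows. */
    _mm_sfence();
#else
    memcpy(mb->payload, src, len);  /* build-only fallback, no streaming */
#endif

    atomic_store_explicit(&mb->ready, 1, memory_order_release);
}

/* Consumer: read the payload only after observing the flag.
 * Returns 1 if data was copied into dst, 0 if nothing was published yet. */
static int mailbox_consume(mailbox_t *mb, uint8_t *dst, size_t len)
{
    if (!atomic_load_explicit(&mb->ready, memory_order_acquire))
        return 0;
    memcpy(dst, mb->payload, len);
    return 1;
}
```

In a real producer-consumer setup the two functions would run on different processors. Without the SFENCE, the consumer could observe the flag before the streamed payload becomes visible.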
