13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual



OPTIMIZING CACHE USAGE

9.7 MEMORY OPTIMIZATION USING NON-TEMPORAL STORES

Non-temporal stores can also be used to manage data retention in the cache. Uses for non-temporal stores include:

• To combine many writes without disturbing the cache hierarchy
• To manage which data structures remain in the cache and which are transient

Detailed implementations of these usage models are covered in the following sections.

9.7.1 Non-temporal Stores and Software Write-Combining

Use non-temporal stores in the cases when the data to be stored is:

• Write-once (non-temporal)
• Too large and would thus cause cache thrashing

Non-temporal stores do not invoke a cache line allocation, which means they are not write-allocate. As a result, caches are not polluted and no dirty writeback is generated to compete with useful data bandwidth. Without non-temporal stores, bus bandwidth suffers once the caches start to thrash, because of the dirty writebacks.

In the Streaming SIMD Extensions implementation, when non-temporal stores are written into writeback or write-combining memory regions, these stores are weakly-ordered and will be combined internally inside the processor's write-combining buffer and written out to memory as a line burst transaction. To achieve the best possible performance, it is recommended to align data on a cache line boundary and write it consecutively, one cache line at a time, while using non-temporal stores. If consecutive writes are prohibitive due to programming constraints, then software write-combining (SWWC) buffers can be used to enable line burst transactions.

You can declare small SWWC buffers (a cache line for each buffer) in your application to enable explicit write-combining operations. Instead of writing to non-temporal memory space immediately, the program writes data into SWWC buffers and combines them inside these buffers. The program only writes a SWWC buffer out using non-temporal stores when the buffer is filled up, that is, when it holds a full cache line (128 bytes for the Pentium 4 processor). Although the SWWC method requires explicit instructions for performing temporary writes and reads, it ensures that each write on the front-side bus is a full line transaction rather than several partial transactions. Application performance can gain considerably from implementing this technique. These SWWC buffers can be maintained in the second-level cache and re-used throughout the program.
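The SWWC scheme described above can be sketched in C roughly as follows. This is a minimal illustration, not code from the manual: the names swwc_buf, swwc_init, swwc_write, swwc_flush_line, swwc_fence and SWWC_LINE are hypothetical, and the 64-byte line size is an assumption (the text cites 128 bytes for the Pentium 4 processor). The sketch stages small writes in a line-sized buffer and writes the buffer out with the SSE2 intrinsic _mm_stream_si128 only when a full line has accumulated. On targets without SSE2 it falls back to memcpy, which keeps the code compilable but is not a non-temporal store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_load_si128, _mm_stream_si128, _mm_sfence */
#endif

#define SWWC_LINE 64     /* assumed line size; adjust for the target processor */

/* Hypothetical helper type: one cache-line-sized staging buffer. */
typedef struct {
    _Alignas(SWWC_LINE) uint8_t line[SWWC_LINE]; /* staging area, meant to stay cached */
    size_t   used;                               /* bytes currently staged */
    uint8_t *dst;                                /* next line-aligned destination */
} swwc_buf;

/* dst must be aligned to SWWC_LINE so every flush covers exactly one line. */
static void swwc_init(swwc_buf *b, void *dst)
{
    assert(((uintptr_t)dst % SWWC_LINE) == 0);
    b->used = 0;
    b->dst = (uint8_t *)dst;
}

/* Write one full staged line to the destination, 16 bytes per store. */
static void swwc_flush_line(swwc_buf *b)
{
#if defined(__SSE2__)
    for (size_t i = 0; i < SWWC_LINE; i += 16) {
        __m128i v = _mm_load_si128((const __m128i *)(b->line + i));
        _mm_stream_si128((__m128i *)(b->dst + i), v);  /* non-temporal store */
    }
#else
    memcpy(b->dst, b->line, SWWC_LINE);  /* portable fallback, NOT non-temporal */
#endif
    b->dst += SWWC_LINE;
    b->used = 0;
}

/* Stage n bytes; flush each time a whole line has been collected. */
static void swwc_write(swwc_buf *b, const void *src, size_t n)
{
    const uint8_t *p = (const uint8_t *)src;
    while (n > 0) {
        size_t room = SWWC_LINE - b->used;
        size_t k = n < room ? n : room;
        memcpy(b->line + b->used, p, k);
        b->used += k;
        p += k;
        n -= k;
        if (b->used == SWWC_LINE)
            swwc_flush_line(b);
    }
}

/* Order earlier non-temporal stores before later stores. */
static void swwc_fence(void)
{
#if defined(__SSE2__)
    _mm_sfence();
#endif
}
```

A caller would initialize one swwc_buf per output stream, pass each small piece of data to swwc_write, and call swwc_fence once before other code relies on the written data. A partially filled buffer is never flushed in this sketch; how to handle a trailing partial line (padding it, or writing it with ordinary stores) is left open here.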
