13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

MULTICORE AND HYPER-THREADING TECHNOLOGY

8.4.5 Prevent Sharing of Modified Data and False-Sharing

On an Intel Core Duo processor or a processor based on Intel Core microarchitecture, sharing of modified data incurs a performance penalty when a thread running on one core tries to read or write data that is currently present in modified state in the first-level cache of the other core. This causes eviction of the modified cache line back into memory and reading it into the first-level cache of the other core. The latency of such a cache line transfer is much higher than using data in the immediate first-level cache or second-level cache.

False sharing applies to data used by one thread that happens to reside on the same cache line as different data used by another thread. These situations can also incur a performance delay, depending on the topology of the logical processors/cores in the platform.

An example of false sharing in a multithreading environment using processors based on Intel NetBurst microarchitecture is when thread-private data and a thread synchronization variable are located within the line size boundary (64 bytes) or sector boundary (128 bytes). When one thread modifies the synchronization variable, the "dirty" cache line must be written out to memory and updated for each physical processor sharing the bus. Subsequently, data is fetched into each target processor 128 bytes at a time, causing previously cached data to be evicted from its cache on each target processor.

False sharing can incur a performance penalty when the threads run on logical processors that reside on different physical processors. For processors that support HT Technology, false sharing incurs a performance penalty when two threads run on different cores, on different physical processors, or on two logical processors in the same physical processor package. In the first two cases, the performance penalty is due to cache evictions to maintain cache coherency. In the last case, the performance penalty is due to memory order machine clear conditions.

False sharing is not expected to have a performance impact with a single Intel Core Duo processor.

User/Source Coding Rule 23. (H impact, M generality) Beware of false sharing within a cache line (64 bytes on Intel Pentium 4, Intel Xeon, Pentium M, Intel Core Duo processors), and within a sector (128 bytes on Pentium 4 and Intel Xeon processors).

When a common block of parameters is passed from a parent thread to several worker threads, it is desirable for each worker thread to create a private copy of frequently accessed data in the parameter block.

8.4.6 Placement of Shared Synchronization Variable

On processors based on Intel NetBurst microarchitecture, bus reads typically fetch 128 bytes into a cache; the optimal spacing to minimize eviction of cached data is therefore 128 bytes. To prevent false-sharing, synchronization variables and system objects
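The layout advice in 8.4.5 and 8.4.6 can be illustrated with a minimal C11 sketch. It is not taken from the manual: the names `SECTOR_SIZE`, `sync_var`, `thread_slot`, `params`, and `worker_sum` are hypothetical, and the 128-byte spacing is an assumption chosen to cover both the 64-byte line and the 128-byte sector mentioned above. The sketch places a synchronization variable and each thread's private data in separate 128-byte-aligned blocks, and has the worker take a private copy of the shared parameter block before its loop.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas (C11) */

/* Assumed spacing: 128 bytes covers a 64-byte line and a 128-byte sector. */
#define SECTOR_SIZE 128

/* Hypothetical synchronization variable, alone in its own 128-byte block. */
struct sync_var {
    alignas(SECTOR_SIZE) volatile int lock;
};

/* Hypothetical per-thread data; array elements never share a sector. */
struct thread_slot {
    alignas(SECTOR_SIZE) long counter;
};

/* Hypothetical parameter block passed from a parent to its workers. */
struct params {
    int iterations;
    int stride;
};

/* The member alignment forces the compiler to pad each struct to 128 bytes. */
static_assert(sizeof(struct sync_var) == SECTOR_SIZE, "sync_var fills one sector");
static_assert(sizeof(struct thread_slot) == SECTOR_SIZE, "slot fills one sector");

/* Worker body: copy the shared parameters once, then touch only local data
 * and the worker's own slot, so the loop never reads the shared block. */
long worker_sum(const struct params *shared, struct thread_slot *slot)
{
    struct params local = *shared;  /* private copy of frequently read data */
    long acc = 0;
    for (int i = 0; i < local.iterations; i++) {
        acc += local.stride;
    }
    slot->counter = acc;            /* single write to this thread's sector */
    return acc;
}
```

In a real program each worker thread would call something like `worker_sum` with its own element of a `struct thread_slot` array. Whether 64 or 128 bytes is the right spacing depends on the target processor, so the constant is best treated as a tunable rather than a fixed value.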
