Intel® 64 and IA-32 Architectures Optimization Reference Manual

MULTICORE AND HYPER-THREADING TECHNOLOGY

be issued at a rate that allows immediate detection of any store to the synchronization variable. This prevents the occurrence of a long delay due to memory order violation.

One example of inserting the PAUSE instruction in a simplified spin-wait loop is shown in Example 8-4(b). The PAUSE instruction is compatible with all Intel 64 and IA-32 processors. On IA-32 processors prior to Intel NetBurst microarchitecture, the PAUSE instruction is essentially a NOP instruction. Additional examples of optimizing spin-wait loops using the PAUSE instruction are available in Application note AP-949, “Using Spin-Loops on Intel Pentium 4 Processor and Intel Xeon Processor.” See http://www3.intel.com/cd/ids/developer/asmo-na/eng/dc/threading/knowledgebase/19083.htm.

Inserting the PAUSE instruction has the added benefit of significantly reducing the power consumed during the spin-wait because fewer system resources are used.

8.4.3 Optimization with Spin-Locks

Spin-locks are typically used when several threads need to modify a synchronization variable and the synchronization variable must be protected by a lock to prevent unintentional overwrites. When the lock is released, however, several threads may compete to acquire it at once. Such thread contention significantly reduces performance scaling with respect to frequency, number of discrete processors, and HT Technology.

To reduce the performance penalty, one approach is to reduce the likelihood of many threads competing to acquire the same lock. Apply a software pipelining technique to handle data that must be shared between multiple threads. Instead of allowing multiple threads to compete for a given lock, limit write access to any given lock to no more than two threads.
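The pipelining advice above can be illustrated with a minimal C11 sketch, assuming POSIX threads and `<stdatomic.h>`; it is not the manual's Example 8-4. Instead of several threads contending for one lock around shared data, a hypothetical three-stage pipeline hands each item from stage to stage through its own single-slot mailbox. The names `mailbox_t`, `mailbox_put`, `mailbox_get`, `run_pipeline` and `ITEMS` are illustrative only. Each mailbox's `full` flag is written by exactly two threads and its `value` field by only one, so no lock is needed; the wait loops issue PAUSE through a hypothetical `CPU_RELAX` macro that falls back to a no-op on non-x86 targets.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* CPU_RELAX is a hypothetical helper: PAUSE on x86, a no-op elsewhere. */
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause()
#else
#define CPU_RELAX() ((void)0)
#endif

#define ITEMS 100 /* hypothetical workload size */

/* Single-slot mailbox shared by exactly two threads (one producer, one
   consumer). `full` is written by both ends; `value` only by the producer. */
typedef struct {
    atomic_int full;
    long value;
} mailbox_t;

static void mailbox_put(mailbox_t *m, long v) {
    /* Spin-wait (with PAUSE) until the consumer has emptied the slot. */
    while (atomic_load_explicit(&m->full, memory_order_acquire))
        CPU_RELAX();
    m->value = v;
    atomic_store_explicit(&m->full, 1, memory_order_release);
}

static long mailbox_get(mailbox_t *m) {
    /* Spin-wait (with PAUSE) until the producer has filled the slot. */
    while (!atomic_load_explicit(&m->full, memory_order_acquire))
        CPU_RELAX();
    long v = m->value;
    atomic_store_explicit(&m->full, 0, memory_order_release);
    return v;
}

/* Static storage: both mailboxes start zero-initialized, i.e. empty. */
static mailbox_t stage1_to_2, stage2_to_3;

/* Stage 1: produce the values 1..ITEMS. */
static void *stage_produce(void *arg) {
    (void)arg;
    for (long i = 1; i <= ITEMS; i++)
        mailbox_put(&stage1_to_2, i);
    return NULL;
}

/* Stage 2: double each value and pass it to stage 3. */
static void *stage_double(void *arg) {
    (void)arg;
    for (long i = 0; i < ITEMS; i++)
        mailbox_put(&stage2_to_3, 2 * mailbox_get(&stage1_to_2));
    return NULL;
}

/* Runs the pipeline; the calling thread acts as stage 3 and sums results. */
long run_pipeline(void) {
    pthread_t t1, t2;
    long sum = 0;
    int rc = pthread_create(&t1, NULL, stage_produce, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, stage_double, NULL);
    assert(rc == 0);
    for (long i = 0; i < ITEMS; i++)
        sum += mailbox_get(&stage2_to_3);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return sum;
}
```

Calling `run_pipeline()` should return 10100 (twice the sum of 1 through 100). Note that in C the single-writer `value` field still relies on the acquire/release ordering of `full`: "no lock" does not mean "no synchronization".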
If an application must use spin-locks, include the PAUSE instruction in the wait loop. Example 8-4(c) shows an example of the “test, test-and-set” technique for determining the availability of the lock in a spin-wait loop.

User/Source Coding Rule 21. (M impact, L generality) Replace a spin-lock that may be acquired by multiple threads with pipelined locks such that no more than two threads have write access to one lock. If only one thread needs to write to a variable shared by two threads, there is no need to use a lock.

8.4.4 Synchronization for Longer Periods

When a spin-wait loop is not expected to be released quickly, an application should follow these guidelines:

• Keep the duration of the spin-wait loop to a minimum number of repetitions.
• Applications should use an OS service to block the waiting thread; doing so releases the processor so that other runnable threads can use the processor or its available execution resources.
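The two guidelines above can be combined in a single primitive. The following is a minimal sketch, assuming C11 atomics and POSIX threads, of a hypothetical spin-then-block lock; `hybrid_lock_t`, `hybrid_lock`, `hybrid_unlock`, `SPIN_LIMIT`, `run_demo` and the other names are illustrative, and the spin bound of 100 is an arbitrary assumption rather than a value from this manual. The short spin phase uses the “test, test-and-set” pattern with PAUSE; if the bounded spin fails, the thread blocks on a pthread condition variable so the OS can give the processor to other work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause()   /* emits PAUSE */
#else
#define CPU_RELAX() ((void)0)     /* no PAUSE outside x86 */
#endif

#define SPIN_LIMIT 100  /* hypothetical bound on spin repetitions */
#define MAX_THREADS 16

typedef struct {
    atomic_int locked;    /* 0 = free, 1 = held */
    pthread_mutex_t mu;   /* guards the sleep/wake handshake only */
    pthread_cond_t cv;
} hybrid_lock_t;

static int try_acquire(hybrid_lock_t *l) {
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

void hybrid_lock(hybrid_lock_t *l) {
    /* Phase 1: bounded spin using "test, test-and-set": read the lock with a
       read-only load and attempt the atomic exchange only when it looks free. */
    for (int i = 0; i < SPIN_LIMIT; i++) {
        if (atomic_load_explicit(&l->locked, memory_order_relaxed) == 0 &&
            try_acquire(l))
            return;
        CPU_RELAX();
    }
    /* Phase 2: block in the OS so the processor is released. */
    pthread_mutex_lock(&l->mu);
    while (!try_acquire(l))
        pthread_cond_wait(&l->cv, &l->mu);
    pthread_mutex_unlock(&l->mu);
}

void hybrid_unlock(hybrid_lock_t *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
    /* Acquiring mu before signaling ensures that a waiter which saw the lock
       held has already entered pthread_cond_wait, so the wakeup is not lost. */
    pthread_mutex_lock(&l->mu);
    pthread_cond_signal(&l->cv);
    pthread_mutex_unlock(&l->mu);
}

/* Hypothetical usage: threads increment a shared counter under the lock. */
static hybrid_lock_t demo_lock = {
    .mu = PTHREAD_MUTEX_INITIALIZER, .cv = PTHREAD_COND_INITIALIZER};
static long demo_counter;

static void *demo_worker(void *arg) {
    int iters = *(const int *)arg;
    for (int i = 0; i < iters; i++) {
        hybrid_lock(&demo_lock);
        demo_counter++;
        hybrid_unlock(&demo_lock);
    }
    return NULL;
}

long run_demo(int nthreads, int iters) {
    pthread_t tids[MAX_THREADS];
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    demo_counter = 0;
    for (int t = 0; t < nthreads; t++)
        pthread_create(&tids[t], NULL, demo_worker, &iters);
    for (int t = 0; t < nthreads; t++)
        pthread_join(tids[t], NULL);
    return demo_counter;
}
```

With this sketch, `run_demo(4, 20000)` should return 80000. For simplicity, `hybrid_unlock` always takes the mutex and signals; a production version would typically track whether any thread is actually blocked before touching the mutex and condition variable.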
