13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

MULTICORE AND HYPER-THREADING TECHNOLOGY

Rankings are subjective and approximate. They can vary depending on coding style, application, and threading domain. The high, medium, and low impact ranking attached to each recommendation is meant to give a relative indication of the performance gain that can be expected when that recommendation is implemented.

It is not possible to predict how likely a given code instance is to occur across many applications, so an impact ranking cannot be directly correlated to application-level performance gain. The ranking on generality is also subjective and approximate.

Coding recommendations that do not impact all three scaling factors are typically categorized as medium or lower.

8.4 THREAD SYNCHRONIZATION

Applications with multiple threads use synchronization techniques to ensure correct operation. However, thread synchronization that is improperly implemented can significantly reduce performance.

The best practice for reducing the overhead of thread synchronization is to start by reducing the application's requirements for synchronization. Intel Thread Profiler can be used to profile the execution timeline of each thread and detect situations where performance is impacted by frequent occurrences of synchronization overhead.

Several coding techniques and operating system (OS) calls are frequently used for thread synchronization. These include spin-wait loops, spin-locks, and critical sections, to name a few. Choosing the optimal OS call for the circumstance and implementing synchronization code with parallelism in mind are critical to minimizing the cost of handling thread synchronization.

SSE3 provides two instructions (MONITOR/MWAIT) to help multithreaded software improve synchronization between multiple agents.
In the first implementation of MONITOR and MWAIT, these instructions are available to the operating system so that the operating system can optimize thread synchronization in different areas. For example, an operating system can use MONITOR and MWAIT in its system idle loop (known as C0 loop) to reduce power consumption. An operating system can also use MONITOR and MWAIT to implement its C1 loop to improve the responsiveness of the C1 loop. See Chapter 7 in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A.

8.4.1 Choice of Synchronization Primitives

Thread synchronization often involves modifying some shared data while protecting the operation using synchronization primitives. There are many primitives to choose from. Guidelines that are useful when selecting synchronization primitives are:

• Favor compiler intrinsics or an OS-provided interlocked API for atomic updates of simple data, such as increment and compare/exchange. This will be
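The 8.4.1 guideline on atomic updates of simple data (increment, compare/exchange) can be illustrated with a minimal sketch using C11 `<stdatomic.h>` as the "compiler intrinsic" route. This is not the manual's code; the function names `counter_increment` and `atomic_raise_to` are hypothetical. On Windows, the OS-provided counterparts would be the Interlocked API (for example `InterlockedIncrement` and `InterlockedCompareExchange`), and GCC/Clang also offer `__atomic` builtins.

```c
#include <stdatomic.h>

/* Atomic increment: on x86 this typically compiles to a single
   LOCK-prefixed instruction, with no lock object and no OS call. */
long counter_increment(atomic_long *counter) {
    return atomic_fetch_add(counter, 1) + 1;   /* return the new value */
}

/* Compare/exchange loop (hypothetical helper): atomically raise *target
   to at least `value` and return the resulting maximum. */
long atomic_raise_to(atomic_long *target, long value) {
    long observed = atomic_load(target);
    while (observed < value &&
           !atomic_compare_exchange_weak(target, &observed, value)) {
        /* CAS failed (or failed spuriously): `observed` now holds the
           current value, so re-check it and retry. */
    }
    /* Either our CAS stored `value`, or another thread already stored
       something at least as large. */
    return observed < value ? value : observed;
}
```

Both operations update a single word without ever putting a thread to sleep, which is why they are usually cheaper than wrapping the same update in a mutex or critical section.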
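The Section 8.4 advice to start by reducing the application's need for synchronization can be sketched as follows, assuming POSIX threads and C11 atomics. Each worker accumulates into a private variable and publishes its result with one atomic add, so the shared total is touched once per thread rather than once per item. The names `sum_worker`, `run_reduced_sync`, `NUM_THREADS`, and `ITEMS_PER_THREAD` are hypothetical, and the loop body is a stand-in for real work.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_THREADS 4               /* hypothetical thread count */
#define ITEMS_PER_THREAD 100000L    /* hypothetical work per thread */

static atomic_long shared_total;    /* the only shared, synchronized datum */

static void *sum_worker(void *arg) {
    (void)arg;
    long local_sum = 0;             /* private to this thread: no locking */
    for (long i = 0; i < ITEMS_PER_THREAD; i++)
        local_sum += 1;             /* stand-in for real per-item work */
    /* One synchronized update per thread instead of one per item. */
    atomic_fetch_add(&shared_total, local_sum);
    return NULL;
}

long run_reduced_sync(void) {
    pthread_t threads[NUM_THREADS];
    int created = 0;
    atomic_store(&shared_total, 0);
    for (; created < NUM_THREADS; created++)
        if (pthread_create(&threads[created], NULL, sum_worker, NULL) != 0)
            break;                  /* join only the threads that started */
    for (int t = 0; t < created; t++)
        pthread_join(threads[t], NULL);
    return atomic_load(&shared_total);
}
```

With four threads the shared variable sees four atomic operations in total; an item-by-item design would perform 400,000 of them on the same cache line.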
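The spin-locks and spin-wait loops listed in Section 8.4 can be sketched as a test-and-test-and-set lock built on C11 atomics. This is an illustrative sketch, not the manual's implementation: the type `simple_spinlock`, the functions `spin_lock`/`spin_unlock`, and the demo driver are hypothetical. On x86 the wait loop uses the real `_mm_pause()` intrinsic, which emits the PAUSE instruction; on other targets the hint is compiled out.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause()     /* PAUSE hint inside the spin-wait loop */
#else
#define CPU_RELAX() ((void)0)       /* no spin hint on other targets */
#endif

typedef struct { atomic_bool locked; } simple_spinlock;   /* hypothetical */

void spin_lock(simple_spinlock *l) {
    for (;;) {
        /* Try to acquire: exchange returns the previous value. */
        if (!atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
            return;
        /* Spin-wait with read-only loads until the lock looks free, so
           waiters do not keep issuing locked writes to the cache line. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            CPU_RELAX();
    }
}

void spin_unlock(simple_spinlock *l) {
    atomic_store_explicit(&l->locked, false, memory_order_release);
}

#define SPIN_THREADS 4
#define SPIN_ITERS 50000L

static simple_spinlock counter_lock = { false };
static long protected_counter = 0;  /* guarded by counter_lock */

static void *spin_worker(void *arg) {
    (void)arg;
    for (long i = 0; i < SPIN_ITERS; i++) {
        spin_lock(&counter_lock);
        protected_counter++;        /* very short critical section */
        spin_unlock(&counter_lock);
    }
    return NULL;
}

long run_spinlock_demo(void) {
    pthread_t threads[SPIN_THREADS];
    int created = 0;
    protected_counter = 0;
    for (; created < SPIN_THREADS; created++)
        if (pthread_create(&threads[created], NULL, spin_worker, NULL) != 0)
            break;
    for (int t = 0; t < created; t++)
        pthread_join(threads[t], NULL);
    return protected_counter;
}
```

A spin-lock keeps the waiting thread busy on the CPU, so it suits only very short critical sections like the one above. When waits can be long, an OS primitive that blocks the waiting thread is usually the better choice, which is the "optimal OS call for the circumstance" trade-off the section describes.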
