Intel® 64 and IA-32 Architectures Optimization Reference Manual

MULTICORE AND HYPER-THREADING TECHNOLOGY

8.9 OPTIMIZATION OF OTHER SHARED RESOURCES

Resource optimization in a multi-threaded application depends on the cache topology and the execution resources associated with the hierarchy of the processor topology. Processor topology, and an algorithm for software to identify it, are discussed in Chapter 7 of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A (a brief enumeration sketch appears at the end of this section).

Typically, the bus system is shared by multiple agents at the SMT level and at the processor-core level of the processor topology. Multi-threaded application design should therefore start with an approach that manages the bus bandwidth available to the multiple processor agents sharing the same bus link in an equitable manner. This can be done by improving the data locality of an individual application thread, or by allowing two threads to take advantage of a shared second-level cache (where such a shared cache topology is available).

In general, optimizing the building blocks of a multi-threaded application can start from an individual thread. The guidelines discussed in Chapter 3 through Chapter 9 largely apply to multi-threaded optimization.

Tuning Suggestion 3. Optimize single-threaded code to maximize execution throughput first.

At the SMT level, HT Technology typically provides two logical processors that share execution resources within a processor core. To help multithreaded applications utilize shared execution resources effectively, the rest of this section describes guidelines for common situations, as well as for the limited situations in which execution resource utilization between threads may impact overall performance.

Most applications use only about 20-30% of peak execution resources when running in a single-threaded environment. A useful related indicator is the execution throughput measured at the retirement stage (see Appendix A.2.1.3, "Workload Characterization"). In a processor that supports HT Technology, execution throughput seldom reaches 50% of peak retirement bandwidth. Thus, improving single-thread execution throughput should also benefit multithreading performance.

Tuning Suggestion 4. Optimize multithreaded applications to achieve optimal processor scaling with respect to the number of physical processors or processor cores.

Following guidelines such as reducing thread synchronization costs, enhancing locality, and conserving bus bandwidth allows the multithreading hardware to exploit task-level parallelism in the workload and improves MP scaling. In general, reducing dependence on resources shared between physical packages benefits processor scaling with respect to the number of physical processors. Similarly, heavy reliance on resources shared between different cores is likely to reduce processor-scaling performance. On the other hand, using shared resources effectively can deliver a positive benefit in processor scaling, provided the workload does not saturate the critical resource in contention.

Tuning Suggestion 5. Schedule threads that compete for the same execution resource to separate processor cores.
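Acting on Tuning Suggestion 5 requires knowing which logical processors share a core. The following is a minimal sketch, not the reference algorithm from the Software Developer's Manual: it walks CPUID leaf 0BH (Extended Topology Enumeration) and reports, per topology level, the level type, the bit width of that level within the x2APIC ID, and the logical-processor count. It assumes a GCC or Clang toolchain (<cpuid.h>) and a processor that supports leaf 0BH; error handling and fallbacks for older processors are omitted.

/*
 * Sketch: enumerate SMT/core topology levels via CPUID leaf 0BH.
 * Compile with gcc/clang on an x86 target.
 */
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;
    unsigned int x2apic_id = 0;

    if (__get_cpuid_max(0, NULL) < 0x0B) {
        printf("CPUID leaf 0BH not supported; use the leaf 01H/04H method instead\n");
        return 1;
    }

    for (unsigned int subleaf = 0; ; subleaf++) {
        __cpuid_count(0x0B, subleaf, eax, ebx, ecx, edx);

        unsigned int level_type = (ecx >> 8) & 0xFF;   /* 1 = SMT, 2 = Core */
        if (subleaf == 0)
            x2apic_id = edx;                           /* x2APIC ID of this logical processor */
        if (level_type == 0)
            break;                                     /* no more topology levels */

        printf("level %u: type=%s, x2APIC shift=%u, logical processors=%u\n",
               subleaf,
               level_type == 1 ? "SMT" : level_type == 2 ? "Core" : "other",
               eax & 0x1F, ebx & 0xFFFF);
    }

    printf("x2APIC ID of the calling logical processor: 0x%x\n", x2apic_id);
    return 0;
}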
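With the topology known, threads that compete for the same execution resources can be placed on separate cores, as Tuning Suggestion 5 recommends. The sketch below assumes Linux and GNU pthreads; the core IDs (0 and 1) and the worker function are placeholders, and a real application would map them to distinct physical cores using the topology information enumerated above or obtained from the operating system.

/*
 * Sketch: pin two competing threads to different logical CPUs,
 * assumed here to belong to separate physical cores.
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *worker(void *arg)
{
    /* ... work that is heavy on a shared execution resource ... */
    return NULL;
}

static void spawn_pinned(pthread_t *t, int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);                     /* restrict the thread to one logical CPU */

    pthread_create(t, NULL, worker, NULL);
    pthread_setaffinity_np(*t, sizeof(set), &set);
}

int main(void)
{
    pthread_t t0, t1;

    /* Place the two competing threads on (what we assume are) separate cores. */
    spawn_pinned(&t0, 0);
    spawn_pinned(&t1, 1);

    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return 0;
}

Compile with -pthread. On Windows, SetThreadAffinityMask serves the same purpose.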
