13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

MULTICORE AND HYPER-THREADING TECHNOLOGY

With a multitasking workload, however, bus activities and cache access patterns are likely to affect the scaling of the throughput. Running two copies of the same application or same suite of applications in a lock-step can expose an artifact in performance measuring methodology. This is because an access pattern to the first level data cache can lead to excessive cache misses and produce skewed performance results. Fix this problem by:

• Including a per-instance offset at the start-up of an application
• Introducing heterogeneity in the workload by using different datasets with each instance of the application
• Randomizing the sequence of start-up of applications when running multiple copies of the same suite

When two applications are employed as part of a multitasking workload, there is little synchronization overhead between these two processes. It is also important to ensure each application has minimal synchronization overhead within itself.

An application that uses lengthy spin loops for intra-process synchronization is less likely to benefit from HT Technology in a multitasking workload. This is because critical resources will be consumed by the long spin loops.

8.2 PROGRAMMING MODELS AND MULTITHREADING

Parallelism is the most important concept in designing a multithreaded application and realizing optimal performance scaling with multiple processors. An optimized multithreaded application is characterized by large degrees of parallelism or minimal dependencies in the following areas:

• Workload
• Thread interaction
• Hardware utilization

The key to maximizing workload parallelism is to identify multiple tasks that have minimal inter-dependencies within an application and to create separate threads for parallel execution of those tasks.

Concurrent execution of independent threads is the essence of deploying a multithreaded application on a multiprocessing system. Managing the interaction between threads to minimize the cost of thread synchronization is also critical to achieving optimal performance scaling with multiple processors.

Efficient use of hardware resources between concurrent threads requires optimization techniques in specific areas to prevent contentions of hardware resources. Coding techniques for optimizing thread synchronization and managing other hardware resources are discussed in subsequent sections.

Parallel programming models are discussed next.
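
As a small illustration of the workload-parallelism idea in section 8.2, the following is a minimal sketch, not taken from the manual, that splits one job into two tasks with no inter-dependencies and runs each on its own POSIX thread. The names slice_task, sum_slice and parallel_sum are hypothetical, and the choice of exactly two threads is an assumption made for brevity. Each thread reads only its own slice and writes only its own result slot, so the only thread interaction is the join at the end.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical task descriptor: each thread owns one slice of the input
   and one result slot, so no writable data is shared while threads run. */
typedef struct {
    const double *data;
    size_t count;
    double sum;
} slice_task;

/* Thread entry point: sum one slice. */
static void *sum_slice(void *arg)
{
    slice_task *task = arg;
    double acc = 0.0;                 /* accumulate locally, store once */
    for (size_t i = 0; i < task->count; i++)
        acc += task->data[i];
    task->sum = acc;
    return NULL;
}

/* Sum `count` doubles using two independent threads.
   Returns 0 on success, -1 if a thread could not be created. */
int parallel_sum(const double *data, size_t count, double *out)
{
    slice_task tasks[2];
    pthread_t workers[2];
    size_t half = count / 2;
    int started = 0;

    tasks[0] = (slice_task){ data, half, 0.0 };
    tasks[1] = (slice_task){ data + half, count - half, 0.0 };

    for (; started < 2; started++) {
        if (pthread_create(&workers[started], NULL,
                           sum_slice, &tasks[started]) != 0)
            break;
    }

    /* The join is the only synchronization point between the threads. */
    for (int i = 0; i < started; i++)
        pthread_join(workers[i], NULL);

    if (started < 2)
        return -1;

    *out = tasks[0].sum + tasks[1].sum;
    return 0;
}
```

A caller would pass an array and its length, for example parallel_sum(values, n, &total), and check the return value before using total. On most toolchains the program needs to be built with the -pthread option.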
