13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


MULTICORE AND HYPER-THREADING TECHNOLOGY

If the control flow of a multi-threaded application contains a workload in which only 50% can be executed in parallel, the maximum performance gain using two physical processors is only 33%, compared to using a single processor. Using four processors can deliver no more than a 60% speed-up over a single processor. Thus, it is critical to maximize the portion of control flow that can take advantage of parallelism. Improper implementation of thread synchronization can significantly increase the proportion of serial control flow and further reduce the application's performance scaling.

In addition to maximizing the parallelism of control flows, interaction between threads in the form of thread synchronization and imbalance of task scheduling can also impact overall processor scaling significantly.

Excessive cache misses are one cause of poor performance scaling. In a multithreaded execution environment, they can occur from:

• Aliased stack accesses by different threads in the same process
• Thread contentions resulting in cache line evictions
• False-sharing of cache lines between different processors

Techniques that address each of these situations (and many other areas) are described in sections in this chapter.

8.1.2 Multitasking Environment

Hardware multithreading capabilities in Intel 64 and IA-32 processors can exploit task-level parallelism when a workload consists of several single-threaded applications and these applications are scheduled to run concurrently under an MP-aware operating system. In this environment, hardware multithreading capabilities can deliver higher throughput for the workload, although the relative performance of a single task (in terms of time of completion relative to the same task when in a single-threaded environment) will vary, depending on how much shared execution resources and memory are utilized.

For development purposes, several popular operating systems (for example Microsoft Windows* XP Professional and Home, Linux* distributions using kernel 2.4.19 or later [2]) include OS kernel code that can manage the task scheduling and the balancing of shared execution resources within each physical processor to maximize the throughput.

Because applications run independently under a multitasking environment, thread synchronization issues are less likely to limit the scaling of throughput. This is because the control flow of the workload is likely to be 100% parallel [3] (if no inter-processor communication is taking place and if there are no system bus constraints).

[2] This code is included in Red Hat* Linux Enterprise AS 2.1.
[3] A software tool that attempts to measure the throughput of a multitasking workload is likely to introduce control flows that are not parallel. Thread synchronization issues must be considered as an integral part of its performance measuring methodology.
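The scaling limits quoted at the start of this page (a 33% gain on two processors and a 60% gain on four, for a workload that is 50% parallel) can be reproduced with Amdahl's law in its standard form, speedup = 1 / ((1 - P) + P / N). The formula is not written out in the text above, so the following is only a minimal sketch under that assumption; the function names `amdahl_speedup` and `percent_gain` are hypothetical and chosen for illustration.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the work
    can be spread across `processors`; the rest runs serially."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def percent_gain(parallel_fraction: float, processors: int) -> float:
    """Speedup expressed as a percentage gain over a single processor."""
    return (amdahl_speedup(parallel_fraction, processors) - 1.0) * 100.0


if __name__ == "__main__":
    # Workload from the text: only 50% of the control flow is parallel.
    for n in (2, 4):
        print(f"{n} processors: {percent_gain(0.5, n):.0f}% gain")
```

Under the same assumption, the serial half caps the speedup below 2x no matter how many processors are added, which is why the text stresses maximizing the parallel portion of the control flow. A fully parallel workload, such as the independent applications described in 8.1.2, scales with the processor count in this idealized model.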
