13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


MULTICORE AND HYPER-THREADING TECHNOLOGY

On processors supporting HT Technology, operating systems should use the HLT instruction if one logical processor is active and the other is not. HLT will allow an idle logical processor to transition to a halted state; this allows the active logical processor to use all the hardware resources in the physical package. An operating system that does not use this technique must still execute instructions on the idle logical processor that repeatedly check for work. This “idle loop” consumes execution resources that could otherwise be used to make progress on the other active logical processor.

If an application thread must remain idle for a long time, the application should use a thread-blocking API or other method to release the idle processor. The techniques discussed here apply to traditional MP systems, but they have an even higher impact on processors that support HT Technology.

Typically, an operating system provides timing services, for example Sleep(dwMilliseconds)[6]; such services can be used to prevent frequent checking of a synchronization variable.

Another technique to synchronize between worker threads and a control loop is to use a thread-blocking API provided by the OS. Using a thread-blocking API allows the control thread to use fewer processor cycles for spinning and waiting. This gives the OS more time quanta to schedule the worker threads on available processors. Furthermore, using a thread-blocking API also benefits from the system idle loop optimization that the OS implements using the HLT instruction.

User/Source Coding Rule 22. (H impact, M generality) Use a thread-blocking API in a long idle loop to free up the processor.

Using a spin-wait loop in a traditional MP system may be less of an issue when the number of runnable threads is less than the number of processors in the system. If the number of threads in an application is expected to be greater than the number of processors (either one processor or multiple processors), use a thread-blocking API to free up processor resources. A multithreaded application adopting one control thread to synchronize multiple worker threads may consider limiting worker threads to the number of processors in a system and using thread-blocking APIs in the control thread.

8.4.4.1 Avoid Coding Pitfalls in Thread Synchronization

Synchronization between multiple threads must be designed and implemented with care to achieve good performance scaling with respect to the number of discrete processors and the number of logical processors per physical processor. No single technique is a universal solution for every synchronization situation.

The pseudo-code example in Example 8-5(a) illustrates a polling loop implementation of a control thread. If there is only one runnable worker thread, an attempt to

6. The Sleep() API is not thread-blocking, because it does not guarantee the processor will be released. Example 8-5(a) shows an example of using Sleep(0), which does not always release the processor to another thread.
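
As an illustration of User/Source Coding Rule 22, the following is a minimal C++ sketch of a control thread that blocks until its workers finish instead of polling a synchronization variable. It is not taken from the manual: it uses the standard library's std::condition_variable as a stand-in for an OS thread-blocking API, and the names Completion, worker_count, and run_workers_blocking are hypothetical. It also assumes the worker count is capped at std::thread::hardware_concurrency(), following the suggestion to limit worker threads to the number of processors.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state: the control thread sleeps on `cv`
// until `remaining` reaches zero.
struct Completion {
    std::mutex m;
    std::condition_variable cv;
    unsigned remaining = 0;
};

// Assumption: one worker per logical processor reported by the runtime.
// hardware_concurrency() may return 0 when the count is unknown.
unsigned worker_count() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : n;
}

// Starts `workers` threads, blocks the calling (control) thread until all
// of them have signalled completion, and returns how many finished.
unsigned run_workers_blocking(unsigned workers) {
    Completion done;
    done.remaining = workers;
    unsigned finished = 0;

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workers; ++i) {
        pool.emplace_back([&done, &finished] {
            // Placeholder for real work.
            std::lock_guard<std::mutex> lock(done.m);
            ++finished;
            if (--done.remaining == 0) {
                done.cv.notify_one();  // wake the control thread once
            }
        });
    }

    {
        // The control thread waits here without spinning; the predicate
        // guards against spurious wakeups.
        std::unique_lock<std::mutex> lock(done.m);
        done.cv.wait(lock, [&done] { return done.remaining == 0; });
    }

    for (auto& t : pool) {
        t.join();
    }
    return finished;
}
```

A caller might invoke run_workers_blocking(worker_count()). While the control thread waits, it is not runnable, so the scheduler can give its time quanta to the workers; whether an idle logical processor is then halted depends on the operating system's idle loop, as described above.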
