13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


MULTICORE AND HYPER-THREADING TECHNOLOGY

Other techniques that prevent false sharing include:

• Organize variables of different types in data structures (because the layout that compilers give to data variables might differ from their placement in the source code).
• When each thread needs to use its own copy of a set of variables, declare the variables with:
  - Directive threadprivate, when using OpenMP
  - Modifier __declspec(thread), when using the Microsoft compiler
• In managed environments that provide automatic object allocation, the object allocators and garbage collectors are responsible for the layout of objects in memory so that false sharing through two objects does not happen.
• Provide classes such that only one thread writes to each object field and close object fields, in order to avoid false sharing.

One should not read the recommendations discussed in this section as favoring a sparsely populated data layout. Adopt the data-layout recommendations only when necessary, and avoid unnecessary bloat in the size of the working set.

8.5 SYSTEM BUS OPTIMIZATION

The system bus services requests from bus agents (e.g. logical processors) to fetch data or code from the memory sub-system. The performance impact of data traffic fetched from memory depends on the characteristics of the workload, on the degree of software optimization of memory access, and on the locality enhancements implemented in the software code. A number of techniques to characterize the memory traffic of a workload are discussed in Appendix A. Optimization guidelines on locality enhancement are also discussed in Section 3.6.10, “Locality Enhancement,” and Section 9.6.11, “Hardware Prefetching and Cache Blocking Techniques.”

The techniques described in Chapter 3 and Chapter 9 benefit application performance in a platform where the bus system is servicing a single-threaded environment. In a multi-threaded environment, the bus system typically services many more logical processors, each of which can issue bus requests independently. Thus, techniques for enhancing locality, conserving bus bandwidth, and reducing the delay of large-stride cache misses can have a strong impact on processor scaling performance.

8.5.1 Conserve Bus Bandwidth

In a multithreading environment, bus bandwidth may be shared by memory traffic originating from multiple bus agents (these agents can be several logical processors and/or several processor cores). Preserving bus bandwidth can improve processor scaling performance. Also, effective bus bandwidth typically decreases if there are significant large-stride cache misses. Reducing the amount of large-
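To make the term "large-stride" access concrete, the following C++ sketch sums the same row-major matrix in two loop orders. The `Matrix` type and the functions `sum_unit_stride` and `sum_large_stride` are hypothetical names introduced only for illustration; they do not come from the manual. The sketch shows only the access pattern and makes no claim about timing on any particular processor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix, used only to illustrate access strides.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data;  // element (r, c) is stored at data[r * cols + c]

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}

    double at(std::size_t r, std::size_t c) const {
        assert(r < rows && c < cols);  // bounds check in debug builds
        return data[r * cols + c];
    }
};

// Unit-stride traversal: consecutive iterations read adjacent addresses,
// so the contents of each fetched cache line are consumed before moving on.
double sum_unit_stride(const Matrix& m) {
    double total = 0.0;
    for (std::size_t r = 0; r < m.rows; ++r)
        for (std::size_t c = 0; c < m.cols; ++c)
            total += m.at(r, c);
    return total;
}

// Large-stride traversal: consecutive iterations are cols * sizeof(double)
// bytes apart. Once that stride exceeds the cache-line size, each access
// lands on a different cache line.
double sum_large_stride(const Matrix& m) {
    double total = 0.0;
    for (std::size_t c = 0; c < m.cols; ++c)
        for (std::size_t r = 0; r < m.rows; ++r)
            total += m.at(r, c);
    return total;
}
```

Both functions visit every element exactly once; only the order in which memory is touched differs. Whether the second ordering actually costs bus bandwidth on a given platform depends on the matrix size, cache capacity, and hardware prefetching, so it is worth measuring before restructuring loops.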
