Intel® 64 and IA-32 Architectures Optimization Reference Manual

MULTICORE AND HYPER-THREADING TECHNOLOGY

• Adjust the private stack of each thread in an application so the spacing between these stacks is not offset by multiples of 64 KBytes or 1 MByte (prevents unnecessary cache line evictions) when targeting processors supporting HT Technology.
• Add a per-instance stack offset when two instances of the same application are executing in lockstep to avoid memory accesses that are offset by multiples of 64 KBytes or 1 MByte when targeting processors supporting HT Technology.

See Section 8.6, “Memory Optimization,” for details.

8.3.4 Key Practices of Front-end Optimization

Key practices for front-end optimization on processors that support HT Technology are:

• Avoid excessive loop unrolling to ensure the Trace Cache is operating efficiently.
• Optimize code size to improve locality of Trace Cache and increase delivered trace length.

See Section 8.7, “Front-end Optimization,” for details.

8.3.5 Key Practices of Execution Resource Optimization

Each physical processor has dedicated execution resources. Logical processors in physical processors supporting HT Technology share specific on-chip execution resources.
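The stack-offset recommendation that opens this page (the two bullets before the reference to Section 8.6) can be sketched in C. This is an illustration under stated assumptions, not the manual's own listing: the 1 KByte step, the names `stack_offset_for_thread`, `addresses_alias` and `run_with_stack_offset`, and the use of a variable-length array to consume stack space are all hypothetical choices. The sketch assumes the threading library places thread stacks at spacings that are multiples of 64 KBytes, and gives each of up to 64 threads a distinct offset smaller than 64 KBytes so that the adjusted spacing is no longer such a multiple. Because 1 MByte is itself a multiple of 64 KBytes, the same offsets cover both distances.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed constants for illustration; only the 64 KByte distance
 * comes from the text above. */
#define ALIAS_DISTANCE (64u * 1024u)                  /* 64 KBytes */
#define OFFSET_STEP    1024u                          /* hypothetical step */
#define OFFSET_SLOTS   (ALIAS_DISTANCE / OFFSET_STEP) /* 64 distinct offsets */

/* Hypothetical helper: bytes of stack a thread should skip before doing
 * real work. Offsets are 0, 1 KB, ..., 63 KB and repeat after 64 threads. */
size_t stack_offset_for_thread(unsigned thread_index)
{
    return (size_t)(thread_index % OFFSET_SLOTS) * OFFSET_STEP;
}

/* Returns nonzero when two addresses are separated by a multiple of
 * 64 KBytes (which also covers multiples of 1 MByte). */
int addresses_alias(uintptr_t a, uintptr_t b)
{
    uintptr_t distance = a > b ? a - b : b - a;
    return distance % ALIAS_DISTANCE == 0;
}

/* Hypothetical wrapper: consumes the per-thread offset on the current
 * stack, then runs the thread body so its frames sit below the padding.
 * Relies on C99 variable-length arrays. */
void run_with_stack_offset(unsigned thread_index,
                           void (*body)(void *), void *arg)
{
    assert(body != NULL);
    /* +1 keeps the array length nonzero for thread index 0. */
    volatile char pad[stack_offset_for_thread(thread_index) + 1];
    pad[0] = 0; /* touch the padding so it is not optimized away */
    body(arg);
}
```

A thread start routine would call `run_with_stack_offset(index, body, arg)` as its first action, with `index` being whatever small integer the application already uses to number its threads or instances. With more than 64 threads the offsets repeat, so a finer step or a different scheme would be needed. Compilers without variable-length array support would need a platform facility such as `alloca` instead.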
Key practices for execution resource optimization include:

• Optimize each thread to achieve optimal frequency scaling first.
• Optimize multithreaded applications to achieve optimal scaling with respect to the number of physical processors.
• Use on-chip execution resources cooperatively if two threads are sharing the execution resources in the same physical processor package.
• For each processor supporting HT Technology, consider adding functionally uncorrelated threads to increase the hardware resource utilization of each physical processor package.

See Section 8.8, “Using Thread Affinities to Manage Shared Platform Resources,” for details.

8.3.6 Generality and Performance Impact

The next five sections cover the optimization techniques in detail. Recommendations discussed in each section are ranked by importance in terms of estimated local impact and generality.
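The execution-resource practices in 8.3.5 (scale across physical processors first, then share execution resources between logical processors) can be illustrated with a small placement helper. The following is a minimal sketch, not a method taken from the manual. The function name `logical_cpu_for_thread` is hypothetical, and it assumes a numbering in which the logical processors of one core are adjacent, that is, id = core × threads_per_core + sibling. Real systems may number logical processors differently, so the topology would have to be obtained from the operating system or from CPUID. The pinning itself would be done with an OS facility such as `sched_setaffinity` on Linux or `SetThreadAffinityMask` on Windows, which is not shown here.

```c
#include <assert.h>

/* Hypothetical helper: chooses a logical processor for a worker thread so
 * that every physical core receives one thread before any core receives a
 * second one.
 *
 * Assumed numbering (an assumption, not guaranteed by any OS):
 *   logical id = core * threads_per_core + sibling
 *
 * Returns -1 when the topology arguments are invalid. */
int logical_cpu_for_thread(unsigned thread_index,
                           unsigned physical_cores,
                           unsigned threads_per_core)
{
    if (physical_cores == 0 || threads_per_core == 0)
        return -1;

    /* First pass over the cores uses sibling 0, second pass sibling 1, ... */
    unsigned core    = thread_index % physical_cores;
    unsigned sibling = (thread_index / physical_cores) % threads_per_core;
    unsigned cpu     = core * threads_per_core + sibling;

    /* Internal sanity check: the result must name an existing logical CPU. */
    assert(cpu < physical_cores * threads_per_core);
    return (int)cpu;
}
```

With four cores and two logical processors per core, threads 0 to 3 land on separate cores, and only threads 4 to 7 start sharing a core with an earlier thread. An application following the last bullet of 8.3.5 would try to make those later threads functionally uncorrelated with the ones they share a core with.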
