13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


SUMMARY OF RULES AND SUGGESTIONS

avoid memory accesses that are offset by multiples of 64 KByte or 1 MByte, when targeting Intel processors supporting HT Technology. (page 7-32)

User/Source Coding Rule 36. (M impact, L generality) Avoid excessive loop unrolling to ensure the Trace cache is operating efficiently. (page 7-34)

User/Source Coding Rule 37. (L impact, L generality) Optimize code size to improve locality of Trace cache and increase delivered trace length. (page 7-34)

User/Source Coding Rule 38. (M impact, L generality) Consider using thread affinity to optimize sharing resources cooperatively in the same core and subscribing dedicated resource in separate processor cores. (page 7-37)

User/Source Coding Rule 39. (M impact, L generality) If a single thread consumes half of the peak bandwidth of a specific execution unit (e.g. fdiv), consider adding a thread that seldom or rarely relies on that execution unit, when tuning for HT Technology. (page 7-43)

E.3 TUNING SUGGESTIONS

Tuning Suggestion 1. In rare cases, a performance problem may be caused by executing data on a code page as instructions. This is very likely to happen when execution is following an indirect branch that is not resident in the trace cache. If this is clearly causing a performance problem, try moving the data elsewhere, or inserting an illegal opcode or a pause instruction immediately after the indirect branch. Note that the latter two alternatives may degrade performance in some circumstances. (page 3-63)

Tuning Suggestion 2. If a load is found to miss frequently, either insert a prefetch before it or (if issue bandwidth is a concern) move the load up to execute earlier. (page 3-70)

Tuning Suggestion 3. Optimize single threaded code to maximize execution throughput first. (page 7-41)

Tuning Suggestion 4. Optimize multithreaded applications to achieve optimal processor scaling with respect to the number of physical processors or processor cores. (page 7-41)

Tuning Suggestion 5. Schedule threads that compete for the same execution resource to separate processor cores. (page 7-41)

Tuning Suggestion 6. Use on-chip execution resources cooperatively if two logical processors are sharing the execution resources in the same processor core. (page 7-42)

Tuning Suggestion 7.
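
As an illustration of Tuning Suggestion 2, the sketch below inserts a software prefetch ahead of a load that is likely to miss. It is not taken from the manual: the function name sum_indirect, the PREFETCH_DISTANCE value, and the use of the GCC/Clang builtin __builtin_prefetch are all assumptions made for this example. A prefetch is only a hint, so the result is the same with or without it, and the useful distance has to be found by measurement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning knob: how many iterations ahead to prefetch.
   The best value depends on memory latency and loop cost; measure it. */
#define PREFETCH_DISTANCE 16

/* Sum values[idx[i]] for i in [0, n). When idx is irregular, the
   indirect load values[idx[i]] is the kind of load that misses often. */
long sum_indirect(const long *values, const size_t *idx, size_t n)
{
    assert(values != NULL && idx != NULL);

    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Hint only (GCC/Clang builtin): request the cache line that
               will be needed PREFETCH_DISTANCE iterations from now.
               Second argument 0 = read, third argument 3 = keep the
               line in all cache levels if possible. */
            __builtin_prefetch(&values[idx[i + PREFETCH_DISTANCE]], 0, 3);
        }
        total += values[idx[i]];
    }
    return total;
}
```

The alternative named in the suggestion, moving the load earlier, costs no extra instruction and may be the better choice when issue bandwidth is the bottleneck.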
