Intel® 64 and IA-32 Architectures Optimization Reference Manual

SUMMARY OF RULES AND SUGGESTIONS

Core Duo processors), and within a sector (128 bytes on Pentium 4 and Intel Xeon processors) .... 7-21

User/Source Coding Rule 24. (M impact, ML generality) Place each synchronization variable alone, separated by 128 bytes or in a separate cache line. .... 7-22

User/Source Coding Rule 25. (H impact, L generality) Do not place any spin lock variable so that it spans a cache line boundary. .... 7-22

User/Source Coding Rule 26. (M impact, H generality) Improve data and code locality to conserve bus command bandwidth. .... 7-24

User/Source Coding Rule 27. (M impact, L generality) Avoid excessive use of software prefetch instructions and allow the automatic hardware prefetcher to work. Used inappropriately, excessive software prefetches can significantly and unnecessarily increase bus utilization. .... 7-25

User/Source Coding Rule 28. (M impact, M generality) Consider overlapping multiple back-to-back memory reads to improve effective cache miss latencies. .... 7-26

User/Source Coding Rule 29. (M impact, M generality) Consider adjusting the sequencing of memory references so that the distribution of distances between successive last-level cache misses peaks towards 64 bytes. .... 7-26

User/Source Coding Rule 30. (M impact, M generality) Use full write transactions to achieve higher data throughput. .... 7-26

User/Source Coding Rule 31. (H impact, H generality) Use cache blocking to improve locality of data access.
Target one quarter to one half of the cache size when targeting Intel processors supporting HT Technology, or target a block size that allows all the logical processors serviced by a cache to share that cache simultaneously. .... 7-27

User/Source Coding Rule 32. (H impact, M generality) Minimize the sharing of data between threads that execute on different bus agents sharing a common bus. On a platform consisting of multiple bus domains, also minimize data sharing across bus domains. .... 7-28

User/Source Coding Rule 33. (H impact, H generality) Minimize data access patterns that are offset by multiples of 64 KBytes in each thread. .... 7-30

User/Source Coding Rule 34. (H impact, M generality) Adjust the private stack of each thread in an application so that the spacing between these stacks is not offset by multiples of 64 KBytes or 1 MByte, to prevent unnecessary cache line evictions (when using Intel processors supporting HT Technology). .... 7-31

User/Source Coding Rule 35. (M impact, L generality) Add per-instance stack offset when two instances of the same application are executing in lockstep to

E-10
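Rules 24 and 25 can be illustrated with a minimal C11 sketch. It assumes a compiler that provides <stdatomic.h>, and it uses the 128-byte sector size quoted above for Pentium 4 and Intel Xeon processors. The constant `SYNC_ALIGN`, the type `padded_spinlock`, and the functions `lock_acquire`, `lock_release`, and `starts_on_boundary` are hypothetical names chosen for illustration; they are not part of any Intel API. Aligning the struct's only member to 128 bytes pads every lock to a full 128-byte block. As a result, two locks never share a cache line, and no lock can straddle a line boundary.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed spacing: 128 bytes, one sector (two 64-byte lines) on
   Pentium 4 and Intel Xeon processors, per Rule 24. */
#define SYNC_ALIGN 128

/* Hypothetical padded lock: alignas on the only member raises the struct's
   alignment to 128, so its size is rounded up to 128 bytes and each lock
   sits alone in its own aligned block (Rules 24 and 25). */
typedef struct {
    alignas(SYNC_ALIGN) atomic_flag flag;
} padded_spinlock;

static_assert(sizeof(padded_spinlock) == SYNC_ALIGN,
              "each lock must occupy exactly one 128-byte block");

static void lock_acquire(padded_spinlock *l) {
    /* Spin until the previous value of the flag was clear. A production
       spin-wait loop would also execute a PAUSE hint here. */
    while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire)) {
    }
}

static void lock_release(padded_spinlock *l) {
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

/* Nonzero if p starts on a SYNC_ALIGN boundary; a small variable (at most
   64 bytes) placed there cannot straddle a 64-byte cache line. */
static int starts_on_boundary(const void *p) {
    return ((uintptr_t)p % SYNC_ALIGN) == 0;
}

/* Two locks that different threads would take independently; the padding
   keeps them from sharing a line (no false sharing between them). */
static padded_spinlock locks[2] = { { ATOMIC_FLAG_INIT }, { ATOMIC_FLAG_INIT } };
```

On processors where the relevant unit is a single 64-byte line, `SYNC_ALIGN` could be lowered to 64. The trade-off is memory spent on padding versus protection against false sharing.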

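Rule 31 (cache blocking) can be sketched with a blocked matrix transpose in C. The tile edge `TILE` is a hypothetical tuning value. A 64 x 64 tile of `double` is 32 KB, so one source tile plus one destination tile occupy 64 KB. That is one quarter of a 256 KB cache or one half of a 128 KB cache. A real choice of `TILE` should come from the cache size of the target processor and, with HT Technology, from how many logical processors share that cache. The function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile edge: 64 x 64 doubles = 32 KB per tile, 64 KB for the
   source and destination tiles together. Retune for the target cache. */
#define TILE 64
static_assert(TILE > 0, "TILE must be positive");

/* Naive transpose: the writes to dst walk down a column, touching a new
   cache line on almost every store. */
void transpose_naive(size_t n, const double *src, double *dst) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Blocked transpose: process TILE x TILE sub-blocks so that the src and
   dst lines touched by one block stay cache-resident until they are reused.
   Edge tiles are clipped when n is not a multiple of TILE. */
void transpose_blocked(size_t n, const double *src, double *dst) {
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t jj = 0; jj < n; jj += TILE) {
            size_t i_end = ii + TILE < n ? ii + TILE : n;
            size_t j_end = jj + TILE < n ? jj + TILE : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
}
```

Both functions produce the same result. In the blocked version, each destination line is written by several consecutive `i` iterations within the same tile while it is still cached, instead of being evicted and refetched.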