13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual



GENERAL OPTIMIZATION GUIDELINES

There are several cases in which data is passed through memory, and the store may need to be separated from the load:
• Spills, save and restore registers in a stack frame
• Parameter passing
• Global and volatile variables
• Type conversion between integer and floating point
• When compilers do not analyze code that is inlined, forcing variables that are involved in the interface with inlined code to be in memory, creating more memory variables and preventing the elimination of redundant loads

Assembly/Compiler Coding Rule 51. (H impact, MH generality) Where it is possible to do so without incurring other penalties, prioritize the allocation of variables to registers, as in register allocation and for parameter passing, to minimize the likelihood and impact of store-forwarding problems. Try not to store-forward data generated from a long latency instruction - for example, MUL or DIV. Avoid store-forwarding data for variables with the shortest store-load distance. Avoid store-forwarding data for variables with many and/or long dependence chains, and especially avoid including a store forward on a loop-carried dependence chain.

Example 3-34 shows an example of a loop-carried dependence chain.

Example 3-34. Loop-carried Dependence Chain

    for ( i = 0; i < MAX; i++ ) {
        a[i] = b[i] * foo;
        foo = a[i] / 3;
    } // foo is a loop-carried dependence.

Assembly/Compiler Coding Rule 52. (M impact, MH generality) Calculate store addresses as early as possible to avoid having stores block loads.

3.6.5 Data Layout Optimizations

User/Source Coding Rule 6. (H impact, M generality) Pad data structures defined in the source code so that every data element is aligned to a natural operand size address boundary.

If the operands are packed in a SIMD instruction, align to the packed element size (64-bit or 128-bit).

Align data by providing padding inside structures and arrays. Programmers can reorganize structures and arrays to minimize the amount of memory wasted by padding. However, compilers might not have this freedom. The C programming language, for example, specifies the order in which structure elements are allocated in memory. For more information, see Section 4.4, “Stack and Data Alignment,” and Appendix D, “Stack Alignment.”
