13.07.2015

Intel® 64 and IA-32 Architectures Optimization Reference Manual

GENERAL OPTIMIZATION GUIDELINES

The following rules apply to inlining, calls, and returns.

Assembly/Compiler Coding Rule 4. (MH impact, MH generality) Near calls must be matched with near returns, and far calls must be matched with far returns. Pushing the return address on the stack and jumping to the routine to be called is not recommended since it creates a mismatch in calls and returns.

Calls and returns are expensive; use inlining for the following reasons:

• Parameter passing overhead can be eliminated.
• In a compiler, inlining a function exposes more opportunity for optimization.
• If the inlined routine contains branches, the additional context of the caller may improve branch prediction within the routine.
• A mispredicted branch can lead to performance penalties inside a small function that are larger than those that would occur if that function is inlined.

Assembly/Compiler Coding Rule 5. (MH impact, MH generality) Selectively inline a function if doing so decreases code size or if the function is small and the call site is frequently executed.

Assembly/Compiler Coding Rule 6. (H impact, H generality) Do not inline a function if doing so increases the working set size beyond what will fit in the trace cache.

Assembly/Compiler Coding Rule 7. (ML impact, ML generality) If there are more than 16 nested calls and returns in rapid succession, consider transforming the program with inline to reduce the call depth.

Assembly/Compiler Coding Rule 8. (ML impact, ML generality) Favor inlining small functions that contain branches with poor prediction rates. If a branch misprediction results in a RETURN being prematurely predicted as taken, a performance penalty may be incurred.

Assembly/Compiler Coding Rule 9. (L impact, L generality) If the last statement in a function is a call to another function, consider converting the call to a jump. This will save the call/return overhead as well as an entry in the return stack buffer.

Assembly/Compiler Coding Rule 10. (M impact, L generality) Do not put more than four branches in a 16-byte chunk.

Assembly/Compiler Coding Rule 11. (M impact, L generality) Do not put more than two end loop branches in a 16-byte chunk.

3.4.1.5 Code Alignment

Careful arrangement of code can enhance cache and memory locality. Likely sequences of basic blocks should be laid out contiguously in memory. This may involve removing unlikely code, such as code to handle error conditions, from the sequence. See Section 3.7, “Prefetching,” on optimizing the instruction prefetcher.
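The layout advice in Section 3.4.1.5 (keep the likely basic blocks contiguous and move error handling out of the sequence) can be illustrated with a minimal C sketch. It is not taken from the manual. It assumes a GCC- or Clang-compatible compiler for the `__builtin_expect` builtin and the `cold` and `noinline` function attributes; the macros `UNLIKELY` and `COLD_NOINLINE` and the functions `report_bad_input` and `sum_nonnegative` are hypothetical names introduced only for this example. Note that the error path is deliberately kept out of line here, which is a separate decision from the inlining rules above: those favor inlining small, frequently executed functions, while this handler is expected to run rarely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: GCC/Clang extensions. On other compilers the macros
 * expand to nothing, so the code still builds but carries no hints. */
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#define COLD_NOINLINE __attribute__((cold, noinline))
#else
#define UNLIKELY(x) (x)
#define COLD_NOINLINE
#endif

/* Hypothetical error handler. Marking it cold and noinline asks the
 * compiler to keep its body out of the hot instruction sequence. */
static COLD_NOINLINE int report_bad_input(int value)
{
    fprintf(stderr, "bad input: %d\n", value);
    return -1;
}

/* Hypothetical hot routine: sums the values, rejecting negatives.
 * The loop body that runs in the common case stays contiguous; the
 * rare error path is a single branch to the out-of-line handler. */
static int sum_nonnegative(const int *values, size_t count, long *out)
{
    assert(values != NULL && out != NULL);

    long total = 0;
    for (size_t i = 0; i < count; i++) {
        if (UNLIKELY(values[i] < 0))
            return report_bad_input(values[i]);
        total += values[i];
    }
    *out = total;
    return 0;
}
```

These annotations are hints, not guarantees. GCC documents that `cold` functions are placed in a separate subsection of the text section on many targets, but the actual layout depends on the compiler, the optimization level, and the target, so it is worth checking the generated code or using profile-guided optimization before relying on the effect.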
