13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

SUMMARY OF RULES AND SUGGESTIONS

returns. Pushing the return address on the stack and jumping to the routine to be called is not recommended since it creates a mismatch in calls and returns. (page 3-12)

Assembler/Compiler Coding Rule 5. (MH impact, MH generality) Selectively inline a function if doing so decreases code size or if the function is small and the call site is frequently executed. (page 3-12)

Assembler/Compiler Coding Rule 6. (H impact, H generality) Do not inline a function if doing so increases the working set size beyond what will fit in the trace cache. (page 3-12)

Assembler/Compiler Coding Rule 7. (ML impact, ML generality) If there are more than 16 nested calls and returns in rapid succession, consider transforming the program with inline to reduce the call depth. (page 3-12)

Assembler/Compiler Coding Rule 8. (ML impact, ML generality) Favor inlining small functions that contain branches with poor prediction rates. If a branch misprediction results in a RETURN being prematurely predicted as taken, a performance penalty may be incurred. (page 3-12)

Assembler/Compiler Coding Rule 9. (L impact, L generality) If the last statement in a function is a call to another function, consider converting the call to a jump. This will save the call/return overhead as well as an entry in the return stack buffer. (page 3-12)

Assembler/Compiler Coding Rule 10. (M impact, L generality) Do not put more than four branches in a 16-byte chunk. (page 3-12)

Assembler/Compiler Coding Rule 11. (M impact, L generality) Do not put more than two end loop branches in a 16-byte chunk. (page 3-12)

Assembler/Compiler Coding Rule 12. (M impact, H generality) All branch targets should be 16-byte aligned. (page 3-13)

Assembler/Compiler Coding Rule 13. (M impact, H generality) If the body of a conditional is not likely to be executed, it should be placed in another part of the program. If it is highly unlikely to be executed and code locality is an issue, it should be placed on a different code page. (page 3-13)

Assembler/Compiler Coding Rule 14. (M impact, L generality) When indirect branches are present, try to put the most likely target of an indirect branch immediately following the indirect branch. Alternatively, if indirect branches are common but they cannot be predicted by branch prediction hardware, then follow the indirect branch with a UD2 instruction, which will stop the processor from decoding down the fall-through path. (page 3-13)

Assembler/Compiler Coding Rule 15. (H impact, M generality) Unroll small loops until the overhead of the branch and induction variable accounts (generally) for less than 10% of the execution time of the loop. (page 3-16)

Assembler/Compiler Coding Rule 16. (H impact, M generality) Avoid unrolling loops excessively; this may thrash the trace cache or instruction cache. (page 3-16)

Assembler/Compiler Coding Rule 17. (M impact, M generality) Unroll loops that are frequently executed and have a predictable number of iterations to reduce the number of iterations to 16 or fewer. Do this unless it increases code size so that the working set no longer fits in the trace or instruction cache. If the
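
The loop unrolling described in Rules 15 to 17 can be illustrated with a minimal C sketch. The function names sum_rolled and sum_unrolled4 and the unroll factor of 4 are assumptions made for illustration, not taken from the manual; a suitable factor depends on measurement, and an optimizing compiler may already unroll such a loop on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline (hypothetical example): one addition per iteration, so the
   compare/branch and the index increment make up a large share of the
   work done for each element. */
long sum_rolled(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Unrolled by an assumed factor of 4: the loop branch and the update of
   the induction variable i are paid once per four additions. */
long sum_unrolled4(const int *data, size_t n)
{
    assert(data != NULL || n == 0);

    long total = 0;
    size_t i = 0;

    /* Main body: runs while at least four elements remain. */
    for (; n - i >= 4; i += 4) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }

    /* Remainder: handles the last 0 to 3 elements. */
    for (; i < n; i++)
        total += data[i];

    return total;
}
```

Both functions return the same sum for any input; the unrolled version trades a larger code footprint for fewer executed branches, which is the trade-off Rule 16 warns about when unrolling is taken too far.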
