13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


GENERAL OPTIMIZATION GUIDELINES

Assembly/Compiler Coding Rule 12. (M impact, H generality) All branch targets should be 16-byte aligned.

Assembly/Compiler Coding Rule 13. (M impact, H generality) If the body of a conditional is not likely to be executed, it should be placed in another part of the program. If it is highly unlikely to be executed and code locality is an issue, it should be placed on a different code page.

3.4.1.6 Branch Type Selection

The default predicted target for indirect branches and calls is the fall-through path. Fall-through prediction is overridden if and when a hardware prediction is available for that branch. The predicted branch target from branch prediction hardware for an indirect branch is the previously executed branch target.

The default prediction to the fall-through path is only a significant issue if no branch prediction is available, due to poor code locality or pathological branch conflict problems. For indirect calls, predicting the fall-through path is usually not an issue, since execution will likely return to the instruction after the associated return.

Placing data immediately following an indirect branch can cause a performance problem. If the data consists of all zeros, it looks like a long stream of ADDs to memory destinations, and this can cause resource conflicts and slow down branch recovery. Also, data immediately following indirect branches may appear as branches to the branch prediction hardware, which can branch off to execute other data pages. This can lead to subsequent self-modifying code problems.

Assembly/Compiler Coding Rule 14. (M impact, L generality) When indirect branches are present, try to put the most likely target of an indirect branch immediately following the indirect branch. Alternatively, if indirect branches are common but they cannot be predicted by branch prediction hardware, then follow the indirect branch with a UD2 instruction, which will stop the processor from decoding down the fall-through path.

Indirect branches resulting from code constructs (such as switch statements, computed GOTOs or calls through pointers) can jump to an arbitrary number of locations. If the code sequence is such that the target destination of a branch goes to the same address most of the time, then the BTB will predict accurately most of the time. Since only one taken (non-fall-through) target can be stored in the BTB, indirect branches with multiple taken targets may have lower prediction rates.

The effective number of targets stored may be increased by introducing additional conditional branches. Adding a conditional branch to a target is fruitful if:

• The branch direction is correlated with the branch history leading up to that branch; that is, not just the last target, but how it got to this branch.

• The source/target pair is common enough to warrant using the extra branch prediction capacity. This may increase the overall number of branch mispredictions while reducing the mispredictions of indirect branches. The profitability is lower if the number of mispredicting branches is very large.
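The idea of adding a conditional branch in front of an indirect branch can be sketched in C. This is a minimal illustration and does not come from the manual: the handler names and the assumption that `handle_common` is the dominant target are hypothetical, and a compiler is free to restructure either function. Whether the guarded form pays off would have to be confirmed by profiling against the two criteria listed above.

```c
/* Hypothetical handlers, named for illustration only. */
typedef int (*handler_fn)(int);

int handle_common(int x) { return x + 1; }  /* assumed to be the frequent target */
int handle_rare(int x)   { return x * 2; }  /* assumed to be an infrequent target */

/* Baseline: one indirect call whose target varies from call to call. */
int dispatch_indirect(handler_fn fn, int x)
{
    return fn(x);
}

/* Guarded variant: compare against the common target first. The comparison
   compiles to a conditional branch, which is predicted from branch history,
   and the common case becomes a direct call. Only the remaining targets
   reach the indirect call. */
int dispatch_guarded(handler_fn fn, int x)
{
    if (fn == handle_common)
        return handle_common(x);  /* direct call on the common path */
    return fn(x);                 /* fallback indirect call */
}
```

Both functions return the same result for any handler; only the branch structure differs. The guard spends one extra conditional branch per call, which is the cost the second bullet above warns about when the source/target pair is not common enough.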
