Intel® 64 and IA-32 Architectures Optimization Reference Manual

GENERAL OPTIMIZATION GUIDELINES

through; therefore, the BTB does not issue a prediction. The static predictor, however, will predict the branch to be taken, so a misprediction will not occur.

Example 3-6. Static Taken Prediction

    Begin:  mov  eax, mem32
            and  eax, ebx
            imul eax, edx
            shl  eax, 7        ; CF = last bit shifted out of EAX
            jc   Begin         ; backward conditional branch: statically predicted taken

The first branch instruction (JC BEGIN) in Example 3-7 is a conditional forward branch. It is not in the BTB the first time through, but the static predictor will predict the branch to fall through. The static prediction algorithm correctly predicts that the CALL CONVERT instruction will be taken, even before the branch has any branch history in the BTB.

Example 3-7. Static Not-Taken Prediction

            mov  eax, mem32
            and  eax, ebx
            imul eax, edx
            shl  eax, 7        ; CF = last bit shifted out of EAX
            jc   Begin         ; forward conditional branch: statically predicted not taken
            mov  eax, 0
    Begin:  call Convert       ; unconditional call: statically predicted taken

The Intel Core microarchitecture does not use the static prediction heuristic. However, to maintain consistency across Intel 64 and IA-32 processors, software should continue to follow the static prediction heuristic as the default.

3.4.1.4 Inlining, Calls and Returns

The return address stack mechanism augments the static and dynamic predictors to optimize specifically for calls and returns. It holds 16 entries, which is large enough to cover the call depth of most programs. If there is a chain of more than 16 nested calls and more than 16 returns in rapid succession, performance may degrade.

The trace cache in Intel NetBurst microarchitecture maintains branch prediction information for calls and returns. As long as the trace with the call or return remains in the trace cache and the call and return targets remain unchanged, the depth limit of the return address stack described above will not impede performance.

To enable the use of the return address stack mechanism, calls and returns must be matched in pairs. If this is done, the likelihood of exceeding the stack depth in a manner that will impact performance is very low.
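To make the 16-entry limit and the pairing requirement concrete, the following is a small software model of a return address stack, written in C. It is a minimal sketch under stated assumptions, not a description of how any particular processor implements the mechanism: only the 16-entry depth comes from the text. The circular, overwrite-on-overflow behavior, the names (`rsb_t`, `rsb_call`, `rsb_ret_predicts`, `rsb_count_mispredicts`) and the return-address values are hypothetical, and the NetBurst trace-cache effect is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Depth taken from the text; the circular overwrite-on-overflow
 * policy below is an assumption made for this sketch. */
#define RSB_ENTRIES 16
#define MAX_DEPTH 256

typedef struct {
    uint64_t entry[RSB_ENTRIES];
    unsigned top; /* index of the most recently pushed entry */
} rsb_t;

void rsb_init(rsb_t *rsb) {
    for (unsigned i = 0; i < RSB_ENTRIES; i++) {
        rsb->entry[i] = 0;
    }
    rsb->top = 0;
}

/* CALL: push the return address; when full, the oldest entry is overwritten. */
void rsb_call(rsb_t *rsb, uint64_t return_addr) {
    rsb->top = (rsb->top + 1) % RSB_ENTRIES;
    rsb->entry[rsb->top] = return_addr;
}

/* RET: pop the predicted target; returns 1 if it matches the actual target. */
int rsb_ret_predicts(rsb_t *rsb, uint64_t actual_target) {
    uint64_t predicted = rsb->entry[rsb->top];
    rsb->top = (rsb->top + RSB_ENTRIES - 1) % RSB_ENTRIES;
    return predicted == actual_target;
}

/* Simulate `depth` nested calls followed by `depth` returns and count the
 * mispredicted returns. If `unpaired_call` is nonzero, one extra CALL is
 * executed at the deepest point and never matched by a RET. The return
 * addresses are made-up, distinct values. */
unsigned rsb_count_mispredicts(unsigned depth, int unpaired_call) {
    uint64_t actual[MAX_DEPTH]; /* architectural return addresses */
    rsb_t rsb;
    unsigned misses = 0;

    assert(depth <= MAX_DEPTH);
    rsb_init(&rsb);
    for (unsigned i = 0; i < depth; i++) {
        actual[i] = 0x1000u + 16u * i;
        rsb_call(&rsb, actual[i]);
    }
    if (unpaired_call) {
        rsb_call(&rsb, 0xdead0u); /* pushed, but no RET ever consumes it */
    }
    for (unsigned i = depth; i > 0; i--) {
        if (!rsb_ret_predicts(&rsb, actual[i - 1])) {
            misses++;
        }
    }
    return misses;
}
```

Under these assumptions, `rsb_count_mispredicts(16, 0)` reports no mispredicted returns and `rsb_count_mispredicts(17, 0)` reports one; in general, a chain of d > 16 nested calls mispredicts d - 16 of its returns. `rsb_count_mispredicts(3, 1)` mispredicts all three returns, because the single unmatched call shifts every later prediction by one entry, which is why calls and returns should be matched in pairs.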
