U. Glaeser


is that mispredictions limit the processor’s ability to build up a large window of instructions over which to expose ILP.

Because the misprediction penalty is so high in terms of wasted instruction-issue opportunities, branch prediction alone is not enough: the highest possible prediction accuracy is needed to minimize stall cycles and maximize the processor’s ability to exploit ILP.

Software Techniques

Branches can be predicted or otherwise managed by both software and hardware techniques. This section covers the software techniques; the section on “Hardware Techniques” covers the hardware ones.

Branch Delay Slots

One early software technique that eliminated the need for prediction in early processors is the branch delay slot. Instead of predicting the branch’s outcome, the instruction-set architecture can be defined so that some number of instructions following a branch execute regardless of the branch’s outcome. These instruction positions are called delay slots and must be filled with instructions that are safe to execute regardless of the outcome of the branch, or with nops, which do no useful work. Instructions to fill the delay slot might, for example, come from positions that preceded the branch in the original code schedule but can safely be reordered. Consider the following code sequence:

1. add r1, r2, r3
2. add r4, r5, r6
3. bnez r6
4. (delay slot)

Instruction 1 can safely be moved into the delay slot, because doing so violates no data dependencies. Instruction 2, of course, cannot be moved into the delay slot, because it computes the value of r6 that the branch then examines. More aggressive techniques can analyze instructions from after the branch, identify a safe instruction, and hoist it into the delay slot. A more thorough treatment of branch delay slots and associated techniques can be found in [10].
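The dependency check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm of any particular compiler: the `Instr` record, the `conflicts` helper, and the `pick_delay_slot_filler` function are hypothetical names, the destination register is taken to be the last operand (as implied by instruction 2 computing r6), and only register dependencies within a single basic block are considered.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: registers read and the register written."""
    op: str
    srcs: Tuple[str, ...]
    dst: Optional[str] = None  # None for instructions that write no register


def conflicts(moved: Instr, other: Instr) -> bool:
    """True if moving `moved` to after `other` would break a register dependency."""
    return (
        moved.dst in other.srcs                                # other reads moved's result
        or other.dst in moved.srcs                             # other overwrites an input of moved
        or (moved.dst is not None and moved.dst == other.dst)  # both write the same register
    )


def pick_delay_slot_filler(block: Sequence[Instr], branch: Instr) -> Optional[int]:
    """Return the index of an instruction in `block` (the code before `branch`)
    that can be moved into the delay slot, or None if a nop is needed."""
    for i in range(len(block) - 1, -1, -1):
        cand = block[i]
        # The branch must not depend on the candidate's result.
        if cand.dst in branch.srcs:
            continue
        # The candidate must be movable past every instruction that follows it.
        if any(conflicts(cand, later) for later in block[i + 1:]):
            continue
        return i
    return None


# The sequence from the text, with the destination as the last operand.
block = [
    Instr("add", ("r1", "r2"), "r3"),  # instruction 1
    Instr("add", ("r4", "r5"), "r6"),  # instruction 2
]
branch = Instr("bnez", ("r6",))        # instruction 3

print(pick_delay_slot_filler(block, branch))  # index 0, i.e. instruction 1
```

Instruction 2 is rejected because the branch reads r6, while instruction 1 passes every check, matching the reasoning in the text. A real scheduler would also have to account for memory dependencies and exceptions, which this sketch ignores.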

Unfortunately, delay slots have drawbacks. Even the most aggressive techniques still leave some delay slots unfilled, wasting instruction-issue opportunities. Delay slots also expose processor implementation details that might change. Current instruction sets that use delay slots were defined when processors issued instructions in order, one at a time, and pipelines were short. The branch resolution delay was hence just one cycle and the corresponding penalty was only one instruction-issue slot, so these instruction sets defined branches to have a single delay slot. Examples include the MIPS®³ [11] and SPARC®⁴ [12] instruction sets. Later implementations, however, lengthened the pipeline and issued multiple instructions per cycle. The resolution delay then corresponded to many issue slots, even though the number of delay slots was still fixed by the instruction set at one instruction. In addition, with multiple issue, a bundle of instructions being considered for issue in any particular cycle might consist of several instructions following a branch. Exactly one of these, the delay slot, must be issued unconditionally, while the others are control-dependent on the branch and their execution depends on the branch outcome. For these reasons, later instruction sets like Alpha AXP [13] do not include delay slots.
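The mismatch between a fixed number of delay slots and a growing resolution delay can be made concrete with a little arithmetic. The sketch below is a simplification under stated assumptions: the function name `uncovered_issue_slots` is hypothetical, the processor is assumed to issue at full width every cycle, and the 5-cycle, 4-wide configuration is an invented example rather than a figure from any real design.

```python
def uncovered_issue_slots(resolution_cycles: int, issue_width: int,
                          delay_slots: int = 1) -> int:
    """Issue slots that pass before a branch resolves and that the
    architected delay slots do not cover."""
    exposed = resolution_cycles * issue_width  # slots elapsed while the branch resolves
    return max(exposed - delay_slots, 0)


# Early single-issue pipeline: one cycle to resolve, one delay slot.
print(uncovered_issue_slots(resolution_cycles=1, issue_width=1))  # 0

# Hypothetical later design: 5 cycles to resolve, 4 instructions per cycle.
print(uncovered_issue_slots(resolution_cycles=5, issue_width=4))  # 19
```

With a single architected delay slot, the early pipeline loses nothing, whereas the hypothetical wide, deep pipeline still has 19 issue slots that only prediction or stalling can account for.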

Profiling and Compiler Annotation

An alternative software technique is to profile the program: run it and gather data about how its individual branches behave. This data can then be fed to a second compilation pass, which annotates the branches to indicate

³ MIPS Technologies, Mountain View, California.

⁴ SPARC International, Inc., Santa Clara, California.

© 2002 by CRC Press LLC
