
to predictor size, and so are xlisp and perl to some extent. The hybrid predictor is the most sensitive to size, because it must allocate the available hardware budget across four tables: the selector's PHT, the global-history component's PHT, and the local-history component's BHT and PHT. Each of these tables is therefore substantially smaller than it would be in a single two-level predictor, and so suffers more destructive interference. This especially affects programs with large static branch footprints, like go and gcc. Yet a hybrid predictor also has an important advantage: to better control destructive conflicts, it can dynamically shift which component it uses to make a prediction for each branch.
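To make that organization concrete, the following sketch shows how a selector might choose between a global-history and a local-history component. It is a minimal illustration only, not the design of any particular processor; the table sizes, the gshare-style global index, the PC-indexed selector, and the two-bit saturating counters are all assumptions made for the example.

#include <stdint.h>

#define PHT_BITS   10                     /* assumed size: 1K entries per table            */
#define PHT_SIZE   (1u << PHT_BITS)
#define BHT_SIZE   1024                   /* assumed size of the per-branch history table  */

static uint8_t  choice_pht[PHT_SIZE];     /* selector: 2-bit counters, >= 2 means "use global" */
static uint8_t  global_pht[PHT_SIZE];     /* global-history component's PHT                */
static uint16_t local_bht[BHT_SIZE];      /* local-history component's BHT                 */
static uint8_t  local_pht[PHT_SIZE];      /* local-history component's PHT                 */
static uint16_t ghr;                      /* global history register                       */

static int  counter_taken(uint8_t c) { return c >= 2; }
static void counter_update(uint8_t *c, int taken)
{
    if (taken) { if (*c < 3) (*c)++; }
    else       { if (*c > 0) (*c)--; }
}

/* Predict one conditional branch and, once the outcome is known, train the
 * structures involved.  Returns the prediction (1 = taken). */
int hybrid_predict_and_update(uint32_t pc, int actual_taken)
{
    uint32_t gidx = (pc ^ ghr) & (PHT_SIZE - 1);            /* gshare-style index   */
    uint32_t lidx = local_bht[pc % BHT_SIZE] & (PHT_SIZE - 1);
    uint32_t cidx = pc & (PHT_SIZE - 1);                     /* selector index       */

    int global_pred = counter_taken(global_pht[gidx]);
    int local_pred  = counter_taken(local_pht[lidx]);
    int prediction  = counter_taken(choice_pht[cidx]) ? global_pred : local_pred;

    /* Train the selector only when the components disagree, moving it toward
     * whichever component was correct for this branch. */
    if (global_pred != local_pred)
        counter_update(&choice_pht[cidx], global_pred == actual_taken);

    counter_update(&global_pht[gidx], actual_taken);
    counter_update(&local_pht[lidx],  actual_taken);

    local_bht[pc % BHT_SIZE] =
        (uint16_t)((local_bht[pc % BHT_SIZE] << 1) | (actual_taken & 1));
    ghr = (uint16_t)((ghr << 1) | (actual_taken & 1));

    return prediction;
}

Because the selector is trained only on disagreements, each component remains free to specialize on the branches it predicts well, which is the dynamic shifting described above.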

Note that these results do not include the effects of predication, context switching, operating system behavior, or any profile-guided feedback. All of these effects might change the results.

Summary

Branch prediction is important because otherwise every branch stalls the fetch engine. Some alternatives exist, such as delay slots and predication, but delay slots are not compatible with modern, wide-issue superscalar processors, and predication cannot remove all branches. Static prediction techniques that require no hardware support are also possible, but they are either very simple or, in the case of compiler directives, require instruction-set support. Static techniques also have the drawback that they cannot adapt to changing run-time conditions.

Dynamic branch-prediction techniques have evolved from the simple bimodal predictor to more sophisticated two-level and hybrid predictors that exploit patterns in branch behavior and correlation among branches. Refinements to these techniques, as well as new fetch organizations that permit fetching past multiple branches, continue to be active areas of research.
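For contrast with the hybrid organization sketched earlier, the starting point of this evolution, the bimodal predictor, fits in a few lines. The table size and two-bit counters are again assumptions chosen for illustration.

#include <stdint.h>

#define BIMODAL_SIZE 4096                 /* assumed number of 2-bit counters */

static uint8_t bimodal[BIMODAL_SIZE];     /* one saturating counter per entry */

/* Predict using only the branch's own address: with no history, the
 * predictor cannot exploit patterns or correlation among branches. */
int bimodal_predict(uint32_t pc)
{
    return bimodal[pc % BIMODAL_SIZE] >= 2;
}

void bimodal_update(uint32_t pc, int taken)
{
    uint8_t *c = &bimodal[pc % BIMODAL_SIZE];
    if (taken) { if (*c < 3) (*c)++; }
    else       { if (*c > 0) (*c)--; }
}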

The massive effort to find better branch-handling techniques is motivated by the severe penalty imposed by mispredictions. Especially with the long, wide pipelines of modern processors, even a small misprediction rate can severely harm performance. Indeed, the fetch bottleneck remains one of the most severe limitations on faster processing, and Jouppi and Ranganathan [53] argue that it may become the most severe bottleneck in future processors, even more severe than memory latency or memory bandwidth.
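A rough back-of-the-envelope calculation shows why even a small misprediction rate hurts; the branch frequency, misprediction rate, penalty, and base CPI below are assumed values for illustration, not measurements from any particular machine.

#include <stdio.h>

int main(void)
{
    double branch_freq     = 0.20;   /* assumed fraction of instructions that are branches */
    double mispredict_rate = 0.05;   /* assumed: 5% of branches mispredicted               */
    double penalty_cycles  = 20.0;   /* assumed flush/refill penalty of a deep pipeline    */
    double base_cpi        = 0.5;    /* assumed ideal CPI of a wide-issue machine          */

    /* Extra cycles per instruction lost to mispredictions. */
    double stall_cpi = branch_freq * mispredict_rate * penalty_cycles;
    double total_cpi = base_cpi + stall_cpi;

    printf("stall CPI = %.2f, total CPI = %.2f (%.0f%% slowdown)\n",
           stall_cpi, total_cpi, 100.0 * stall_cpi / base_cpi);
    return 0;
}

With these assumed numbers, mispredictions add 0.2 cycles per instruction to a 0.5-CPI baseline, a 40% slowdown from a 5% misprediction rate alone.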

References

1. Skadron, K., Characterizing and removing branch mispredictions, PhD thesis, Princeton University Department of Computer Science, Princeton, NJ, 1999.
2. Calder, B. and Grunwald, D., Reducing indirect function call overhead in C++ programs, in Proc. 21st ACM SIGPLAN-SIGACT Symp. on Principles of Programming Languages, pp. 397–408, Jan. 1994.
3. Chang, P.-Y., Hao, E., and Patt, Y. N., Target prediction for indirect jumps, in Proc. 24th Ann. Int. Symp. on Computer Architecture, pp. 274–283, June 1997.
4. Driesen, K. and Hölzle, U., Accurate indirect branch prediction, in Proc. 25th Ann. Int. Symp. on Computer Architecture, pp. 167–178, July 1998.
5. Kalamatianos, J. and Kaeli, D. R., Predicting indirect branches via data compression, in Proc. 31st Ann. ACM/IEEE Int. Symp. on Microarchitecture, pp. 272–281, Dec. 1998.
6. Kaeli, D. R. and Emma, P. G., Branch history table prediction of moving target branches due to subroutine returns, in Proc. 18th Ann. Int. Symp. on Computer Architecture, pp. 34–41, May 1991.
7. Webb, C. F., Subroutine call/return stack, IBM Technical Discl. Bull., 30(11), April 1988.
8. Gwennap, L., Digital 21264 sets new standard, Microprocessor Report, pp. 11–16, Oct. 28, 1996.
9. Gwennap, L., Intel's P6 uses decoupled superscalar design, Microprocessor Report, pp. 9–15, Feb. 16, 1995.
10. Patterson, D. A. and Hennessy, J. L., Computer Architecture: A Quantitative Approach, 2nd ed., Morgan Kaufmann, San Francisco, 1996.

