
A Study of Value-Based Branch Prediction Techniques

Krishnan Sundaresan, Srivathsan Krishnamohan
{sundare2, krishn37}@msu.edu

Abstract— In this paper, we implement a value-based branch prediction scheme called BPVP – branch prediction based on value prediction – that was proposed by Gonzalez et al. [1]. Value-based branch prediction schemes are those that speculatively compute the direction and target of branch instructions by predicting the register values on which the branch condition is evaluated. We evaluate the BPVP branch predictor using different value prediction mechanisms (last value, stride, and two-level) and compare its accuracy with existing (bimodal and gshare) branch prediction schemes. We also present an up-to-date survey of this relatively new area of research in branch prediction.

Keywords— Branch prediction, control dependence, speculative execution, value prediction.

I. Introduction

IMPROVING the performance of modern pipelined processors faces two major impediments: (1) control dependences and (2) data dependences between instructions. These (1) necessitate stall cycles during which the pipeline units are idle and (2) limit the throughput of the pipeline. Both effects reduce the speedup that is theoretically achievable by pipelining. In modern processors, dynamic branch prediction is employed to reduce the number of stalls due to control dependences. Another technique, called dynamic scheduling, helps reduce stalls due to data dependences by spacing dependent instructions apart and issuing them out of order. However, reducing stalls alone will not keep the pipeline busy and improve performance. Many processors also adopt a technique called speculation to realize higher performance by increasing the amount of instruction-level parallelism (ILP) that they can exploit. In speculative execution, the outcomes of dependent instructions are guessed beforehand and the instruction stream is executed as if the guesses were correct. These guesses are obtained dynamically using dedicated branch prediction hardware for branch instructions or value prediction hardware for other (for example, load) instructions. A speculative processor also includes hardware that can undo the effects of an incorrect execution by forcing an instruction to commit, i.e., write its results to the register file, only after the instruction is no longer speculative [2].

K. Sundaresan and S. Krishnamohan are graduate students in the Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI. The work described in this paper was done as part of the final project for the course ECE/CSE 820: Advanced Computer Architecture, Fall 2003, taught by Dr. Anthony Wojcik at Michigan State University.

A. Branch Prediction

Branch prediction involves predicting the direction of a branch (taken or not taken) as well as predicting the target address of the branch before the end of the instruction decode (ID) stage. This enables the instruction fetch (IF) stage to speculatively fetch instructions based on the predicted branch direction and target address, thus supplying the pipeline continuously with instructions and increasing the instruction-level parallelism (ILP). When speculative execution is used with branch prediction, the actual direction the branch will take is known only later, in the execution (EX) stage. If this result differs from the prediction made earlier, it is necessary to re-fetch the instruction stream, starting from the correct branch target. The cost of doing this is called the misprediction penalty. Misprediction penalties are obviously higher for deep pipelines. Thus, for a branch prediction scheme to be effective, the product of two terms, (1) the number of mispredicted branches and (2) the penalty for each mispredicted branch, should be small. The effectiveness of branch prediction is often measured in terms of the branch prediction accuracy, which is defined as the number of successful branch predictions performed by the branch predictor out of the overall number of prediction attempts [3]. Thus, research in branch prediction has focused on designing bigger, better, and/or more complex predictors to achieve higher branch prediction accuracies.
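Stated compactly (the notation below is ours, not from the original text), the two quantities combine as

    \text{accuracy} = \frac{N_{\mathrm{correct}}}{N_{\mathrm{attempted}}}, \qquad
    \text{cycles lost to branches} \approx N_{\mathrm{attempted}} \cdot (1 - \text{accuracy}) \cdot \text{penalty},

where N_attempted counts dynamic prediction attempts and penalty is the pipeline refill cost of one misprediction. For example, 95% accuracy with a 10-cycle penalty costs about 0.5 cycles per branch on average.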

B. Value Prediction

More recently, a methodology known as value prediction, which predicts the run-time outcome values of value-generating instructions before they are actually executed, was suggested to allow successive data-dependent instructions to be speculatively executed as well [3], [4]. To understand how value prediction helps in speculative execution, refer to Fig. 1, which shows a pipeline with value prediction (VP), and Fig. 2, which shows the flow of a dependent chain of instructions in a base superscalar processor and in a superscalar with value prediction [5]. Consider a chain of dependent instructions I, J, and K (K dependent on J, J dependent on I). As shown, the base superscalar machine needs 6 cycles to execute the three dependent instructions, whereas a superscalar machine with value prediction can potentially finish executing the chain in 4 cycles by predicting the outputs of I and J (alternatively, the inputs of J and K) and executing them speculatively. Although it is easy to understand the benefit of value prediction in eliminating data dependences by predicting results in this manner before the actual execution of the instructions takes place, it can also have an impact on branch prediction [5]. Using value prediction, a branch misprediction can be detected earlier in the pipeline; this way, the machine can start executing the correct path sooner rather than doing wasted work executing the wrong path.

Fig. 1. Schematic of a 5-stage pipeline with value prediction that supports speculative execution. The PC accesses the value prediction table (VPT) in parallel with FETCH; the prediction is supplied to the DECODE & RENAME and ISSUE stages; EXECUTE results are checked in a VERIFY stage before COMMIT, and mispredicted values force re-execution.

Base superscalar (6 cycles):    Fetch I,J,K (cycle 1); Decode I,J,K (2); Execute I (3), J (4), K (5); Commit I (4), J (5), K (6).
Superscalar with VP (4 cycles): Fetch I,J,K (cycle 1); Decode I,J,K (2); Execute I,J,K (3); Commit I,J,K (4).

Fig. 2. Flow diagrams for a 5-stage pipeline in (i) a base superscalar machine and (ii) a base superscalar with value prediction. The base superscalar machine needs 6 cycles to execute the three dependent instructions, whereas a superscalar machine with value prediction can execute them in 4 cycles.

C. Value-based Branch Prediction

By combining the concepts of value prediction and branch prediction, a new class of branch prediction schemes called value-based branch prediction schemes¹ has been proposed recently [7], [8], [1], [6]. In this class of branch prediction schemes, the branch predictor is aided by some form of data value history of the branch register operands in addition to branch history. Two approaches for improving branch prediction using value-based approaches have been described by Heil et al. [8]: (i) the speculative branch execution approach and (ii) the branch prediction by correlating on data values approach. Fig. 3 shows schematic diagrams for the two approaches.

• In the speculative branch execution approach, a conventional data (value) predictor is used to predict the input values of branch instructions, and the branch is then evaluated using the predicted values. At the same time, a branch prediction is also obtained from a conventional branch predictor. A chooser (selector) is then used to select the final prediction.

• In the second approach, the data value history is fed directly into the branch predictor. This enables the branch predictor to correlate on data values in the same way it would correlate on global branch history.

Fig. 3. Two approaches to value-based branch prediction proposed by Heil et al.: (a) the speculative branch execution approach, in which the branch PC, global branch history, and data value history drive a branch predictor, a value predictor, and speculative branch execution, with a chooser selecting the final prediction; (b) the branch prediction by correlating on data values approach, in which the data value history is fed directly into the branch predictor.

¹ The term 'value-based branch prediction' appears to have been first used by Chen et al. [6]. In this paper, we use it to encompass a variety of schemes that use value history for branch prediction.

C.1 Potential of value-based branch prediction

As mentioned earlier, value predictability has been studied by many authors [3], [4], [9]. Sazeides and Smith have evaluated the effect of value predictability on branch predictability [10]. Their evaluations show that many (up to 82%) of the branch nodes propagate predictability, i.e., when the branch output is predictable, at least one of its inputs is also predictable. Another interesting result presented in their study is that branch mispredictions are rare even for branches with both inputs unpredictable, and only about 50% of the branches are mispredicted when both inputs are predictable. Their study concludes that there is substantial potential for improving branch prediction accuracy by incorporating data value history information into existing branch predictors.

C.2 Some pros and cons of value-based branch prediction

One of the most important benefits of value-based branch prediction is that, as mentioned earlier, a branch misprediction can be detected earlier in the pipeline and the machine can start executing the correct path sooner. This can potentially reduce the amount of wasted work due to mispredictions. Value-based predictors are most useful in situations where other correlating predictors may fail, for example, when predicting branches at the end of large 'for' loops in programs. In such cases (when the number of loop iterations is greater than the history length of the correlating predictor), value-based predictors can correctly predict the branches since the input of such branches will follow a well-defined stride pattern [1].

However, if not used carefully, value-based branch predictors may potentially result in more mispredictions and/or delay branch resolution [5]. In value-based branch prediction, branches with speculative operands can be handled in two ways: (1) they are resolved while their operands are still speculative, or (2) they are resolved only when their operands become non-speculative. The first option can increase the number of branch mispredictions, since the predicted inputs may themselves be wrong. The second option forces branch resolution to be postponed until after the producer instructions have been committed, thus increasing the latency. Due to these drawbacks, most simple value-based branch prediction techniques have been used only in combination with other existing branch predictors [1]. However, advanced value-based branch prediction techniques proposed recently, which use mechanisms such as dynamic data dependence tracking, have achieved higher accuracies when used independently [6].

D. Contributions of this Work

Gonzalez et al. have proposed a value-based branch prediction scheme called BPVP – branch prediction based on value prediction [1]. Their scheme uses the speculative branch execution approach described above. They have shown that value prediction can improve the overall accuracy of branch prediction by correcting predictions that are mispredicted by classical branch predictors such as history-based or correlating predictors. This approach identifies the instructions that generate the inputs of branches, predicts their output values, uses these predicted inputs to determine the branch outcome, and speculatively executes the instruction stream past the branch instruction. The study of value-based branch prediction schemes and the implementation and evaluation of the BPVP scheme are the focus of this paper. We focus on implementing the baseline BPVP scheme and use three different data value predictors, namely last value, stride, and two-level predictors, in the BPVP implementation. We simulate these configurations and compare their prediction accuracy with that of two classical branch predictors, bimodal and Gshare.

The organization of the rest of this paper is as follows. In Sec. II, we present related work in the area of branch prediction and value prediction techniques. In Sec. III, we describe the current state-of-the-art in value-based branch prediction. In Sec. IV, we present details of our simulation environment and discuss briefly how we implemented the BPVP branch predictor. We present simulation results and discussion in Sec. V and conclude in Sec. VI.

II. Related Work

In this section, we review the design of some popular branch prediction and value prediction schemes. Later, in Sec. V, we compare the performance of these schemes with the BPVP scheme.

A. Branch Prediction Schemes

We briefly review some popular branch prediction schemes below. For the interested reader, a survey of branch prediction strategies can be found in [11].

A.1 Static Branch Prediction Schemes

Most branches exhibit a high degree of correlation with their past behavior and that of other branches in the vicinity. This correlation can be used to predict the outcome of a branch in the decode stage with a high degree of accuracy (low misprediction rate), which significantly reduces the overhead associated with branches. There are two methods one can use to statically predict branches: (1) by examining the program behavior (branch direction, etc.) or (2) by using profile information collected from earlier runs of the program (branch behaviors are often bimodally distributed, i.e., an individual branch is often highly biased toward taken or not-taken).

A.2 Dynamic Branch Prediction Schemes

Unlike static branch prediction schemes, dynamic branch prediction is implemented in hardware, and the prediction can change if the branches change behavior while the program is running. Some popular examples of dynamic branch prediction schemes are discussed below; a minimal sketch of the bimodal and Gshare organizations is given after this list.

• One-bit scheme: The simplest scheme is the branch prediction buffer (BPB), or branch history table (BHT). A branch prediction buffer is a small memory indexed by the lower portion of the address of the branch instruction. The memory contains a bit that indicates whether the branch was recently taken or not. The downside of using a single bit is that even if a branch is almost always taken, it will likely be predicted incorrectly twice, rather than once, when it is not taken.

• Two-bit prediction scheme [11]: The two-bit prediction scheme is also often called the bimodal branch predictor. It is known that most branches are either usually taken or usually not taken. Bimodal branch prediction takes advantage of this bimodal distribution of branch behavior and attempts to distinguish usually taken from usually not taken branches. It makes a prediction based on the direction the branch went the last few times it was executed. The bimodal scheme uses a table of 2-bit saturating up-down counters to keep track of the direction a branch is more likely to take. Each branch is mapped via its address to a counter. The branch is predicted taken if the most significant bit of the associated counter is set; otherwise, it is predicted not taken. The counters are updated based on the branch outcomes: when a branch is taken, the 2-bit value of the associated saturating counter is incremented by one; otherwise, it is decremented by one.

• Two-level prediction scheme (Gshare) [12]: The two-bit predictor uses only the recent behavior of a single branch to predict the future behavior of that branch. It may be possible to improve the prediction accuracy if we also look at the recent behavior of other branches rather than just the branch we are trying to predict; predictors that use global branch history are therefore also called correlating predictors. The Gshare predictor is a variation of the two-level GAg/GAs global history predictor. Two-level prediction uses a global history structure at the first level and a pattern history table (PHT) of 2-bit counters, just like the bimodal table described above, at the second level; the idea is to use something other than the branch PC alone to index into the table of 2-bit counters. In the Gshare configuration, the predictor keeps one entry of hist-size bits of branch history in a global branch history register (GBHR), but it XORs those bits with hist-size bits taken from the PC before indexing into the second-level table to get the prediction. The advantage of using global history is that it can detect and predict sequences of correlated branches. Combining the stored branch history and the branch PC (like the XOR done in Gshare) also provides some degree of anti-aliasing and reduces conflicts when indexing into the PHT. A 16K-entry Gshare predictor in which 12 bits of history are XORed with 14 bits of branch PC is used in the Sun UltraSPARC-III processor.
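The following is a minimal C sketch of the bimodal and Gshare organizations described above. The table sizes, history length, update policy, and all names are illustrative assumptions, not the configuration of any particular processor.

    #include <stdint.h>

    #define BIMOD_BITS  14                    /* 16K two-bit counters (assumed size) */
    #define BIMOD_SIZE  (1u << BIMOD_BITS)
    #define HIST_BITS   12                    /* global history length (assumed)     */

    static uint8_t  bimod[BIMOD_SIZE];        /* 2-bit saturating counters, 0..3     */
    static uint8_t  gshare[BIMOD_SIZE];       /* second-level table for Gshare       */
    static uint32_t ghr;                      /* global branch history register      */

    /* Bimodal: index by low PC bits; predict taken if the counter's MSB is set. */
    static int bimod_predict(uint32_t pc)
    {
        return bimod[(pc >> 2) & (BIMOD_SIZE - 1)] >= 2;
    }

    /* Gshare: XOR global history bits with PC bits before indexing the counters. */
    static int gshare_predict(uint32_t pc)
    {
        uint32_t idx = ((pc >> 2) ^ (ghr & ((1u << HIST_BITS) - 1))) & (BIMOD_SIZE - 1);
        return gshare[idx] >= 2;
    }

    /* Update: saturating increment on taken, decrement on not-taken; shift the
     * outcome into the global history register used by Gshare. */
    static void predictors_update(uint32_t pc, int taken)
    {
        uint8_t *b = &bimod[(pc >> 2) & (BIMOD_SIZE - 1)];
        uint32_t idx = ((pc >> 2) ^ (ghr & ((1u << HIST_BITS) - 1))) & (BIMOD_SIZE - 1);
        uint8_t *g = &gshare[idx];

        if (taken)  { if (*b < 3) (*b)++;  if (*g < 3) (*g)++; }
        else        { if (*b > 0) (*b)--;  if (*g > 0) (*g)--; }
        ghr = (ghr << 1) | (taken ? 1u : 0u);
    }

Note that the bimodal and Gshare tables differ only in how the index is formed; the counter storage and update rule are identical.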

B. Value Prediction Schemes

Data value prediction schemes predict the value of the result register of data-producing instructions based on past behavior. This is similar in some respects to branch prediction techniques, where the branch direction and the target address are predicted for speculative execution. However, data value prediction differs from branch prediction in that it involves a multi-valued decision (i.e., a 1-out-of-2^w prediction, where w is the word size) as opposed to branch prediction, which is a binary decision (i.e., a 1-out-of-2 prediction) [13]. Below we review three of the most commonly used data value prediction schemes.

B.1 Last Value Predictor

The simplest of the data value prediction schemes is the last value predictor. Here, the result produced by an instruction the last time it was executed is stored, and the same value is predicted when the instruction is executed again. The hardware structure used for the last outcome (last value) predictor is shown in Fig. 4 [13]. The basic idea is to store the output of all register-writing instructions. Since the hardware overhead required to store the outcome of every instance of every register-writing instruction would be prohibitive, a hash function is used for address mapping. The predictor uses a value history table (VHT) that stores the last value produced by the instructions mapped to it. Each VHT entry has two fields, tag and value: the tag field stores the identity of the instruction, while the value field stores the last result of the instruction. Replacement policies similar to those used in caches, such as least recently used (LRU) and first-in-first-out (FIFO), are used to replace instructions in case of a tag mismatch.

Fig. 4. Data value prediction schemes: last value predictor.

B.2 Stride Predictor

The stride predictor exploits data value locality by monitoring the stride by which consecutive instances of an instruction change. When the results of consecutive instances of an instruction change by a constant value, the result of future instances can be predicted by storing that stride. This behavior is easily observed in loop induction variables and in programs stepping through arrays in a regular fashion. The block diagram of a stride-based predictor is shown in Fig. 5 [13]. It uses a VHT similar to that of the last value predictor, but each VHT entry has four fields: tag, state, value, and stride. The state field is needed to detect a stride pattern and to update or read the stride value. A stride predictor entry can be in one of three states: init, transient, and steady.

When an instruction results in a VHT miss, the instruction is written into the VHT with the state field set to 'init' and its result put in the value field. When another instance of the same instruction occurs and the state field is 'init', no prediction is made; the stride S1 is calculated from the result D1 of that instance as S1 = D1 - (value in the VHT entry), D1 and S1 are entered into the value and stride fields, and the state is set to 'transient'. While in the 'transient' state, if another instance of the instruction occurs, no prediction is made. If that instance produces a value D2, then (1) a new stride S2 = D2 - (value in the VHT entry) is calculated, (2) D2 is entered in the value field, and (3) if S2 is the same as S1, the state field is set to 'steady'; if S2 differs from S1, the state remains 'transient' while the stride and value fields are updated to S2 and D2. A minimal sketch of this state machine is given after Fig. 5.

Fig. 5. Data value prediction schemes: stride predictor.
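Below is a minimal C sketch, in the spirit of the description above, of the init/transient/steady state machine for a direct-mapped VHT. The table size, hashing, and the handling of a stride change in the steady state are illustrative assumptions rather than the exact design of [13].

    #include <stdint.h>

    enum vht_state { VHT_INVALID, VHT_INIT, VHT_TRANSIENT, VHT_STEADY };

    struct vht_entry {
        uint32_t tag;                 /* identifies the producing instruction (PC) */
        enum vht_state state;
        int64_t  value;               /* last committed result                      */
        int64_t  stride;              /* last observed difference                   */
    };

    #define VHT_SIZE 1024             /* assumed number of entries                  */
    static struct vht_entry vht[VHT_SIZE];

    static struct vht_entry *vht_lookup(uint32_t pc) { return &vht[(pc >> 2) % VHT_SIZE]; }

    /* Predict only in the steady state; a last value predictor is the degenerate
     * case that always predicts e->value regardless of state. */
    static int stride_predict(uint32_t pc, int64_t *pred)
    {
        struct vht_entry *e = vht_lookup(pc);
        if (e->tag != pc || e->state != VHT_STEADY)
            return 0;                              /* no prediction made            */
        *pred = e->value + e->stride;
        return 1;
    }

    /* Update on commit with the actual result: miss -> init; init -> transient
     * (record first stride); transient -> steady when the same stride repeats.   */
    static void stride_update(uint32_t pc, int64_t result)
    {
        struct vht_entry *e = vht_lookup(pc);
        if (e->tag != pc) {                        /* VHT miss: (re)allocate entry  */
            e->tag = pc; e->state = VHT_INIT; e->value = result; e->stride = 0;
            return;
        }
        int64_t s = result - e->value;
        if (e->state == VHT_INIT)
            e->state = VHT_TRANSIENT;
        else if (e->state == VHT_TRANSIENT)
            e->state = (s == e->stride) ? VHT_STEADY : VHT_TRANSIENT;
        else if (e->state == VHT_STEADY && s != e->stride)
            e->state = VHT_TRANSIENT;  /* fallback on a stride change: our assumption */
        e->stride = s;
        e->value  = result;
    }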

B.3 Two-Level Value Predictor

The two-level value prediction scheme described in [13] is based on the observation that a substantial percentage of dynamic instructions have four or fewer unique values in their most recent history. By storing the last four unique outcomes of an instruction and by using a two-level predictor that performs a 1-out-of-4 prediction, the behavior patterns of most instructions can be predicted. The block diagram of one such two-level predictor is shown in Fig. 6 [13].

The VHT in a two-level predictor has four fields: tag, LRU, data value, and value history pattern. The data value field stores up to the four most recent outcomes of an instruction. If the different instances of an instruction produce one of the four stored data values, the outcome can be predicted from the stored values. When the outcome of an instance differs from the four stored data values, the fifth value is written into the data value field. The LRU field keeps track of the order in which the four data values were seen, which allows the least recently used value to be replaced. The value history pattern contains a 2p-bit pattern that stores the last p outcomes of the instruction. This 2p-bit pattern is used to index into a pattern history table (PHT) that contains four independent counters C0, C1, C2, C3, corresponding to the four values stored in the data value field. To make a prediction, the VHT is indexed with the address of an instruction. When this results in a hit, the history pattern value is used to select the appropriate entry from the PHT. The PHT entry contains four count values, from which the maximum is selected and compared against a threshold; if the maximum is greater than the threshold, the outcome corresponding to that counter is predicted. When the maximum value is less than the threshold, no prediction is made. A compact sketch of this organization is given after Fig. 6.

Fig. 6. Data value prediction schemes: two-level data value predictor.
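A compact C sketch of this organization follows. The history depth (p = 4), table sizes, counter width, threshold, and update/replacement details are illustrative assumptions; [13] describes the scheme and its variations in full.

    #include <stdint.h>
    #include <string.h>

    #define VHT2_SIZE  1024            /* assumed table sizes                        */
    #define P          4               /* outcomes kept in the history pattern       */
    #define PHT_SIZE   (1 << (2 * P))  /* 2p-bit pattern indexes the PHT             */
    #define THRESHOLD  3               /* minimum counter value to risk a prediction */

    struct vht2_entry {
        uint32_t tag;
        int64_t  val[4];               /* four most recent unique outcomes           */
        uint8_t  lru[4];               /* age of each value slot, 0 = most recent    */
        uint8_t  pattern;              /* last P outcomes, 2 bits each               */
    };

    static struct vht2_entry vht2[VHT2_SIZE];
    static uint8_t pht[PHT_SIZE][4];   /* one counter per stored value               */

    static int twolevel_predict(uint32_t pc, int64_t *pred)
    {
        struct vht2_entry *e = &vht2[(pc >> 2) % VHT2_SIZE];
        if (e->tag != pc) return 0;
        uint8_t *c = pht[e->pattern];
        int best = 0;
        for (int i = 1; i < 4; i++) if (c[i] > c[best]) best = i;
        if (c[best] < THRESHOLD) return 0;          /* below threshold: no prediction */
        *pred = e->val[best];
        return 1;
    }

    static void twolevel_update(uint32_t pc, int64_t result)
    {
        struct vht2_entry *e = &vht2[(pc >> 2) % VHT2_SIZE];
        if (e->tag != pc) { memset(e, 0, sizeof *e); e->tag = pc; }

        int slot = -1;
        for (int i = 0; i < 4; i++) if (e->val[i] == result) { slot = i; break; }
        if (slot < 0) {                              /* new value: evict the LRU slot */
            slot = 0;
            for (int i = 1; i < 4; i++) if (e->lru[i] > e->lru[slot]) slot = i;
            e->val[slot] = result;
        }
        for (int i = 0; i < 4; i++) if (e->lru[i] < 255) e->lru[i]++;
        e->lru[slot] = 0;                            /* refresh the slot just used    */

        if (pht[e->pattern][slot] < 7) pht[e->pattern][slot]++;   /* train counter    */
        e->pattern = (uint8_t)((e->pattern << 2) | slot);         /* shift outcome in */
    }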

III. Value-based Branch Prediction Schemes

In this section, we describe value-based branch prediction schemes that have been proposed in the literature.

A. Compiler synthesized branch prediction

The compiler synthesized (CS) branch prediction scheme described by Mahlke et al. [14] combines the strengths of static compiler-based and dynamic run-time branch prediction. Static compiler-based techniques have low cost because branch state information is not required for prediction, eliminating costly hardware. Dynamic branch predictors have higher accuracy because they use current branch history to make the prediction. The CS branch predictor makes a prediction using the contents of the registers that are part of the branch instruction; however, the prediction function is defined and realized through additional instructions inserted by the compiler. Two major advantages result from this scheme: (1) using the compiler to define the predictor function increases the flexibility of realizing almost any predictor function, without the number of predictors being limited by the hardware, and (2) less hardware overhead is required because the compiler uses architecturally visible registers and additional instructions to predict the branches.

The CS prediction algorithm has been described using the PlayDoh branch architecture [15]. In the PlayDoh architecture, a branch instruction is represented as shown in Fig. 7.

    (a) conventional branch:    blt   r2, r3, L1

    (b) PlayDoh equivalent:     pbra  b2, L1, 1
                                cmpp_w_lt_un_un p2,-, r2, r3
                                brct  b2, p2

Fig. 7. A branch instruction in the PlayDoh architecture used for compiler synthesized branch prediction: (a) conventional branch, (b) PlayDoh equivalent.

• A prepare-to-branch instruction (pbra) specifies the target address in advance of the branch point to enable prefetching from the target address. The pbra instruction is modified from the base instruction shown in Fig. 7 to handle the CS scheme: the 1-bit literal field is generalized to a predicate register operand, and the pbra instruction reads that predicate register to obtain the prediction value written by previous instructions.

• Computation of the branch condition is done by a compare-to-predicate instruction, and the result is stored in a predicate register. This instruction does not exist for unconditional branches.

• The branch-on-condition-true (brct) instruction performs the actual redirection of control flow in the PlayDoh architecture. The architecture provides other special types of branches to support loop execution.

The prediction function used in the branch prediction algorithm correlates the values of the architectural registers with the direction of the branch. The correlation between the register contents and the branch direction is obtained by profiling target programs using a three-step process: (1) Precompile – instrumentation code is inserted into code without branch prediction instructions to collect branch direction information along with the architectural register contents; (2) Profile – the compiled code with instrumentation instructions is run on sample inputs to dump branch profile information as well as register dump information; and (3) Recompile – by analyzing the correlation between the register dump information and the branch direction, appropriate branch predictors are constructed for each branch. The instructions corresponding to the predictors are added back to the code, which is recompiled to produce the final code.

The prediction algorithm constructs the predictor based on the branch register values at a predetermined number of cycles prior to the branch. For each register r_i and register pair r_i,j, a score S_i and S_i,j is assigned. The score reflects the number of times branches were taken with the corresponding register values. At run time, a prediction is made based on the score assigned to the current register values in the branch instruction. The compiler places the prediction instructions approximately 16 cycles ahead of the branch, because the pbra instruction must be issued at least 12 cycles before the actual branch instruction.

B. The anticipation mechanism

In this scheme, Farcy et al. propose to dynamically duplicate the instructions in the dataflow tree of a conditional branch instruction (called the branch flow) ahead of normal execution and compute the branch earlier using value prediction [7]. They thus implement an anticipation mechanism that resolves branches ahead of time. Their motivation is to use value history to reduce branch misprediction latency; their mechanism does not improve the accuracy of the existing branch predictor. An overview of how the anticipation mechanism works is shown in Fig. 8. As shown, a separate branch window, similar in function to the instruction window, is used to process copies of tagged branch instructions. The tagging can be done statically (all instructions in the neighborhood of a branch) or dynamically (starting with a conditional branch, the instructions that produce its operands are tagged, and so on).

Fig. 8. The anticipation mechanism proposed by Farcy et al. Tagged instructions from the instruction window (for example, the lda/cmpult/bne chain that feeds a conditional branch) are copied, with renamed registers, into a separate branch window and an anticipation table; the branch flow is computed ahead of the normal program flow using a value prediction table. Early branch resolution in this manner helps reduce misprediction latency but does not improve the accuracy of the branch predictor.

C. BDP – Branch difference predictor

The branch difference predictor (BDP) proposed by Heil et al. stores a value history of the difference of the two branch source register operands instead of the operands themselves [8]. This is motivated by the fact that conditional branch instructions normally subtract the values stored in their register operands and take action depending on the value and/or sign of the result. The schematic of the BDP is shown in Fig. 9. The value history table (VHT) shown in the figure stores the difference information for each static conditional branch instruction and is indexed using the branch PC. Since it is impossible to store the difference history for all branches, the VHT is used only as a selector to choose between the predictions made by (1) the rare-event predictor (REP) and (2) the backing predictor. The former is a value-history-based cache used for hard-to-predict branches; the latter, which is simply a non-value-history-based predictor (such as bimodal or gshare), is used to predict the remaining branches.

The BDP works as follows. When the VHT is accessed, the REP is also accessed in parallel using the global branch history and the branch PC. The VHT returns the difference value, which is compared with the tags stored in the REP. If the tag check succeeds, the REP provides the final prediction; otherwise, the backing predictor provides it. The REP is updated only when the backing predictor mispredicts, and the backing predictor is updated only when it provides the prediction. As shown in the figure, the VHT actually consists of two tables: a branch difference cache (BDC) that stores the difference for the most recently committed branch instructions, and a branch count table (BCT) that keeps track of the number of outstanding instances of each static branch. The corresponding entry in the BCT is incremented when a branch is fetched and decremented when it is committed; an entry in the BDC is replaced whenever the latter happens. A minimal sketch of this selection and update policy is given after Fig. 9.

Fig. 9. The branch difference predictor (BDP) proposed by Heil et al. The value history table (VHT), made up of the branch difference cache and the branch count table, stores the past history of the difference of the two branch source register operands and is used to select the prediction made by the value-history-based rare-event predictor for hard-to-predict branches only; the backing predictor, indexed by the PC and global branch history, covers the remaining branches.
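The following is a minimal C sketch of the BDP selection and update policy as we read it from the description above. The structure sizes, the use of the operand difference as a tag, and the backing predictor interface (backing_predict/backing_update) are illustrative assumptions; the actual organization in [8] is more elaborate.

    #include <stdint.h>

    #define REP_SIZE 4096                      /* assumed rare-event predictor size */

    struct rep_entry {
        uint32_t pc_tag;                       /* branch PC                          */
        int64_t  diff_tag;                     /* operand difference used as a tag   */
        uint8_t  ctr;                          /* 2-bit direction counter, 0..3      */
    };

    static struct rep_entry rep[REP_SIZE];

    /* Backing predictor interface (e.g., a gshare table); assumed to exist. */
    extern int  backing_predict(uint32_t pc, uint32_t ghr);
    extern void backing_update(uint32_t pc, uint32_t ghr, int taken);

    static struct rep_entry *rep_lookup(uint32_t pc, uint32_t ghr)
    {
        return &rep[((pc >> 2) ^ ghr) % REP_SIZE];
    }

    /* 'diff' is the operand difference for this static branch, read from the VHT
     * (here simply passed in). The REP supplies the final prediction only on a
     * full tag match; otherwise the backing predictor does.                      */
    static int bdp_predict(uint32_t pc, uint32_t ghr, int64_t diff,
                           int *used_rep, int *backing_pred)
    {
        struct rep_entry *r = rep_lookup(pc, ghr);
        *backing_pred = backing_predict(pc, ghr);
        if (r->pc_tag == pc && r->diff_tag == diff) { *used_rep = 1; return r->ctr >= 2; }
        *used_rep = 0;
        return *backing_pred;
    }

    /* Literal reading of the update rules: the backing predictor is trained only
     * when it provided the prediction; the REP is trained (or a new entry is
     * allocated) only when the backing predictor mispredicted.                   */
    static void bdp_update(uint32_t pc, uint32_t ghr, int64_t diff,
                           int taken, int used_rep, int backing_pred)
    {
        struct rep_entry *r = rep_lookup(pc, ghr);
        if (!used_rep)
            backing_update(pc, ghr, taken);
        if (backing_pred != taken) {
            if (r->pc_tag != pc || r->diff_tag != diff) {
                r->pc_tag = pc; r->diff_tag = diff; r->ctr = taken ? 2 : 1;
            } else if (taken) { if (r->ctr < 3) r->ctr++; }
            else              { if (r->ctr > 0) r->ctr--; }
        }
    }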


D. BPVP – Branch prediction based on value prediction

The BPVP scheme uses a mechanism that predicts the branch outcome from the predicted values of the branch instruction's input registers. Fig. 10 gives a schematic of the implementation of this scheme. It consists of an input information table (IIT), a value predictor, and a conditional evaluation unit (CEU). The IIT, which has as many entries as there are general purpose registers (GPRs), stores in its PC, CMP.VALUE, and CMP.RESULT fields the PC of the last instruction that updated the register and the result of the last compare evaluation. The last field is a boolean flag that is set if the latest instruction that updated the register was a compare instruction, and therefore indicates that the CMP.VALUE field contains the speculative result of the compare operation. Based on the type of instruction encountered, one of the following happens.

• For loads and ALU instructions, the IIT entry indexed by the destination register is updated with the current PC, and the valid bit is reset.

• For a branch whose input is produced by a load or ALU instruction, the hardware keeps track of only the PC of the producer instruction. When the next branch of that type is fetched, this PC is used to look up the value predictor and obtain the predicted input, which is then used to evaluate the branch condition and make a prediction.

• For a branch whose input is produced by a compare instruction, or for compare-and-branch instructions (like BNEZ), both register inputs need to be predicted using the value predictor, and the compare operation is evaluated in the CEU. Depending on the outcome of the compare, the branch is predicted.

A minimal sketch of this flow is given after Fig. 10.

Fig. 10. The branch predictor based on value prediction (BPVP) proposed by Gonzalez et al. The input information table (IIT) has as many entries as there are GPRs and is indexed by the input register; each entry holds PC, CMP.VALUE, and CMP.RESULT fields. The value predictor and the conditional evaluation unit (CEU) produce the predicted branch outcome.
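As a concrete illustration of this flow, here is a minimal C sketch of an IIT with one entry per GPR and a CEU-style evaluation for a BNEZ-style branch. The value_predict() interface, the set-less-than compare semantics, and the structure layout are assumptions made for illustration; they are not the exact hardware of [1].

    #include <stdint.h>

    #define NUM_GPRS 32

    struct iit_entry {
        uint32_t producer_pc;   /* PC of the last instruction that wrote this register */
        int64_t  cmp_value;     /* speculative result of the last compare              */
        int      cmp_result;    /* set if that last producer was a compare instruction */
    };

    static struct iit_entry iit[NUM_GPRS];

    /* Value predictor interface (last value, stride, or two-level underneath);
     * returns nonzero and a predicted value for the instruction at 'pc'.        */
    extern int value_predict(uint32_t pc, int64_t *pred);

    /* Loads and ALU instructions: remember the producer PC, clear the compare flag. */
    static void iit_note_producer(uint32_t pc, int rd)
    {
        iit[rd].producer_pc = pc;
        iit[rd].cmp_result  = 0;
    }

    /* Compare instructions: predict both inputs via their producers, evaluate the
     * comparison speculatively, and record the outcome in the destination entry.  */
    static void iit_note_compare(uint32_t pc, int rd, int rs, int rt)
    {
        int64_t a, b;
        iit[rd].producer_pc = pc;
        if (value_predict(iit[rs].producer_pc, &a) &&
            value_predict(iit[rt].producer_pc, &b)) {
            iit[rd].cmp_value  = (a < b);     /* assumed set-less-than semantics   */
            iit[rd].cmp_result = 1;
        } else {
            iit[rd].cmp_result = 0;
        }
    }

    /* CEU-style evaluation for a BNEZ-style branch reading register rs: reuse a
     * speculative compare outcome if one exists, otherwise predict the register
     * value itself and evaluate the condition. Returns 1 (taken), 0 (not taken),
     * or -1 when no value-based prediction can be made.                          */
    static int bpvp_predict_bnez(int rs)
    {
        int64_t v;
        if (iit[rs].cmp_result)
            return iit[rs].cmp_value != 0;
        if (value_predict(iit[rs].producer_pc, &v))
            return v != 0;
        return -1;
    }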

E. ARVI – Available register value information predictor

The ARVI scheme [16] makes use of the register values of the instructions leading up to the branch instruction, in addition to the branch address, to make a prediction. When all the register values involved in branch resolution are identical to those of a prior occurrence, the outcome will be the same. The idea behind the ARVI scheme is to determine the essential values in the data dependence chain leading up to the branch instruction. A data dependence chain, which gives the ordering relationship for each instruction in the pipeline, is constructed using a data dependence table (DDT). The DDT is implemented in hardware using RAM and incrementally maintains the data dependence chains for the set of instructions in the processor pipeline. The number of bits in the DDT is equal to the number of reorder buffer (ROB) entries times the number of physical registers.

When a branch instruction is executed, if all the branch register values are available, the outcome of the branch can be predicted accurately, but this rarely happens. However, if register values along the dependence chain are available, the predictor can use these values to index into a table to predict the outcome. ARVI uses the data dependence register set to calculate a signature. It also uses a hash of the register identifiers and the PC as an index into a table. To distinguish between different occurrences of the same path, the values in the register set are hashed together and used as a tag. To separate different iterations of a loop in an efficient way, ARVI records as part of the tag the maximum number of instructions spanned by the dependence chain. The steps to generate a prediction in ARVI are as follows: (1) the data dependence chain for the branch instruction is read from the DDT; (2) the data dependence chain vector is fed to a filter called the register set extractor (RSE), which forms the active register set for the current branch, and an index into the branch value information table (BVIT) is generated using the PC of the branch instruction and the values in the register set; (3) the BVIT holds information about tags and prior branch occurrences; the BVIT read returns two tags, one based on the sum of the register identifiers and a second based on the length of the data dependence chain, along with a performance counter that helps with set replacement and prediction. If the tags read from the BVIT match the calculated tags, the prediction is used. If all register values in the dependence chain are available at prediction time, the prediction is precise. Fig. 11 shows the ARVI predictor in a datapath with a 20-stage pipeline [16].

Fig. 11. The available register value information (ARVI) branch predictor scheme proposed by Chen et al.

F. Comparison

Fig. 12 provides a summary of the performance that has been reported for the value-based branch prediction schemes described above. It can be seen that a hybrid predictor using both BPVP and a Gshare predictor can result in an accuracy of up to 96%.

Scheme: Compiler synthesized branch prediction [14]
  Reported result: CS-Practical misprediction rate = 7.809%; CS-Theoretical (unlimited hardware) misprediction rate = 7.315%.

Scheme: Branch difference predictor (BDP) [8]
  Reported result: BDP+gshare misprediction rate = approx. 8.6% (16KB), 7.6% (32KB), 7.25% (64KB); BDP+bi-mode misprediction rate = approx. 8.3% (16KB), 7.4% (32KB), 7.25% (64KB).

Scheme: Branch prediction based on value prediction (BPVP) [1]
  Reported result: BPVP only misprediction rate = approx. 20%; BPVP+gshare misprediction rate = 5.11% (16KB), 4.61% (32KB), 4.23% (64KB).

Scheme: Available register value information (ARVI) predictor [6]
  Reported result: ARVI current value prediction rate = approx. 93.75% (20-stage pipeline), 93.38% (40-stage), 93.38% (60-stage); ARVI perfect value prediction rate = approx. 96.25% (20-stage), 97.50% (40-stage), 97.38% (60-stage).

Fig. 12. Comparison of misprediction rates and/or performance improvement due to value-based branch prediction schemes, based on reported results for various predictor sizes.

IV. Implementation Details

In this section, we first describe our simulation environment and benchmarks. We then provide an overview of the changes we made to incorporate the BPVP branch predictor in our simulator.

A. Simulation Environment and Benchmarks

We simulated branch prediction schemes using the SimpleScalar/PISA simulator platform [17]. SimpleScalar is a processor simulator with an instruction set architecture (ISA) called the portable instruction set architecture (PISA), which is loosely based on the MIPS ISA. Our default configuration, implemented in the form of a functional simulator called sim-bpred, simulates a 5-stage, dynamically scheduled, single-issue processor pipeline with branch prediction. Misprediction rates using different branch prediction techniques (static always-taken, bimodal, gshare, and BPVP) were obtained by running this simulator on various benchmarks. We used four benchmarks: compress95 from the SPEC95 suite and gcc, vpr, and parser from the SPEC CPU2000 suite. We ran the benchmarks using SPEC-supplied input sets for a warmup window of 500 million instructions and collected our results for the next 50 million instructions. By doing this, we obtain results for a representative sample that is free of compulsory misses and related errors in the branch and/or value prediction caches.

B. Modifications to the simulator

The sim-bpred simulator in SimpleScalar implements the following branch prediction schemes: static (taken/not-taken), bimodal, 2lev (two-level branch predictor), and comb (combination predictor). Value prediction extensions for SimpleScalar that implement stride, last-value, and two-level value predictors are available as public-domain code routines [18]. We used these routines to implement the BPVP scheme in the sim-bpred simulator. As described in [1] and in Sec. III-D, we implemented the IIT with as many entries as the number of general purpose integer registers (32 in SimpleScalar), with each entry containing the PC of the last instruction that wrote into that register.

Our implementation of the BPVP scheme is slightly different from that described in [1]. The differences arise from the fact that the ISA we use (SimpleScalar/PISA) is different from the one the authors of [1] used (the Alpha ISA). The modification included in our implementation handles branch instructions that have two register operands, which is a feature of SimpleScalar/PISA. Also, we implemented the BPVP-only scheme, and not the final hybrid BPVP (BPVP+gshare or BPVP+agree) scheme that the authors reported. We also did not implement the scheme in SimpleScalar's cycle-accurate simulator, sim-outorder; hence, we do not report results for the performance improvement (IPC) due to BPVP. However, our implementation and evaluation of the BPVP scheme sheds light on an issue that the authors of [1] did not study: the effect of different value predictors on the accuracy of the BPVP scheme.

V. Simulations and Discussion

Our simulations evaluated the BPVP scheme using three different data value predictors, namely last value, stride, and two-level, on our benchmark set. We also compared the branch prediction accuracies of these configurations with a static (always-taken) prediction scheme, a 32K-entry bimodal prediction scheme, and a Gshare prediction scheme. In all our simulations, we kept the sizes of the predictors approximately the same (32KB). The misprediction rates we obtained for the four benchmarks are shown individually in Fig. 15. The average branch prediction rate across the benchmark set is shown in Fig. 13.

Fig. 13. Average branch prediction rate (direction hit rate, %) across all benchmarks for a 32KB predictor size. The predictors compared are Taken, BPVP+Last, BPVP+Stride, BPVP+2lev, Bimod, and Gshare; the bar values shown in the chart are 76.99, 74.095, 74.0925, 74.0925, 90.13, and 92.525 percent.

From Fig. 13, we observe that when the BPVP scheme is used alone to provide the branch prediction, it does not result in a very accurate prediction (only about 76% accuracy). However, we note that, on average, using the last value predictor gives a slightly higher accuracy for the BPVP scheme than BPVP-with-stride and BPVP-with-2lev. This suggests that, for value-based branch prediction, a value predictor with a simple design like the last value predictor is sufficient. Also, the misprediction rate we obtain for BPVP (about 25%) is slightly higher than that reported in [1] (20%). This may be due to the different ISA that we used, which could have resulted in a higher number of conditional branches in our simulation sample. Fig. 14 lists the number of total and conditional branches encountered in our simulation sample. Among all the branch predictors we evaluated, the Gshare predictor provides the most accurate prediction. Among the different benchmarks that we used, we did not notice any deviations from the behavior described above.

Benchmark     Conditional branches in simulated sample
compress95    9.8 million
gcc           9.0 million
parser        7.5 million
vpr           4.4 million

Fig. 14. Number of conditional branches encountered during simulation.

VI. Conclusions and Future Work

In this paper, we implemented and evaluated BPVP, a branch predictor that uses value prediction, and studied its misprediction rates when different data value predictors were used in its implementation. Our evaluation found that changing the type of data value predictor does not result in any marked improvement in the prediction accuracy of BPVP, although our results do indicate that using a simple data value predictor such as the last value predictor can provide sufficiently good accuracy.

Branch prediction using value prediction techniques is a relatively new area of research. Although its effectiveness in predicting conditional direct branches has been shown, value prediction methods have not been applied to indirect branch prediction. Indirect branches, such as returns and indirect jump instructions, and their target addresses may be easier to predict using value-based techniques, since many of them are associated with sub-routine calls, dynamically shared libraries, virtual functions, case statements, etc. Hence, this area presents good potential for future work in value-based branch prediction.

References

[1] J. Gonzalez and A. Gonzalez, "Control-Flow Speculation through Value Prediction," IEEE Transactions on Computers, vol. 50, no. 12, pp. 1362-1376, Dec. 2001.
[2] J.L. Hennessy and D.A. Patterson, Computer Architecture: A Quantitative Approach, Third edition, Morgan Kaufmann Publishers Inc., 2003.
[3] F. Gabbay, "Speculative Execution based on Value Prediction," Technical Report #1080, Electrical Engineering Department, Technion - Israel Institute of Technology, 1996.
[4] M.H. Lipasti and J.P. Shen, "Exceeding the Dataflow Limit via Value Prediction," in Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, Nov. 1996, pp. 226-237.
[5] A. Sodani and G.S. Sohi, "Understanding the Differences between Value Prediction and Instruction Reuse," in Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, Nov. 1998, pp. 205-215.
[6] L. Chen, S. Dropsho, and D.H. Albonesi, "Dynamic Data Dependence Tracking and its Application to Branch Prediction," in Proceedings of the Ninth International Symposium on High-Performance Computer Architecture, Feb. 2003, pp. 65-76.
[7] A. Farcy, O. Temam, R. Espasa, and T. Juan, "Dataflow Analysis of Branch Mispredictions and its Application to Early Resolution of Branch Outcomes," in Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, Nov. 1998, pp. 59-68.
[8] T.H. Heil, Z. Smith, and J.E. Smith, "Improving Branch Predictors by Correlating on Data Values," in Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, Nov. 1999.
[9] Y. Sazeides and J.E. Smith, "The Predictability of Data Values," in Proceedings of the Annual International Symposium on Microarchitecture, Nov. 1997.
[10] Y. Sazeides and J.E. Smith, "Modeling Program Predictability," in Proceedings of the Annual International Symposium on Computer Architecture (ISCA), June 1998, pp. 73-84.
[11] J.E. Smith, "A Study of Branch Prediction Strategies," in Proceedings of the Eighth Annual International Symposium on Computer Architecture (ISCA), May 1981, pp. 135-148.
[12] S. McFarling, "Combining Branch Predictors," Technical Report TN-36, DEC WRL, June 1993.
[13] K. Wang and M. Franklin, "Highly Accurate Data Value Prediction using Hybrid Predictors," in Proceedings of the Annual International Symposium on Microarchitecture, 1997, pp. 281-290.
[14] S. Mahlke and B. Natarajan, "Compiler Synthesized Dynamic Branch Prediction," in Proceedings of the Annual International Symposium on Microarchitecture, Nov. 1996.
[15] V. Kathail, M.S. Schlansker, and B.R. Rau, "HPL PlayDoh Architecture Specification: Version 1.0," Technical Report HPL-93-80, Hewlett-Packard Laboratories, Palo Alto, CA, Feb. 1994.
[16] L. Chen, S. Dropsho, and D.H. Albonesi, "Dynamic Data Dependence Tracking and its Application to Branch Prediction," in Proceedings of the International Symposium on High Performance Computer Architecture, Feb. 2003, pp. 65-76.
[17] D. Burger and T.M. Austin, "The SimpleScalar Tool Set, Version 2.0," Computer Architecture News, pp. 13-25, June 1997.
[18] S.-J. Lee, "Data Value Predictors," URL: http://www.simplescalar.com/extensions.html.


Fig. 15. Branch prediction rates (direction hit rate, %) for the individual benchmarks compress95, gcc, parser, and vpr, comparing the Taken, BPVP+Last, BPVP+Stride, BPVP+2lev, Bimod, and Gshare predictors. The total predictor size is kept the same (32KB) for all experiments.
