A Study of Value-Based Branch Prediction Techniques

Krishnan Sundaresan, Srivathsan Krishnamohan 

{sundare2, krishn37}@msu.edu 

Abstract— 

In this paper, we implement a value-based branch prediction 

scheme called BPVP – branch prediction based on 

value prediction that was proposed by Gonzalez et al. [1]. 

Value-based branch prediction schemes are those that speculatively 

compute the direction and target of branch instructions 

by predicting the register values on which the branch 

condition is evaluated. We evaluate the BPVP branch 

predictor using different value prediction mechanisms (last 

value, stride, and two-level) and compare its accuracy with 

existing (bimodal and gshare) branch prediction schemes. 

We also present an up-to-date survey on this relatively new 

area of research in branch prediction. 

Keywords— Branch prediction, control dependence, speculative 

execution, value prediction. 

I. Introduction 

IMPROVING the performance of modern pipelined processors

faces two major impediments, namely, (1) control dependences

and (2) data dependences between instructions.

These: (1) necessitate stall cycles during which the pipeline 

units are idle and (2) limit the throughput of the pipeline. 

Both these effects reduce the speedup that is theoretically 

achievable by pipelining. In modern processors dynamic 

branch prediction is employed to reduce the number 

of stalls due to control dependences. Another technique, 

called dynamic scheduling, helps to reduce stalls due to 

data dependences by spacing dependent instructions apart 

and issuing them out-of-order. However, reducing stalls 

alone will not keep the pipeline busy and improve performance. 

Many processors also adopt a technique called 

speculation to realize higher performance by increasing the 

amount of instruction-level parallelism (ILP) that they can 

exploit. In speculative execution, the outcomes of dependent

instructions are guessed beforehand and the instruction

stream is executed as if the guesses were correct. These 

guesses are dynamically obtained using dedicated branch 

prediction hardware for branch instructions or value prediction 

hardware for other (for example, load) instructions. A 

speculative processor also includes hardware that can undo 

the effects of an incorrect execution by forcing the instruction 

to commit, i.e., write its results to the register file, only

after the instruction is no longer speculative [2]. 

K. Sundaresan and S. Krishnamohan are graduate students in the 

Department of Electrical and Computer Engineering, Michigan State 

University, East Lansing, MI. The work described in this paper was 

done as part of the final project for the course ECE/CSE 820: Advanced 

Computer Architecture, Fall 2003 taught by Dr. Anthony 

Wojcik at Michigan State University. 

A. Branch Prediction 

Branch prediction involves predicting the direction of the 

branch (taken or not taken) as well as predicting the target 

address of the branch before the end of the instruction 

decode (ID) stage. This enables the instruction fetch (IF) 

stage to speculatively fetch instructions based on the predicted 

branch direction and target address, thus supplying 

the pipeline continuously with instructions and increasing 

the instruction-level parallelism (ILP). When speculative 

execution is used with branch prediction, the actual direction 

the branch will take is known only later in the execution 

(EX) stage. If this result differs from the prediction 

made earlier, it is necessary to re-fetch the instruction 

stream, starting from the correct branch target. The cost 

of doing this is called the misprediction penalty. Misprediction 

penalties are obviously higher for deep pipelines. 

Thus, for a branch prediction scheme to be effective the 

product of the two terms: (1) number of mispredicted 

branches and (2) the penalty for each mispredicted branch, 

should be small. The effectiveness of branch prediction 

is often measured in terms of the branch prediction accuracy 

which is defined as the number of successful branch 

predictions performed by the branch predictor out of the 

overall number of prediction attempts [3]. Thus, research 

in branch prediction has focused on designing bigger, better, 

and/or more complex predictors to get higher branch 

prediction accuracies. 
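
The two quantities defined above can be written down directly. The following is a minimal sketch for illustration only; the function names and the example numbers are hypothetical and do not come from the experiments in this paper.

```python
def prediction_accuracy(correct: int, attempts: int) -> float:
    """Fraction of prediction attempts that were successful."""
    if attempts == 0:
        raise ValueError("no prediction attempts")
    return correct / attempts


def misprediction_cost(mispredicted: int, penalty_cycles: int) -> int:
    """Total cycles lost: number of mispredictions times the penalty of each."""
    return mispredicted * penalty_cycles


# Hypothetical numbers: 1000 branches, 950 predicted correctly, 10-cycle penalty.
accuracy = prediction_accuracy(950, 1000)
lost_cycles = misprediction_cost(1000 - 950, 10)
```

The same accuracy costs more cycles on a deeper pipeline because only the penalty term grows.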

B. Value Prediction 

More recently a methodology, known as value prediction, 

that predicts run-time outcome values of value generating 

instructions before they are actually executed, was 

suggested to enable successive data dependent instructions 

also to be speculatively executed [3], [4]. To understand 

how value prediction helps in speculative execution, refer 

to Fig. 1 showing a pipeline with value prediction (VP) 

and Fig. 2 showing the flow of a dependent chain of instructions 

in a base superscalar processor and a superscalar with 

value prediction [5]. Consider a chain of dependent instructions 

I, J, and K (K dependent on J, J dependent on I). 

As shown, the base superscalar machine needs 6 cycles to 

execute the three dependent instructions whereas a superscalar 

machine with value prediction can potentially finish 

executing the chain in 4 cycles by predicting the outputs of 

I and J (alternatively, the inputs of J and K) and executing 

them speculatively. Although it is easy to understand the 

benefit of value prediction in eliminating data dependences 

by predicting the results in this manner before actual execution

of the instructions takes place, it can also have

an impact on branch prediction [5]. Using value prediction, 

a branch misprediction can be detected earlier in the 

pipeline; this way the machine can start executing the correct 

path sooner rather than doing wasted work executing 

the wrong path. 

[Figure 1: pipeline diagram. Block and signal labels: FETCH, PC, VPT ACCESS, DECODE & RENAME, prediction, ISSUE, if mispredicted, EXECUTE, VERIFY, COMMIT.]

Fig. 1. Schematic of a 5-stage pipeline with value prediction that supports speculative execution.

(i) Base superscalar

Pipeline stage   Cycle 1   2       3       4       5       6
Fetch            I,J,K
Decode                     I,J,K
Execute                            I       J       K
Commit                                     I       J       K

(ii) Superscalar with VP

Pipeline stage   Cycle 1   2       3       4
Fetch            I,J,K
Decode                     I,J,K
Execute                            I,J,K
Commit                                     I,J,K

Fig. 2. Flow diagrams for a 5-stage pipeline in (i) base superscalar 

machine and (ii) base superscalar with value prediction. 

The base superscalar machine needs 6 cycles to execute 

the three dependent instructions whereas a superscalar machine with 

value prediction can execute it in 4 cycles. 
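
The cycle counts quoted for Fig. 2 follow from a simple model, sketched below. The model assumes (these are our simplifications, not statements from [5]) that the whole chain is fetched and decoded as one group, that each dependent instruction needs one execute cycle, that commit trails execute by one cycle, and that every value prediction is correct.

```python
def chain_cycles(n_dependent: int, value_prediction: bool) -> int:
    """Cycles needed to fetch, decode, execute and commit a dependent chain."""
    front_end = 2  # one fetch cycle plus one decode cycle for the whole group
    if value_prediction:
        execute = 1  # all inputs predicted correctly, so the chain executes in parallel
    else:
        execute = n_dependent  # each instruction waits for its producer
    commit = 1  # the last instruction commits one cycle after it executes
    return front_end + execute + commit


base = chain_cycles(3, value_prediction=False)  # chain I, J, K without VP
with_vp = chain_cycles(3, value_prediction=True)  # chain I, J, K with VP
```

For the three-instruction chain I, J, K this gives 6 cycles without value prediction and 4 cycles with it. A wrong prediction would add recovery cycles that this sketch does not model.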

C. Value-based Branch Prediction 

By combining the concepts of value prediction and

branch prediction, a new class of branch prediction schemes,

called value-based branch prediction schemes (see footnote 1), has been

proposed recently [1], [6], [7], [8]. In this class of branch

prediction schemes, the branch predictor is aided by some 

form of data value history of the branch register operands 

in addition to branch history. Two approaches for improving 

branch prediction using value-based approaches have 

been described by Heil et al. [8]. These are (i) the speculative 

branch execution approach and (ii) the branch prediction 

by correlating on data values approach. Fig. 3 shows 

schematic diagrams for the two approaches. 

• In the speculative branch execution approach, a conventional 

data (value) predictor is used to predict input values 

for branch instructions. Then the branch is evaluated using 

the predicted values. At the same time, a branch prediction 

is also obtained using conventional branch predictors. 

A chooser or selector is then used to select the final prediction. 

1 The term ’value-based branch prediction’ appears to have been 

first used by Chen et al. [6]. In this paper, we use it to encompass a 

variety of schemes that use value history for branch prediction. 

[Figure 3: block diagrams. (a) Block labels: GLOBAL BRANCH HISTORY, BRANCH PC, DATA VALUE HISTORY, BRANCH PRED., VALUE PRED., BRANCH EXECUTION, CHOOSER. (b) Block labels: GLOBAL BRANCH HISTORY, BRANCH PC, DATA VALUE HISTORY, BRANCH PRED.]

Fig. 3. Two approaches to value-based branch prediction schemes proposed by Heil et al.: (a) the speculative branch execution approach and (b) the branch prediction by correlating on data values approach.

• In the second approach, the data value history is directly 

fed into the branch predictor. This enables the branch 

predictor to correlate on data values similar to the way it 

would correlate on global branch history. 

C.1 Potential of value-based branch prediction 

As mentioned earlier, value predictability has been studied 

by many authors [3], [4], [9]. Sazeides and Smith have 

evaluated the effect of value predictability on branch predictability 

[10]. Their evaluations show that many (up 

to 82%) of the branch nodes propagate predictability, i.e., 

when the branch output is predictable, at least one of their 

inputs is also predictable. Another interesting result presented 

in their study is that branch mispredictions are rare 

even for branches with both unpredictable inputs and only 

about 50% of the branches are mispredicted when both inputs 

are predictable. Their study concludes that there is 

substantial potential for improving branch prediction accuracy 

by incorporating data value history information into 

existing branch predictors. 

C.2 Some pros and cons of value-based branch prediction 

One of the most important benefits of value-based branch 

prediction is that, as mentioned earlier, a branch misprediction 

can be detected earlier in the pipeline and the machine 

can start executing the correct path sooner. This 

can potentially reduce the amount of wasted work due to 

mispredictions. Value-based predictors are most useful in 

situations where other correlating predictors may fail, for 

example, while predicting branches at the end of large ’for’ 

loops in programs. In such cases (when the number of loop 

iterations is greater than the history length of the correlating 

predictor), value based predictors can correctly predict 

the branches since the input of such branches will follow a 

well-defined stride pattern [1]. 

However, if not carefully used, value-based branch predictors 

may potentially result in more mispredictions 

and/or delay branch resolution [5]. In value-based branch 

prediction, branches with speculative operands can be handled 

in two ways: (1) they are resolved when their operands 

are still speculative and (2) they are resolved only when 

their operands become non-speculative. The first option 

can increase the number of branch mispredictions since the 

predicted inputs may themselves be wrong. The second option 

forces the branch resolution to be postponed until after 

its producer instructions have been committed, thus increasing

the latency. Due to these drawbacks, most simple

value-based branch prediction techniques have been used 

only in combination with other existing branch predictors 

[1]. However, advanced value-based branch prediction techniques

that have been proposed recently, using techniques

such as dynamic data dependence tracking, have achieved

higher accuracies when used independently [6].

D. Contributions of this Work 

Gonzalez et al. have proposed a value-based branch prediction 

scheme called BPVP – branch prediction based on 

value prediction [1]. Their scheme uses the speculative 

branch execution approach described above. They have 

shown that value prediction can improve the overall accuracy 

of branch prediction techniques by correcting predictions 

that are mispredicted by classical branch predictors 

like history-based or correlating predictors. This approach 

identifies the instructions that generate the inputs 

for branches, predicts their output values, uses these predicted

inputs to determine the branch outcome, and speculatively

executes the instruction stream past the branch 

instruction. The study of value-based branch prediction

schemes and the implementation and evaluation of the

BPVP scheme are the focus of this paper. We implement

the baseline BPVP scheme with three different

data value predictors, namely, last value, stride, and two-level

predictors. We simulate

these configurations and compare the prediction accuracy 

with respect to two classical branch predictors, bimodal 

and Gshare. 

The organization of the rest of this paper is as follows. 

In Sec. II, we present related work in the area of 

branch prediction and value prediction techniques. Then 

in Sec. III, we describe the current state of the art in value-based

branch prediction. Next, in Sec. IV, we present details

of our simulation environment and discuss briefly how 

we implemented the BPVP branch predictor. Then, we 

present simulation results and discussions in Sec. V and 

finally, we conclude in Sec. VI. 

II. Related Work 

In this section, we review the design of some popular 

branch prediction and value prediction schemes. Later in 

Sec. V, we will compare the performance of these schemes 

with the BPVP scheme. 

A. Branch Prediction Schemes 

We briefly review some popular branch prediction 

schemes below. For the interested reader, a survey of 

branch prediction strategies can be found in [11]. 

A.1 Static Branch Prediction Schemes 

Most branches exhibit a high degree of correlation with 

their past behavior and that of other branches in the vicinity. 

This correlation can be used to predict the outcome of 

a branch in the decode stage to a high degree of accuracy 

(low misprediction rate). This significantly reduces the

overhead associated with branches. There are two methods

one can use to statically predict branches: (1) by examining

the program behavior (branch direction, etc.) or (2) 

by the use of profile information collected from earlier runs 

of the program (branch behaviors are often bimodally distributed, 

i.e., an individual branch is often highly biased 

toward taken or not-taken). 

A.2 Dynamic Branch Prediction Schemes 

Unlike static branch prediction schemes, dynamic branch 

prediction is implemented in hardware and the prediction 

can change if the branches change behavior while the 

program is running. Some popular examples of dynamic 

branch prediction schemes are discussed below. 

• One-bit Scheme: The simplest scheme is the branch prediction 

buffer (BPB) or branch history table (BHT). A 

branch prediction buffer (BPB) is a small memory indexed 

by the lower portion of the address of the branch instruction. 

The memory contains a bit that indicates whether 

the branch was recently taken or not. The downside of using 

a single bit is that even if a branch is almost always 

taken, it will likely be predicted incorrectly twice, rather 

than once, when it is not taken. 

• Two-bit prediction scheme [11]: The two-bit prediction

scheme is also often called the bimodal branch predictor. 

It is known that most branches are either usually taken or 

usually not taken. Bimodal branch prediction takes advantage 

of this bimodal distribution of branch behavior and attempts 

to distinguish usually taken from usually not taken 

branches. It makes a prediction based on the direction that 

the branch went the last few times it was executed. The 

bimodal scheme uses a table of 2-bit saturating up-down 

counters to keep track of the direction a branch is more 

likely to take. Each branch is mapped via its address to 

a counter. The branch is predicted taken if the most significant 

bit of the associated counter is set; otherwise, it is 

predicted not taken. These counters are updated based on 

the branch outcomes. When a branch is taken, the 2-bit 

value of the associated saturating counter is incremented 

by one; otherwise, the value is decremented by one. 

• Two-level prediction scheme (Gshare) [12]: The two-bit
predictor schemes use only the recent behavior of a branch
to predict the future behavior of that branch. It may be
possible to improve the prediction accuracy if we also look
at the recent behavior of other branches rather than just
the branch we are trying to predict. Predictors that use
global branch history in this way are also called correlating
predictors. The Gshare predictor is a variation of the
two-level GAg/GAs global history predictor. Two-level
prediction uses two structures: a branch history register (or
table) at the first level and a pattern history table (PHT) of
2-bit counters, just like those of the bimodal predictor
described above, at the second level. The idea is to use
something other than the branch PC alone to index into the
table of 2-bit counters. In the Gshare configuration, the
two-level predictor keeps a single entry of hist-size bits of
branch history in a global branch history register (GBHR),
and it XORs those bits with hist-size bits taken from the PC
before indexing into the second-level table to get the
prediction. The advantage of using global history is that it
can detect and predict sequences of correlated branches.
Also, combining the stored branch history and branch PC
(like the XOR done in Gshare) provides some degree of
anti-aliasing and reduces conflicts when indexing into the
PHT. A 16K-entry Gshare predictor in which 12 bits of
history are XORed with 14 bits of branch PC is used in the
Sun UltraSPARC-III processor.
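
The bimodal and Gshare schemes described above can be sketched in a few lines. This is an illustrative model, not the simulator code used in this study; the table sizes, the initial counter value (weakly not-taken), and the 4-byte instruction alignment used when indexing are assumptions.

```python
class Bimodal:
    """Table of 2-bit saturating counters indexed by low-order PC bits."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.table = [1] * (1 << index_bits)  # assumption: start weakly not-taken

    def _index(self, pc):
        return (pc >> 2) & self.mask  # assumption: 4-byte aligned instructions

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2  # MSB of the counter set means taken

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


class Gshare(Bimodal):
    """Same counter table, indexed by PC bits XORed with global branch history."""

    def __init__(self, index_bits=10, hist_bits=10):
        super().__init__(index_bits)
        self.hist_mask = (1 << hist_bits) - 1
        self.ghr = 0  # global branch history register

    def _index(self, pc):
        return ((pc >> 2) ^ self.ghr) & self.mask

    def update(self, pc, taken):
        super().update(pc, taken)  # counter is selected with the pre-update history
        self.ghr = ((self.ghr << 1) | int(taken)) & self.hist_mask


def count_correct(predictor, pc, outcomes):
    """Predict, then train, on each outcome; return the number of correct predictions."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct


# A single branch that strictly alternates taken / not-taken.
pattern = [True, False] * 100
bimodal_hits = count_correct(Bimodal(), 0x400, pattern)
gshare_hits = count_correct(Gshare(), 0x400, pattern)
```

On this alternating pattern the 2-bit counter oscillates between its two middle states and never predicts correctly, whereas Gshare separates the two history contexts and predicts correctly after a short warm-up.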

B. Value Prediction Schemes 

Data value prediction schemes predict the value of the 

result register in data-producing instructions based on past 

behavior. This is similar in some respects to branch prediction 

techniques, where the branch direction and the target 

address are predicted for speculative execution. However,

data value prediction differs from branch prediction in that it

involves a multi-valued decision (i.e., a 1-out-of-2^w prediction,

where w is the word size) as opposed to branch

prediction, which is a binary decision (i.e., a 1-out-of-2

prediction) [13]. Below we present a review of three of the

most commonly used data value prediction schemes. 

B.1 Last Value Predictor

The simplest among data value prediction schemes is the
last value predictor. Here, the result produced by an
instruction the last time it was executed is stored and the
same value is predicted when the instruction is executed
again. The hardware structure used for the last outcome or
last value predictor is shown in Fig. 4 [13]. The basic idea is
to store the output of all register write instructions. Since
storing the outcome of every instance of every register write
instruction would require prohibitive hardware, a hash
function is used to map instruction addresses to table entries.
The predictor uses a value history table (VHT) that
stores the last value produced by the instructions mapped
to it. Each VHT entry has two fields named tag and value.
The tag field stores the identity of the instruction while the
value field stores the last result of the instruction. Replacement
policies similar to those used in caches, such as least
recently used (LRU) and first-in-first-out (FIFO), are used to
replace instructions in case of a tag mismatch.

Fig. 4. Data value prediction schemes: Last value predictor

B.2 Stride Predictor

The stride predictor uses data value locality by monitoring
the stride by which consecutive instances of an instruction
change. When the results of consecutive instances of an
instruction change by a constant value, the result of future
instances can be predicted by storing the stride value.
This can be easily observed in the behavior of loop induction
variables and in programs stepping through arrays in
a regular fashion. The block diagram of a stride-based
predictor is shown in Fig. 5 [13]. It uses a VHT similar to that
of the last value predictor, but each entry in the VHT has four
fields: tag, state, value, and stride. The state field is necessary
to detect a stride pattern and to update/read the stride value.
The stride predictor can be in one of 3 different states: init,
transient, and steady.

When an instruction results in a VHT miss, the instruction
is written into the VHT with the state field set to 'init'
and the result put in the value field. When another instance
of the same instruction occurs and the state field is set
to 'init', no prediction is made. The stride S1 is calculated
from the result (D1) of the instruction as S1 = D1 - (value
in VHT entry). D1 and S1 are entered into the value and
stride fields of the VHT entry and the state is set to 'transient'.
While in the 'transient' state, if another instance of the
instruction occurs, no prediction is made. If that instance
of the instruction produces a value D2, then (1) a new
stride value is calculated as S2 = D2 - (value in VHT entry),
(2) D2 is entered in the value field of the VHT entry, and
(3) if S2 is the same as S1, the state field is set to 'steady'.
If S2 is different from S1, the state remains 'transient' and the
stride field is updated to S2.

Fig. 5. Data value prediction schemes: Stride predictor
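
The last value and stride predictors described in this section can be sketched as follows. This is a simplified model under stated assumptions, not the hardware of [13]: the last value table is direct-mapped (a tag mismatch simply replaces the entry, instead of LRU or FIFO), the stride table is an unbounded dictionary, and the behavior in the 'steady' state (predict value + stride, fall back to 'transient' on a stride change) is our assumption, since the text above only describes the init and transient transitions.

```python
class LastValuePredictor:
    """Predicts that an instruction produces the same result as last time."""

    def __init__(self, entries=1024):
        self.entries = entries
        self.vht = {}  # table index -> (tag, last value)

    def predict(self, pc):
        entry = self.vht.get(pc % self.entries)
        if entry is not None and entry[0] == pc:  # tag check
            return entry[1]
        return None  # VHT miss: no prediction

    def update(self, pc, result):
        self.vht[pc % self.entries] = (pc, result)  # assumption: direct-mapped replacement


class StridePredictor:
    """Three-state (init / transient / steady) stride predictor."""

    def __init__(self):
        self.vht = {}  # pc -> (state, value, stride); the dict key plays the role of the tag

    def predict(self, pc):
        entry = self.vht.get(pc)
        if entry is not None and entry[0] == "steady":
            return entry[1] + entry[2]  # assumption: steady state predicts value + stride
        return None  # no prediction in init or transient state

    def update(self, pc, result):
        entry = self.vht.get(pc)
        if entry is None:  # VHT miss
            self.vht[pc] = ("init", result, 0)
            return
        state, value, stride = entry
        new_stride = result - value
        if state == "init":
            self.vht[pc] = ("transient", result, new_stride)
        elif new_stride == stride:
            self.vht[pc] = ("steady", result, stride)
        else:
            self.vht[pc] = ("transient", result, new_stride)


# A loop induction variable stepping by 4 (hypothetical PC 0x80).
sp = StridePredictor()
for value in (0, 4, 8):
    sp.update(0x80, value)
next_value = sp.predict(0x80)
```

After the results 0, 4, and 8 the entry has seen the same stride twice, so the predictor is in the steady state and predicts 12 for the next instance.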

B.3 Two-Level Value Predictor

The 2-level prediction described in [13] is based on the
observation that a substantial percentage of dynamic
instructions have 4 or fewer unique values in their most recent
history. By storing the last 4 unique outcomes of
instructions and by using a 2-level predictor that performs
a 1-out-of-4 prediction, the behavior patterns of most
instructions can be predicted. The block diagram of one such
2-level predictor is shown in Fig. 6 [13].

The VHT in a 2-level predictor has four fields: tag, LRU,
data value, and value history pattern. The data value field
stores up to 4 most recent outcomes of an instruction. If
the different instances of an instruction produce one of the
4 data values stored, the outcome can be predicted from the
values stored. When the outcome of an instance is different
from the 4 data values stored, the fifth value is written
into the data value field. The LRU field keeps track of
the order in which the 4 data values were seen, which helps
to replace the least recently used value. The value history
pattern contains a 2p-bit pattern which stores the last 'p'
outcomes of the instruction. The 2p-bit pattern is used to
index into a pattern history table (PHT) which contains 4
independent counters C0, C1, C2, C3 corresponding to the 4
different values stored in the data value field. The VHT is
indexed with the address of an instruction to make a
prediction. When it results in a hit, the history pattern value
is used to select the appropriate entry from the PHT. The
PHT entry contains 4 count values from which the maximum
value is selected. The selected value is compared
against a threshold and if the maximum value is greater,
the outcome corresponding to the counter value is selected.
When the maximum value selected is less than the threshold,
no prediction is made.

Fig. 6. Data value prediction schemes: Two-level data value predictor.

III. Value-based Branch Prediction Schemes

In this section, we describe value-based branch prediction
schemes that have been proposed in the literature.

A. Compiler synthesized branch prediction

The compiler synthesized (CS) branch prediction scheme
described by Mahlke et al. [14] combines the strengths of
static compiler based and dynamic run-time behavior based
branch prediction. Static compiler based techniques have
low cost as branch state information is not required for
prediction, eliminating costly hardware. Dynamic branch
predictors have higher accuracy as they make use of current
branch history to make branch predictions. The CS branch
predictor makes use of the contents of the registers that are
part of the branch instruction to make a prediction. However,
the prediction function is defined and realized through the
insertion of additional instructions by the compiler. Two major
advantages result from this scheme: (1) using the compiler
to define the predictor function increases the flexibility of
realizing almost any predictor function without the number
of predictors being limited by the hardware, and (2) reduced
hardware overhead is required, as the compiler uses
architecturally visible registers and additional instructions to
predict the branches.

The CS prediction algorithm has been described using
the PlayDoh branch architecture [15]. In the PlayDoh
architecture a branch instruction is represented as shown in
Fig. 7.

(a) conventional branch

    blt r2, r3, L1

(b) PlayDoh equivalent

    pbra b2, L1, 1
    cmpp_w_lt_un_un p2, -, r2, r3
    brct b2, p2

Fig. 7. A branch instruction in the PlayDoh architecture
used for compiler synthesized branch prediction.

• A prepare-to-branch instruction (pbra) specifies the target 

address in advance of the branch point to enable 

prefetching from the target address. The pbra instruction 

is modified from the base instruction shown in Fig. 7 to 

handle the CS scheme. The 1-bit literal field is generalized 

to be a predicate register operand. The pbra instruction 

reads the predicate register to obtain the prediction value 

written by previous instructions. 

• Computation of the branch condition is done by a 

compare-to-predicate instruction and stored in a predicate 

register. This instruction does not exist for unconditional 

branches. 

• The branch-on-condition-true (brct) performs the actual 

redirection of control flow in the PlayDoh architecture. The 

architecture provides other special type of branches to support 

loop execution. 

The prediction function used in the branch prediction 

algorithm correlates the value of the architectural registers 

with the direction of the branch. The correlation between 

the register contents and the branch prediction are 

obtained by profiling target programs using a three step 

process. (1) Precompile – Instrumentation code is inserted 

into code without branch prediction instructions to collect 

branch direction information along with the architectural 

register contents, (2) Profile – The compiled code with instrumentation 

instructions is run on sample inputs to dump 

branch profile information as well as register dump information, 

and (3) Recompile – By analyzing the correlation

between the register dump information and the branch direction,

appropriate branch predictors are constructed for

each branch. The instructions corresponding to the predictors 

are added back to the code and recompiled to produce 

the final code. 

The prediction algorithm constructs the predictor based 

on the branch register values at a predetermined number 

of cycles prior to the branch. For each register r_i and register

pair (r_i, r_j), a score S_i or S_{i,j}, respectively, is assigned. The score

reflects the number of times branches were taken with the 

corresponding register values. During run time based on 

the score assigned to current register values in the branch 

instruction, a prediction is made. The compiler places instructions 

approximately 16 cycles ahead of the branch to 

make the prediction. This is because the pbra instruction 

must be issued at least 12 cycles before the actual branch 

instruction. 

B. The anticipation mechanism 

In this scheme, Farcy et al. propose to dynamically duplicate 

the instructions in the dataflow tree of a conditional 

branch instruction (called branch flow) ahead of normal execution 

and compute the branch earlier using value prediction 

[7]. Thus, they implement an anticipation mechanism 

that resolves branches ahead of time. Their motivation is 

using value history to reduce branch misprediction latency 

and their mechanism does not improve the existing branch 

predictor accuracy. An overview of how the anticipation 

mechanism works is shown in Fig. 8. As shown, a separate 

branch window, similar in function to the instruction window 

is used to process copies of tagged branch instructions. 

The tagging can be done statically (all instructions in the 

neighborhood of a branch) or dynamically (starting with a 

conditional branch, instructions that produce its operands 

are tagged and so on). 
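
The dynamic tagging just described amounts to a backward walk over the instruction window that collects the producers of the branch operands. The sketch below illustrates the idea only; it is not the hardware mechanism of [7]. The tuple encoding of instructions, the function name, and the unrelated addq instruction are hypothetical, while the other instructions follow the example of Fig. 8.

```python
# Each instruction is (text, destination register or None, source registers).
WINDOW = [
    ("lda r0, 1(r0)", "r0", ["r0"]),
    ("lda r1, 4(r1)", "r1", ["r1"]),
    ("addq r8, r9, r7", "r7", ["r8", "r9"]),  # hypothetical, not part of the branch flow
    ("cmpult r0, r1, r5", "r5", ["r0", "r1"]),
    ("bne r5, label", None, ["r5"]),
]


def tag_branch_flow(window, branch_idx):
    """Return the indices of the instructions in the dataflow tree of the branch."""
    needed = set(window[branch_idx][2])  # registers whose producers are still wanted
    tagged = [branch_idx]
    for i in range(branch_idx - 1, -1, -1):  # walk backward from the branch
        _, dest, sources = window[i]
        if dest in needed:
            tagged.append(i)
            needed.discard(dest)  # this producer satisfies the register...
            needed.update(sources)  # ...and its own operands are now needed
    return sorted(tagged)


branch_flow = tag_branch_flow(WINDOW, 4)
```

For this window the branch flow consists of the two lda instructions, the cmpult, and the bne; the addq is left untagged because nothing in the branch flow reads r7.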

C. BDP – Branch difference predictor 

The branch difference predictor (BDP) proposed by Heil 

et al. stores a value history of the difference of the two 

branch source register operands instead of the operands 

themselves [8]. This is motivated by the fact that conditional 

branch instructions normally subtract the values 

stored in their register operands and take action depending 

on the value and/or sign of the result. The schematic of the 

BDP is shown in Fig. 9. The value history table (VHT) 

shown in the figure stores the difference information for 

each static conditional branch instruction and is indexed 

using the branch PC. Since it is impossible to store the 

difference history for all branches, the VHT table is only 

used as a selector to choose between the predictions made 

by (1) the rare-event predictor (REP) and (2) the backing 

predictor. The former is a value-history based cache used 

for hard-to-predict branches and the latter, which is simply 

a non-value history-based predictor (like bimodal, gshare 

etc), is used for predicting the other branches. 

[Figure 8: diagram. Panels: Instruction window (lda r1, 4(r1); bne r5, label; cmpult r0, r1, r5; lda r0, 1(r0); lda r1, 4(r1)), with read anticipation bit and write anticipation bit paths; Anticipation table (1011); Tagged instructions; Branch window (lda r65, 4(r65); bne r69, label; cmpult r64, r65, r69; lda r64, 1(r64); lda r65, 4(r65)); Value pred. table (counter, last value; example entry 0x8).]

Fig. 8. The anticipation mechanism proposed by Farcy et
al. The branch flow is computed ahead of the normal program flow
using a separate branch window and a value prediction table. Early
branch resolution in this manner helps reduce misprediction latency
but does not improve the accuracy of the branch predictor.

The BDP works as follows. When the VHT is being
accessed, the REP is also accessed in parallel with global
branch history and the branch PC. The VHT returns the
difference value, which is compared with the tags stored in
the REP. If the tag check succeeds, the REP provides the
final prediction; otherwise the backing predictor provides it. The
REP is updated only when the backing predictor mispredicts,
and the backing predictor is updated only when it
provides the prediction. As shown in Fig. 9, the VHT
actually has two tables: a branch difference cache (BDC)
that stores the difference for the most recently committed
branch instructions and a branch count table (BCT) that
keeps track of the number of outstanding instances of each
static branch. The corresponding entry in the BCT is incremented
when a branch is fetched and decremented when
it is committed. An entry in the BDC is also replaced
whenever the latter happens.
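
The selection and update rules of the BDP can be illustrated with a small model. This is a sketch under simplifying assumptions and not the design evaluated in [8]: all tables are unbounded dictionaries, the backing predictor is a per-branch 2-bit counter, the REP tag is the triple (PC, global history, last committed difference), and the BCT is omitted because the model commits each branch before the next one is fetched.

```python
class BranchDifferencePredictor:
    def __init__(self, hist_bits=4):
        self.hist_mask = (1 << hist_bits) - 1
        self.ghr = 0  # global branch history
        self.bdc = {}  # branch PC -> difference of its last committed instance
        self.rep = {}  # (pc, history, difference) -> direction, for hard-to-predict cases
        self.backing = {}  # branch PC -> 2-bit counter (assumed backing predictor)

    def _backing_predict(self, pc):
        return self.backing.get(pc, 1) >= 2

    def _key(self, pc):
        return (pc, self.ghr, self.bdc.get(pc))

    def predict(self, pc):
        """Return (direction, source): the REP wins on a tag hit."""
        key = self._key(pc)
        if key in self.rep:
            return self.rep[key], "rep"
        return self._backing_predict(pc), "backing"

    def update(self, pc, taken, difference):
        key = self._key(pc)
        rep_hit = key in self.rep
        if self._backing_predict(pc) != taken:
            self.rep[key] = taken  # REP is updated only when the backing predictor mispredicts
        if not rep_hit:  # backing predictor is updated only when it supplied the prediction
            counter = self.backing.get(pc, 1)
            self.backing[pc] = min(3, counter + 1) if taken else max(0, counter - 1)
        self.bdc[pc] = difference  # committed difference replaces the BDC entry
        self.ghr = ((self.ghr << 1) | int(taken)) & self.hist_mask


def run_countdown_loop(bdp, pc, trip_count):
    """One pass over a loop whose branch is taken while the counter is non-zero.

    Returns the (direction, source) prediction made for the loop-exit instance.
    """
    exit_prediction = None
    for counter in range(trip_count - 1, -1, -1):
        exit_prediction = bdp.predict(pc)
        bdp.update(pc, taken=(counter != 0), difference=counter)
    return exit_prediction


bdp = BranchDifferencePredictor()
first_pass = run_countdown_loop(bdp, 0x200, 20)
second_pass = run_countdown_loop(bdp, 0x200, 20)
```

On the first pass the loop exit is mispredicted by the backing predictor, which installs an REP entry tagged with the difference seen just before the exit. On the second pass the same tag matches and the REP supplies the correct not-taken prediction.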

[Figure 9: block diagram. Inputs: PC and Global Branch History, PC, PC. Blocks: BACKING PRED., RARE-EVENT PRED., and VALUE HIST. TABLE (containing BRANCH COUNT TABLE and BRANCH DIFF. CACHE), with a COMPARE against 0. Signals: Prediction, Prediction, Tag, Hit.]

Fig. 9. The branch difference predictor (BDP) proposed by
Heil et al. The value history table (VHT) stores the past history of
the difference of the two branch source register operands and is used
to select the prediction made by the value history based rare-event
predictor for hard-to-predict branches only.

D. BPVP – Branch prediction based on value prediction 

In the BPVP scheme, a mechanism for predicting the 

branch outcome using the predicted values of the branch 

instruction’s registers is used. Fig. 10 gives the schematic 

of the implementation of this scheme. It consists of an 

input information table (IIT), a value predictor, and a 

conditional evaluation unit (CEU). The IIT, which has as 

many entries as there are general purpose registers (GPRs),

stores the PC of the last instruction that updated each register and

the result of the last evaluation in its PC, CMP.VALUE, 

and CMP.RESULT fields. The last field is a boolean value 

whose value is set if the latest instruction that updated the 

register was a compare instruction, and therefore indicates 

that the CMP.VALUE field value contains the speculative 

result of the compare operation. Based on the type of instruction 

encountered, one of the following happens.

• For loads and ALU instructions, the IIT entry indexed by

the destination register is updated with the current PC,

and the valid bit is reset.

• For a branch whose input is produced by a load or ALU

instruction, the hardware keeps track of only the PC of the

producer instruction. When the next branch of that

type is fetched, this PC is used to look up the value predictor

and obtain the predicted input, which is then used

to evaluate the branch condition and make a prediction.

• For a branch whose input is produced by a compare instruction 

or branches that are compare-and-branch (like 

BNEZ etc.), both register inputs need to be predicted using 

the value predictor and the compare operation is evaluated 

in the CEU. Depending on the outcome of the compare, 

the branch is predicted. 
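
The flow in the list above can be sketched as follows. This is a simplified illustration rather than the hardware of [1]: it assumes a last-value predictor held in an unbounded dictionary, handles only two-register branches, omits the CMP.VALUE and CMP.RESULT fields, and uses hypothetical names (`BPVP`, `fetch_producer`, `writeback`, `predict_branch`, `CEU_OPS`).

```python
import operator

# Conditions the CEU can evaluate; the mnemonics are illustrative only.
CEU_OPS = {"beq": operator.eq, "bne": operator.ne, "blt": operator.lt}


class BPVP:
    def __init__(self, num_regs=32):
        self.iit = [None] * num_regs   # IIT: one entry per GPR, PC of its last producer
        self.last_value = {}           # last-value predictor: producer PC -> last result

    def fetch_producer(self, pc, dest_reg):
        """Load/ALU instruction fetched: record its PC in the IIT entry of its destination."""
        self.iit[dest_reg] = pc

    def writeback(self, pc, value):
        """Producer finished executing: train the value predictor with the real result."""
        self.last_value[pc] = value

    def predict_branch(self, op, src_a, src_b):
        """Branch fetched: predict both inputs, then evaluate the condition in the CEU."""
        inputs = []
        for reg in (src_a, src_b):
            producer_pc = self.iit[reg]
            if producer_pc not in self.last_value:
                return None            # no value history yet: defer to another predictor
            inputs.append(self.last_value[producer_pc])
        return CEU_OPS[op](inputs[0], inputs[1])


bpvp = BPVP()
bpvp.fetch_producer(0x04, 3)
bpvp.writeback(0x04, 5)                          # r3 = 5 (loop bound)
r2, outcomes = 0, []
while True:
    bpvp.fetch_producer(0x10, 2)                 # "r2 = r2 + 1" is fetched
    guess = bpvp.predict_branch("blt", 2, 3)     # "blt r2, r3" is fetched and predicted
    r2 += 1
    bpvp.writeback(0x10, r2)                     # the add executes after the prediction
    taken = r2 < 5
    outcomes.append((guess, taken))
    if not taken:
        break
print(outcomes)
```

The branch is predicted before its producer executes, so the prediction relies on the value seen in the previous iteration. In this toy loop that is enough for every iteration except the first (no history) and the last (the loop exit).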

[Figure 10 diagram labels: input register; Input Information Table (IIT) with PC, CMP.VALUE, and CMP.RESULT fields, # of entries = # of GPRs; value predictor; conditional evaluation unit; predicted branch outcome.]

Fig. 10. The branch predictor based on value prediction
(BPVP) proposed by Gonzalez et al.

E. ARVI – Available register value information predictor

The ARVI scheme [16] makes use of the register values of
the instructions leading up to the branch instruction, in addition
to the branch address, to make a prediction. When
all the register values involved in branch resolution are
identical to those of a prior occurrence, the outcome
will be the same. The idea behind the ARVI scheme is
to determine the essential values in the data dependence
chain leading up to the branch instruction. A data dependence
chain, which shows the ordering relationship for each
instruction in the pipeline, is constructed using a data dependence
table (DDT). The DDT is implemented in hardware
using RAM for incrementally maintaining the data dependence
chains for the set of instructions in the processor
pipeline. The number of bits in the DDT is equal to the number
of reorder buffer (ROB) entries times the number of
physical registers.

When a branch instruction is executed, if all the branch
register values are available, the outcome of the branch instruction
can be accurately predicted. But this rarely happens.
However, if register values along the dependence chain
are available, then the predictor can use these values to index
into a table to predict the outcomes. ARVI uses the
data dependence register set to calculate a signature. It
also uses a hash of the register identifiers and the PC as an
index into a table. To distinguish between different occurrences
of the same path, values in the register set are hashed
together and used as a tag. To separate different iterations
of a loop in an efficient way, ARVI records as part of
the tag the maximum number of instructions spanned by
the dependence chain. The steps to generate a
prediction in ARVI are listed below.

(1) The data dependence chain for the branch instruction is
read from the DDT.

(2) The data dependence chain vector is fed to a
filter called the register set extractor (RSE), which forms the
active register set for the current branch. Using the PC of
the branch instruction and the values in the register set,
an index into the branch value information table (BVIT) is
generated.

(3) The BVIT has information regarding tags
and prior branch occurrences. The BVIT read returns two
tags, one based on the sum of register identifiers and a second
based on the length of the data dependence chain,
and a performance counter to help in set replacement and
prediction. If the tags read from the BVIT match the calculated
tags, then the prediction is used.

During prediction, if all
register values in the dependence chain are available, then
the prediction is precise. Fig. 11 shows the ARVI predictor in
a datapath with a 20-stage pipeline [16].

Fig. 11. The available register value information (ARVI)
branch predictor scheme proposed by Chen et al.
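
The BVIT lookup of the ARVI scheme (Sec. III-E) can be sketched as follows. This is a rough illustration under stated assumptions, not the design of Chen et al.: the DDT and RSE are skipped and the active register set is passed in directly, the XOR hash, table size, and tag widths are invented for the example, the performance counter is omitted, and the names `arvi_key` and `BVIT` are hypothetical.

```python
BVIT_ENTRIES = 1024   # assumed table size


def arvi_key(pc, register_set, chain_length):
    """register_set maps register identifier -> currently available value."""
    index = pc
    for value in register_set.values():
        index ^= value                       # fold the register values into the PC (assumed hash)
    id_tag = sum(register_set) & 0xFF        # tag 1: sum of register identifiers
    depth_tag = chain_length & 0x1F          # tag 2: length of the dependence chain
    return index % BVIT_ENTRIES, (id_tag, depth_tag)


class BVIT:
    def __init__(self):
        self.entries = {}                    # index -> (tags, last outcome)

    def predict(self, pc, register_set, chain_length):
        index, tags = arvi_key(pc, register_set, chain_length)
        entry = self.entries.get(index)
        if entry is not None and entry[0] == tags:
            return entry[1]                  # tags match: use the stored outcome
        return None                          # miss: fall back to another predictor

    def update(self, pc, register_set, chain_length, taken):
        index, tags = arvi_key(pc, register_set, chain_length)
        self.entries[index] = (tags, taken)


bvit = BVIT()
bvit.update(0x400, {3: 10, 4: 20}, 2, True)
same = bvit.predict(0x400, {3: 10, 4: 20}, 2)        # same registers and values
new_value = bvit.predict(0x400, {3: 11, 4: 20}, 2)   # one value differs
new_regs = bvit.predict(0x400, {5: 10, 6: 20}, 2)    # same values, other registers
print(same, new_value, new_regs)  # prints True None None
```

The third lookup maps to the same index as the first but fails the identifier tag check, which is the role the text assigns to the tags.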

F. Comparison 

Fig. 12 provides a summary of the performance that 

has been reported for the value-based branch prediction 

schemes described above. It can be seen that a hybrid predictor 

using both BPVP and Gshare predictor can result 

in an accuracy of up to 96%. 


Scheme                                    Reported Result (Misprediction rate or IPC)
----------------------------------------  ---------------------------------------------------------------
Compiler synthesized branch               CS-Practical: Misprediction rate = 7.809%
prediction [14]                           CS-Theoretical (Unlimited hardware): Misprediction rate = 7.315%

Branch difference predictor (BDP) [8]     BDP+gshare: Misprediction rate = approx. 8.6% for 16KB,
                                          7.6% for 32KB, and 7.25% for 64KB predictor.
                                          BDP+bi-mode: Misprediction rate = approx. 8.3% for 16KB,
                                          7.4% for 32KB, and 7.25% for 64KB predictor.

Branch predictor using value              BPVP only: Misprediction rate = approx. 20%
prediction (BPVP) [1]                     BPVP+gshare: Misprediction rate = 5.11% for 16KB,
                                          4.61% for 32KB, and 4.23% for 64KB

Available register value information      ARVI current value: Prediction rate = approx. 93.75% for
(ARVI) predictor [6]                      20-stage pipeline, 93.38% for 40-stage, and 93.38% for
                                          60-stage pipeline.
                                          ARVI perfect value: Prediction rates = approx. 96.25% for
                                          20-stage, 97.50% for 40-stage, and 97.38% for 60-stage pipeline.

Fig. 12. Comparison of misprediction rates and/or performance improvement due to value-based branch prediction schemes
based on reported results for various predictor sizes.
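
Fig. 12 reports some schemes as misprediction rates and others as prediction rates. The short calculation below converts a few of the reported numbers to a single metric. The dictionary keys and the helper name `misprediction_rate` are illustrative, and the figures come from different papers, so the result is only a rough side-by-side view.

```python
# (kind, percent) pairs copied from Fig. 12; "predict" entries are prediction rates.
reported = {
    "BPVP only": ("mispredict", 20.0),
    "BPVP+gshare, 64KB": ("mispredict", 4.23),
    "BDP+gshare, 64KB": ("mispredict", 7.25),
    "ARVI current value, 20-stage": ("predict", 93.75),
}


def misprediction_rate(kind, percent):
    """Express either kind of reported number as a misprediction rate in percent."""
    return percent if kind == "mispredict" else 100.0 - percent


for scheme, (kind, percent) in reported.items():
    miss = misprediction_rate(kind, percent)
    print(f"{scheme}: misprediction {miss:.2f}%, accuracy {100.0 - miss:.2f}%")
```

For BPVP+gshare at 64KB this gives an accuracy of 95.77%, which is the source of the "up to 96%" figure quoted in the text.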

IV. Implementation Details 

In this section, first, we describe our simulation environment 

and benchmarks. Next, we provide an overview 

of the changes we made to incorporate the BPVP branch 

predictor in our simulator. 

A. Simulation Environment and Benchmarks 

We simulated branch prediction schemes using the SimpleScalar/PISA 

simulator platform [17]. SimpleScalar is 

a processor simulator with an instruction set architecture 

(ISA) called portable instruction set architecture (PISA) 

which is loosely based on the MIPS ISA. Our default configuration, 

implemented in the form of a functional simulator 

called sim-bpred, simulates a 5-stage dynamically 

scheduled single issue processor pipeline with branch prediction. 

Misprediction rates using different branch prediction 

techniques (like static always-taken, bimodal, gshare, 

and BPVP) were obtained by running this simulator for 

various benchmarks. We used 4 benchmarks: compress95 

from the SPEC95 suite and gcc, vpr, parser from the SPEC 

CPU2000 suite. We ran the benchmarks using SPEC-supplied

input sets for a warmup window of 500 million

instructions and collected our results for the next 50 million

instructions. By doing this, we obtain results for a representative 

sample that is free of any compulsory misses and 

related errors in the branch and/or value prediction caches. 
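
The warmup-then-measure methodology can be illustrated with a small harness. This is a sketch, not sim-bpred code: it counts trace records (branches) instead of instructions, uses a toy bimodal predictor with 2-bit saturating counters, and the names `Bimodal` and `measure` are hypothetical.

```python
class Bimodal:
    """Table of 2-bit saturating counters indexed by the branch PC."""

    def __init__(self, entries=1024):
        self.counters = [2] * entries            # start in the weakly-taken state

    def predict(self, pc):
        return self.counters[pc % len(self.counters)] >= 2

    def update(self, pc, taken):
        i = pc % len(self.counters)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def measure(trace, predictor, warmup, window):
    """Train on every record, but count hits only for `window` records after `warmup`."""
    hits = total = 0
    for n, (pc, taken) in enumerate(trace):
        if n >= warmup + window:
            break
        guess = predictor.predict(pc)
        if n >= warmup:                          # statistics start after the warmup window
            total += 1
            hits += (guess == taken)
        predictor.update(pc, taken)
    return hits / total if total else 0.0


trace = [(0x40, False)] * 8                      # one branch that is never taken
cold = measure(trace, Bimodal(), warmup=0, window=4)
warm = measure(trace, Bimodal(), warmup=4, window=4)
print(cold, warm)  # prints 0.75 1.0
```

The cold run is charged for the initial miss while the counter is still training; the warmed-up run is not, which is the effect the warmup window is meant to remove.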

B. Modifications to the simulator 

The sim-bpred simulator in SimpleScalar implements the 

following branch prediction schemes: static (taken/not-taken),

bimodal, 2lev (two-level branch predictor), and 

comb (combination predictor). Value prediction extensions 

for SimpleScalar that implement stride, last-value, and 

two-level value predictors are available as public-domain 

code routines [18]. We used these routines to implement 

the BPVP scheme in the sim-bpred simulator. As described 

in [1] and in Sec. III-D, we implemented the IIT with as 

many entries as the number of general purpose integer registers 

(32 in SimpleScalar) with each entry containing the 

PC of the last instruction that wrote into that register. 

Our implementation of the BPVP scheme is slightly different 

from that described in [1]. The differences arise from 

the fact that the ISA that we use (SimpleScalar/PISA) is 

different from the one that the authors of [1] used (Alpha 

ISA). The modification that we have included in our implementation 

handles branch instructions that have two register 

operands which is a feature of SimpleScalar/PISA. Also, 

we have implemented the BPVP-only scheme, and not 

the final hybrid BPVP (BPVP+gshare or BPVP+agree) 

scheme that the authors have reported. We also do not implement 

the scheme in SimpleScalar’s cycle-accurate simulator 

called sim-outorder. Hence, we do not report results 

for performance improvement (IPC) due to BPVP. 

However, our implementation and evaluation of the BPVP 

scheme sheds light on an issue that the authors of [1] have

not studied: the effect of different value predictors on the accuracy

of the BPVP scheme.

V. Simulations and Discussion 

Our simulations evaluated the BPVP scheme using three 

different data value predictors, namely, last value, stride, 

and two-level for our benchmark set. We also compared the 

branch prediction accuracies of these configurations with a 

static (always-taken) prediction scheme, a 32K-entry bimodal 

prediction scheme, and a Gshare prediction scheme. 

In all our simulations, we maintained the sizes of the predictors 

approximately the same (32KB). The prediction

rates we obtained for the four benchmarks are shown individually

in Fig. 15. The average branch prediction rate across the

benchmark set is shown in Fig. 13. 
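
For reference, the two simpler value predictors can be sketched as below. This is an illustration only and not the code of the extensions in [18]: tables are unbounded dictionaries keyed by the producer PC, there are no confidence counters, the two-level predictor is omitted, and the class names are hypothetical.

```python
class LastValue:
    """Predicts that an instruction produces the same value as last time."""

    def __init__(self):
        self.last = {}

    def predict(self, pc):
        return self.last.get(pc, 0)

    def update(self, pc, value):
        self.last[pc] = value


class Stride:
    """Predicts the last value plus the difference between the last two values."""

    def __init__(self):
        self.last = {}
        self.stride = {}

    def predict(self, pc):
        return self.last.get(pc, 0) + self.stride.get(pc, 0)

    def update(self, pc, value):
        if pc in self.last:
            self.stride[pc] = value - self.last[pc]   # stride between consecutive results
        self.last[pc] = value


lv, st = LastValue(), Stride()
for v in (4, 8, 12):                                  # e.g. a pointer advancing by 4
    lv.update(0x10, v)
    st.update(0x10, v)
print(lv.predict(0x10), st.predict(0x10))  # prints 12 16
```

Here the stride predictor gets the next value right and the last-value predictor does not, yet both predicted values fall on the same side of a bound such as 100, so a comparison against that bound would give the same branch direction in this toy case.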

From Fig. 13, we observe that when the BPVP scheme
is used alone to provide the branch prediction, it does not
result in a very accurate prediction (only about 74% accuracy).
However, we note that, on average, using
the last value predictor gives a slightly higher accuracy for
the BPVP scheme than BPVP-with-stride and
BPVP-with-2lev. This suggests that for value-based branch
prediction, a value predictor with a simple design, like
the last value predictor, is sufficient. Also, the misprediction
rate for BPVP that we obtain (about 26%) is
higher than that reported in [1] (20%). This may be due
to the different ISA that we used, which could have resulted
in a higher number of conditional branches in our
simulation sample. Fig. 14 lists the number of
conditional branches that were encountered in our simulation
sample. Among all branch predictors we evaluated,
the Gshare predictor is found to provide the most accurate
prediction. Also, among the different benchmarks that we
used, we did not notice any deviations from the behavior
described above.

Branch Predictor    Direction Hit Rate (%)
------------------  ----------------------
Taken               76.99
BPVP+Last           74.095
BPVP+Stride         74.0925
BPVP+2lev           74.0925
Bimod               90.13
Gshare              92.525

Fig. 13. Average branch prediction rate across all benchmarks
for a 32KB predictor size.

Benchmark     # of conditional branches in simulated sample
------------  ---------------------------------------------
compress95    9.8 Million
gcc           9.0 Million
parser        7.5 Million
vpr           4.4 Million

Fig. 14. Number of conditional branches encountered during
simulation.

VI. Conclusions and Future Work 

In this paper, we implemented and evaluated BPVP – 

a branch predictor that uses value prediction and studied 

its misprediction rates when different data value predictors 

were used in its implementation. Our evaluation found that 

changing the type of data value predictor does not result 

in any marked improvement in the prediction accuracy of 

BPVP although our results do point to the fact that using 

a simple data value predictor like the last value predictor 

can provide sufficiently good accuracies. 

Branch prediction using value prediction techniques is a 

relatively new area of research. Although its effectiveness 

in predicting conditional direct branches has been shown, 

value prediction methods have not been applied to indirect 

branch prediction. Indirect branches such as returns 

and indirect jump instructions and their target addresses 

may be easier to predict using value-based techniques since 

many of them are associated with subroutine calls,

dynamic shared libraries, virtual functions, case statements,

etc. Hence, this area presents good potential for future

work in value-based branch prediction. 

References 

[1] J. Gonzalez and A. Gonzalez, “Control-Flow Speculation 

through Value Prediction,” IEEE Transactions on Computers, 

vol. 50, no. 12, pp. 1362–1376, Dec. 2001. 

[2] J.L. Hennessy and D.A. Patterson, Computer Architecture: A 

Quantitative Approach, Third edition, Morgan Kaufmann Publishers 

Inc., 2003. 

[3] Freddy Gabbay, “Speculative Execution based on Value Prediction,” 

Tech. Rep., Technical report #1080, Electrical Engineering 

Department, Technion - Israel Institute of Technology, 

1996. 

[4] M.H. Lipasti and J.P. Shen, “Exceeding the Dataflow Limit 

via Value Prediction,” in Proceedings of the 29th Annual 

IEEE/ACM International Symposium on Microarchitecture, 

Nov. 1996, pp. 226–237. 

[5] A. Sodani and G. S. Sohi, “Understanding the Differences between 

Value Prediction and Instruction Reuse,” in Proceedings 

of the 31st Annual IEEE/ACM International Symposium on 

Microarchitecture, Nov. 1998, pp. 205–215. 

[6] L. Chen, S. Dropsho, and D.H. Albonesi, “Dynamic Data Dependence 

Tracking and its Application to Branch Prediction,” 

in Proceedings of the Ninth International Symposium on High- 

Performance Computer Architecture, Feb. 2003, pp. 65–76. 

[7] A. Farcy, O. Temam, R. Espasa, and T. Juan, “Dataflow analysis 

of branch mispredictions and its application to early resolution 

of branch outcomes,” in Proceedings of the 31st Annual 

IEEE/ACM International Symposium on Microarchitecture, 

Nov. 1998, pp. 59–68. 

[8] T.H. Heil, Z. Smith, and J.E. Smith, “Improving branch predictors 

by correlating on data values,” in Proceedings of the 32nd 

Annual IEEE/ACM International Symposium on Microarchitecture, 

Nov. 1999. 

[9] Y. Sazeides and J.E. Smith, “The Predictability of Data Values,” 

in Proceedings of the Annual International Symposium on 

Microarchitecture, Nov. 1997. 

[10] Y. Sazeides and J.E. Smith, “Modeling program predictability,” 

in Proceedings of the Annual International Symposium on 

Computer Architecture (ISCA), June 1998, pp. 73–84. 

[11] J.E. Smith, “A Study of Branch Prediction Strategies,” in 

Proceedings of the Eighth Annual International Symposium on 

Computer Architecture (ISCA), May 1981, pp. 135–148. 

[12] S. McFarling, “Combining Branch Predictors,” Tech. Rep., DEC 

WRL, June 1993, Technical Report TN-36. 

[13] K. Wang and M. Franklin, “Highly accurate data value prediction 

using hybrid predictors,” in Proceedings of the Annual 

International Symposium on Microarchitecture, 1997, pp. 281– 

290. 

[14] S. Mahlke and B. Natarajan, “Compiler Synthesized Dynamic 

Branch Prediction,” in Proceedings of the Annual International 

Symposium on Microarchitecture, Nov. 1996. 

[15] V. Kathail, M.S. Schlansker, and B.R. Rau, “HPL PlayDoh architecture

specification: Version 1.0,” Tech. Rep., Hewlett-Packard 

Laboratories, Palo Alto, CA, Feb. 1994, Technical Report HPL- 

93-80. 

[16] L. Chen, S. Dropsho, and D.H. Albonesi, “Dynamic data dependence 

tracking and its application to branch prediction,” in Proceedings 

of the International Symposium on High Performance 

Computer Architecture, Feb. 2003, pp. 65–76. 

[17] D. Burger and T.M. Austin, “The SimpleScalar Tool Set, version 

2.0,” Computer Architecture News, pp. 13–25, June 1997. 

[18] Sang-Jeong Lee, “Data value predictors,” URL: 

http://www.simplescalar.com/extensions.html. 


[Figure 15: four bar-chart panels with a vertical axis from 0 to 100. Panel titles and bar labels: "Branch Prediction Rate for compress95" (91.99, 91.99, 70.37, 66.67, 66.67, 66.67); "Branch Prediction Rate for gcc" (93.11, 88.64, 70.78, 70.68, 70.67, 70.67); "Branch Prediction Rate for parser" (93.7, 90.3, 81.1, 77.39, 77.39, 77.39); "Branch Prediction Rate for vpr" (93.11, 88.64, 70.78, 70.68, 70.67, 70.67).]

Fig. 15. Branch prediction rates for individual benchmarks. The total predictor size is kept the same (32KB) for all experiments.
