Mse: Hardware Algorithms Computer Arithmetic - microLab

Mse: Hardware Algorithms 

Computer Arithmetic 

Marcel Jacomet 

Bern University of Applied Sciences 

Bfh-Ti HuCE-microLab, Biel/Bienne 

Marcel.Jacomet@bfh.ch 

October 29, 2013 

Contents 

1 Introduction
2 Signed Digit Numbers
  2.1 General SD Numbers
  2.2 Binary SD Numbers
  2.3 Canonic SD Numbers
  2.4 Encoding/Converting SD Numbers
3 Fast Addition
  3.1 Overview of Adders
  3.2 SD Adder
4 Multiplication and Division
  4.1 Sequential Algorithms
  4.2 Sequential Mul
  4.3 Sequential Div
  4.4 Square Root Extraction
  4.5 Division Through Mul
  4.6 Division by Newton-Raphson
5 Elementary Functions
  5.1 Basics
  5.2 Additive Normalization
  5.3 Multiplicative Normalization
References

Marcel Jacomet ii 2008


© Marcel Jacomet, 2009

All rights reserved. This work may not be translated or copied in 

whole or in part without the written permission by the author, except 

for brief excerpts in connection with reviews or scholarly analysis. 

Use in connection with any form of information storage and retrieval, 

electronic adaptation, computer software is forbidden. 



1 Introduction 

The slides "Hardware Algorithms: Computer Arithmetic" are based on [Kor02].

Two's complement is the most widely used number system in digital hardware. Nevertheless, other, less conventional number systems exist, such as the signed digit number system.

The signed digit number system differs from traditional binary two's complement, but it can yield a significant speed improvement in arithmetic operators like adders, multipliers or dividers. This speed improvement is essentially achieved by eliminating or shortening the delay-consuming carry chains. A special case of the signed digit (SD) number system, which also underlies the canonic signed digit (CSD) representation, is the ternary digit set {−1, 0, 1}. In a more general view, signed digit number systems (GSD) use digit sets like {−(r−1), −(r−2), ..., −1, 0, 1, ..., (r−2), (r−1)}. In this chapter we look into the theory of the signed digit number system. Some additional literature is referenced as well.



Textbooks 

• Computer Arithmetic Algorithms, 2nd edition. Israel Koren, 2002

A. K. Peters, hardcover ISBN 1-56881-160-8, USD 62

• many good texts can be found on the web 



2 Signed Digit Numbers 

2.1 General Signed Digit Numbers 

The Signed Digit Number System 

• Signed digit number systems are used to eliminate carry propagation chains in addition and subtraction.

• SD is useful for developing fast algorithms for arithmetic operators like multiplication or division.

• In all classical fixed-radix systems the digits are restricted to $x_i \in \{0, \dots, r-1\}$.

• However, we might expand the system to the signed digit number system: $x_i \in \{\overline{r-1}, \overline{r-2}, \dots, \bar{1}, 0, 1, \dots, r-2, r-1\}$, where $\bar{i}$ equals $-i$.



The ceiling ⌈x⌉ of a number x is the smallest integer that is larger than or equal to x.

Redundancy in Signed Digital Numbers 

• For r = 10 the digit set is {9̄, 8̄, ..., 1̄, 0, 1, ..., 8, 9} and, if n = 2, the range is 9̄9̄ ≤ x ≤ 99, which includes 199 numbers. However, with 2 digits, each having 19 possibilities, there are 19² = 361 representations. Thus some numbers have more than one representation.

• The signed digit number system has redundancies: (01) = (19̄) = 1, (03) = (17̄) = 3, ...

• A high redundancy is too costly, thus we reduce it by restricting the digit set to $x_i \in \{\bar{a}, \overline{a-1}, \dots, \bar{1}, 0, 1, \dots, a\}$ with $\lceil (r-1)/2 \rceil \le a \le r-1$.

• For r = 10 we get the range 5 ≤ a ≤ 9 for a, and we select a = 5. For n = 2 we can then represent 1 only by (01); (19̄) is no longer valid since 9̄ is not in the digit set. However, 5 still has two representations: (05) = (15̄).



Note that the carry digits have been shifted to the left to simplify the execution of step two of the algorithm. The first carry digit $c_i$ and the first interim sum digit $u_i$ are calculated as follows: $c_i = 1$ since 5+5 ≥ 5, and thus $u_i$ = 5+5−10·1 = 0. This calculation can be done for all positions simultaneously. The second step is to calculate the sum digits $s_i$. Here again, all sum digits can be calculated simultaneously. Note that the carry digits therefore do not propagate.

Adding with SD numbers 

step 1 Compute an interim sum $u_i$ and a carry $c_i$:

$$u_i = x_i + y_i - r\,c_i, \qquad c_i = \begin{cases} 1 & \text{if } (x_i + y_i) \ge a \\ \bar{1} & \text{if } (x_i + y_i) \le \bar{a} \\ 0 & \text{otherwise} \end{cases}$$

step 2 Calculate the final sum $s_i = u_i + c_{i-1}$

• Let's do the addition of two decimal numbers (r = 10, a = 5). No carry propagation is needed!

        2  5  5  5  5
  +     2  3  4  5  5
     0  1  1  1  1       c_i
        4  2̄  1̄  0  0    u_i
        5  1̄  0  1  0    s_i
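The two-step addition can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `sd_add` and `sd_value` and the MSB-first list representation (negative digits stored as negative integers) are not from the slides.

```python
def sd_add(x, y, r, a):
    """Add two radix-r signed-digit numbers given MSB first.

    Digits must lie in [-a, a]. Returns len(x) + 1 sum digits, MSB first.
    """
    n = len(x)
    c = [0] * n  # carry digits c_i
    u = [0] * n  # interim sum digits u_i
    for i in range(n):  # step 1: every position is handled independently
        t = x[i] + y[i]
        if t >= a:
            c[i] = 1
        elif t <= -a:
            c[i] = -1
        u[i] = t - r * c[i]
    # step 2: add the carry of the next lower position (index i + 1 here)
    s = [c[0]]
    for i in range(n):
        s.append(u[i] + (c[i + 1] if i + 1 < n else 0))
    return s


def sd_value(digits, r):
    """Integer value of a signed-digit string given MSB first."""
    v = 0
    for d in digits:
        v = v * r + d
    return v


s = sd_add([2, 5, 5, 5, 5], [2, 3, 4, 5, 5], r=10, a=5)
print(s, sd_value(s, 10))
```

Both loops touch each position exactly once, which mirrors the claim that no carry has to ripple through the word.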



We want to guarantee that no new carry is generated in step 2, so the bound a has to be chosen more carefully. We need $|s_i| \le a$, which means $|s_i| = |u_i + c_{i-1}| \le a$. Since $|c_{i-1}| \le 1$, we have to satisfy $|u_i| \le a-1$ for all input values $x_i$ and $y_i$. For the largest input values $x_i = y_i = a$ we have $u_i = 2a - r \le a-1$, resulting in the first limit $a \le r-1$, which is always the case. However, if $x_i + y_i = a$, the smallest sum for which $c_i$ is still 1, we get $u_i = a - r < 0$. Substituting this into $|u_i| \le a-1$ leads to $r - a \le a - 1$ and finally to the second limit $\frac{r+1}{2} \le a$.

Preventing carry ripples in SD numbers

• Let's keep the definition r = 10 and a = 5 and calculate the sum of the two SD numbers 1244̄ and 1314̄:

        1  2  4  4̄
  +     1  3  1  4̄
     0  1  1  1̄       c_i
        2  5̄  5̄  2    u_i
        3  4̄  6̄  2    s_i

• As the digit 6̄ does not exist, a carry would occur.

• It can be shown that $s_i$ stays inside the digit set if a is restricted to the new range $\lceil (r+1)/2 \rceil \le a \le r-1$.

• Redefining a = 6, we again add two decimal SD numbers, 1354̄ and 1314̄. No carry propagation is needed!

        1  3  5  4̄
  +     1  3  1  4̄
     0  1  1  1̄       c_i
        2  4̄  4̄  2    u_i
        3  3̄  5̄  2    s_i



2.2 Binary Signed Digit Numbers 

Binary SD numbers 

• Digit set for binary SD numbers (r = 2, thus a = 1) is: {1̄, 0, 1}

step 1 Compute an interim sum $u_i$ and a carry $c_i$:

$$u_i = x_i + y_i - 2c_i, \qquad c_i = \begin{cases} 1 & \text{if } (x_i + y_i) \ge 1 \\ \bar{1} & \text{if } (x_i + y_i) \le \bar{1} \\ 0 & \text{otherwise} \end{cases}$$

step 2 Calculate the final sum $s_i = u_i + c_{i-1}$

        1  1  1  1  1
  +     0  0  0  0  1
     1  1  1  1  1       c_i
        1̄  1̄  1̄  1̄  0    u_i
     1  0  0  0  0  0    s_i



Carry ripples in binary SD numbers

• Note that the condition $\lceil (r+1)/2 \rceil \le a \le r-1$ cannot be satisfied for r = 2, thus no guarantee for preventing carries can be given.

• Let's do the addition of two binary SD numbers. The nonexistent digits 2 and 2̄ occur if we ignore the carries:

        0  1̄  1  1̄
  +     1  0  0  1̄
     1  1̄  1  1̄       c_i
        1̄  1  1̄  0    u_i
     1  2̄  2  2̄  0    s_i



Preventing carry ripples in binary SD numbers

• A problematic situation occurs when $x_i y_i$ = 01 and $x_{i-1} y_{i-1}$ equals either 1̄1̄ or 01̄:

        1  1̄
  +     0  0
     1  1̄       c_i
        1̄  1    u_i
     1  2̄  1    s_i

• We can avoid setting $u_i$ = 1̄ in these cases by using the new combination $c_i$ = 0 and $u_i$ = 1:

        1  1̄
  +     0  0
     0  1̄       c_i
        1  1    u_i
     0  0  1    s_i



Rules to prevent carry ripples in binary SD numbers

• We used the following rules for adding binary SD numbers:

  x_i y_i    00   01   01̄   11   1̄1̄   11̄
  c_i         0    1    1̄    1    1̄    0
  u_i         0    1̄    1    0    0    0

• Evolving the previously described idea, we get the modified rules table for adding binary SD numbers:

  x_i y_i    x_{i−1} y_{i−1}       c_i   u_i
  00         -                      0     0
  01         neither is 1̄           1     1̄
  01         at least one is 1̄      0     1
  01̄         neither is 1           1̄     1
  01̄         at least one is 1      0     1̄
  11         -                      1     0
  1̄1̄         -                      1̄     0
  11̄         -                      0     0



Examples for adding binary SD numbers 

• Let's again add the two binary SD numbers, this time using the modified rules. Now no carry propagation occurs!

        0  1̄  1  1̄
  +     1  0  0  1̄
     0  0  0  1̄       c_i
        1  1̄  1  0    u_i
     0  1  1̄  0  0    s_i

• The above binary SD numbers represent the decimal numbers −3 and 7, resulting in 4 after the addition.
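The modified rule table can be checked with a short Python sketch. The names `bsd_add` and `bsd_value`, and the convention of treating the missing neighbour below the LSB as the pair (0, 0), are assumptions made for this illustration.

```python
def bsd_add(x, y):
    """Carry-free addition of binary SD numbers (digits -1, 0, 1), MSB first."""
    n = len(x)
    c = [0] * n  # carry digits
    u = [0] * n  # interim sum digits
    for j in range(n):
        t = x[j] + y[j]
        # digit pair of the next lower position; (0, 0) below the LSB
        lo = (x[j + 1], y[j + 1]) if j + 1 < n else (0, 0)
        if t == 2:
            c[j], u[j] = 1, 0
        elif t == -2:
            c[j], u[j] = -1, 0
        elif t == 1:
            # a -1 below may send a negative carry, so keep u = +1
            c[j], u[j] = (0, 1) if -1 in lo else (1, -1)
        elif t == -1:
            # a +1 below may send a positive carry, so keep u = -1
            c[j], u[j] = (0, -1) if 1 in lo else (-1, 1)
    return [c[0]] + [u[j] + (c[j + 1] if j + 1 < n else 0) for j in range(n)]


def bsd_value(digits):
    """Integer value of a binary SD string given MSB first."""
    v = 0
    for d in digits:
        v = 2 * v + d
    return v


s = bsd_add([0, -1, 1, -1], [1, 0, 0, -1])  # (-3) + 7
print(s, bsd_value(s))
```

Each output digit depends only on positions i and i−1 of the inputs, so the delay of such an adder does not grow with the word length.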



2.3 Canonic Signed Digit Numbers 

Canonical recoding of binary SD numbers 

• Binary SD numbers are particularly useful in fast multiplication 

and division algorithms. 

• Nonzero digits correspond to an active operation: add/subtract.

• Zero digits correspond to shift-only operations.

• To reduce power consumption, the number of nonzero digits should be reduced as much as possible.

• The number 7 has the following representations:

        8  4  2  1
        0  1  1  1
        1  1̄  1  1
        1  0  1̄  1
        1  0  0  1̄
     1  1̄  1̄  1  1
        ⋮

• Out of these variants, 1001̄ is the minimal representation.



Converting SD to CSD numbers

• The SD number with the minimal number of nonzero digits is called the canonic signed digit (CSD) number.

• The following algorithm can be used to produce CSD code: starting with the LSB, substitute every sequence of two or more 1s by 10···01̄ (similarly for sequences of 1̄s).

• Replace a 1̄1 sequence by 01̄ and replace a 11̄ sequence by 01, respectively.

Ex1: Convert the SD number 01111₂ (15) into a CSD number: 10001̄

Ex2: Convert the SD number 011011₂ (27) into a CSD number: 011011₂ = 11101̄ SD = 1001̄01̄ CSD

Ex3: Convert the SD number 011̄100 into a CSD number: 0101̄00 CSD
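As a cross-check of Ex1 and Ex2, the CSD digits of a nonnegative integer can also be produced arithmetically. Note that this sketch swaps in the standard non-adjacent-form recoding (inspect the two lowest bits, emit a digit, halve) rather than the run-replacement rule of the slide; both yield the same canonic result. The function name `to_csd` is illustrative.

```python
def to_csd(n):
    """Return the CSD digits (MSB first, values -1/0/1) of an integer n >= 0."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # +1 if n mod 4 == 1, -1 if n mod 4 == 3
            n -= d           # makes n divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits[::-1] or [0]


print(to_csd(15))  # Ex1
print(to_csd(27))  # Ex2
```

Because every nonzero digit forces the next higher digit to zero, no two adjacent digits are nonzero, which is the defining property of the canonic form.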



2.4 Encoding and Converting Signed Digit Numbers 

If we use 2 bits for encoding the SD digits {1̄, 0, 1}, then we can select between 4! = 24 different encodings.

Encoding SD numbers

• 4! = 24 different encodings are possible for the SD digits {1̄, 0, 1}; 2 of them have practical relevance:

          Encoding 1     Encoding 2
  x       x^h  x^l       x^h  x^l
  0        0    0         0    0
  1        0    1         0    1
  1̄        1    0         1    1

• Encoding 1 satisfies $x = x^l - x^h$; Encoding 2 can be viewed as 2's complement.



Converting from SD numbers (1) 

• Using "Encoding 1", the conversion from SD to 2's complement can be done by evaluating the equation $x = x^l - x^h$ as a binary subtraction of the two bit vectors.

Ex3: 25

        1  0  1̄  0  1  1̄    x (SD)
        1  0  0  0  1  0    x^l
  −     0  0  1  0  0  1    x^h
        0  1  1  0  0  1    2's compl.

Converting from SD numbers (2) 

• An even simpler conversion to 2's complement is the following algorithm:

– Start at the LSB and move towards the MSB position.

– Replace any 1̄ by 1 and the succeeding 0s by 1s.

– Forward the negative sign until a 1 consumes the negative sign and is replaced by a 0.

– If a second 1̄ is reached by the negative sign, then it is replaced by a 0 and the negative sign continues.

– If a 1 is not reached, then the MSB is set to 1.

Ex4: 9

        0  1  1  0  0  1    c_i (neg)
        1  1̄  1̄  0  1  1̄    SD
        0  0  1  0  0  1    z_i (2's compl.)
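The LSB-to-MSB scan amounts to propagating a borrow (the "negative sign") through the digit string. A minimal Python sketch, assuming the SD string is given MSB first and is wide enough to hold the value in two's complement; `sd_to_twos` and `twos_value` are illustrative names:

```python
def sd_to_twos(digits):
    """Convert binary SD digits (MSB first) to two's complement bits (MSB first)."""
    borrow = 0  # the forwarded "negative sign"
    bits = []
    for d in reversed(digits):      # start at the LSB
        t = d - borrow              # t is one of -2, -1, 0, 1
        bits.append(t & 1)          # resulting bit (Python's & is two's complement)
        borrow = 1 if t < 0 else 0  # sign continues until a 1 consumes it
    return bits[::-1]


def twos_value(bits):
    """Value of a two's complement bit list given MSB first."""
    v = -bits[0]
    for b in bits[1:]:
        v = 2 * v + b
    return v


print(sd_to_twos([1, -1, -1, 0, 1, -1]))  # Ex4: value 9
print(sd_to_twos([1, 0, -1, 0, 1, -1]))   # Ex3: value 25
```

If the borrow is still pending after the MSB, the result is negative and its MSB comes out as 1, which matches the last rule of the list.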



3 Fast Addition 

3.1 Overview of Adder Architectures 

Addition is the most frequently used arithmetic operation. It is used either on its own or as part of more complex arithmetic operations like multiplication, division, hyperbolic and trigonometric functions and so on. Thus a really fast add operation is essential in hardware algorithms. The most straightforward implementation cascades full adder blocks, which has the disadvantage of being very slow for large operands because the carry ripples through the result. Numerous adder architectures have been developed to cope with this limitation.

Revisiting Adder Architectures 

• Ripple carry adders have the ripple delay problem. 

• Carry-look-ahead adder 

• Carry-skip adder 

• Hierarchical adders, Wallace tree adder

• Carry-select adder 

• Carry-save adder 

• Pipelining of arithmetic operations 



3.2 Signed Digit Adder 

SD Adder 

• Develop a cascadable hardware block for adding SD numbers 

using the ”Encoding 2” representation. 

[Figure: chain of identical SD adder cells. Cell i takes the encoded digits $x_i^{hl}$ and $y_i^{hl}$ and produces the encoded sum digit $s_i^h$, $s_i^l$; the cell at position 0 receives the constant inputs 0 0.]

• Now no carry chain exists anymore! Thus we have a very 

fast adder, independent of its bit length. 



4 Multiplication and Division 

4.1 Sequential Algorithms for Multiplication, 

Division and others 

The current trend moving from signal processing implemented 

in software technologies towards hardware intensive signal processing 

has uncovered a relative lack of understanding hardware 

signal processing architectures. Many hardware efficient algorithms 

exist, but these are generally not well known due to the 

dominance of software systems over the past quarter century. A 

selected set of elementary algorithms are presented in the subsequent 

sections. 

Sequential Algorithms for Multiplication, Division and 

others 

• Binary sequential multiplication is done in a similar way as with decimal numbers.

• The same holds for binary sequential division.

• SQRT (square root), GCD (greatest common divisor) and other operations have simple sequential implementation architectures.

• Sequential algorithms are time-consuming but benefit from a low gate count.





4.2 Sequential Algorithm for Multiplication 

Sequential Multiplication Algorithm 

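• Multiplication $Z = X \cdot Y$, with $X = x_{n-1} x_{n-2} \cdots x_1 x_0$ and $Y = y_{n-1} y_{n-2} \cdots y_1 y_0$, where $x_{n-1}$ and $y_{n-1}$ are the sign bits.

• For both operands being positive (i.e. $x_{n-1} = y_{n-1} = 0$) we obtain:

$$Z = \left( \sum_{j=0}^{n-2} x_j 2^j \right) \cdot Y$$

• The maximal number of bits needed for Z is 1 + 2(n − 1) = 2n − 1.

• With $P^{(j)}$ being the partial product, we find the iteration:

$$P^{(j+1)} = \left( P^{(j)} + x_j \cdot Y \right) \cdot 2^{-1}, \quad \text{with } P^{(0)} = 0 \text{ for } j = 0, 1, \dots, n-2$$

• Doing some calculations (unrolling the iteration) we get:

$$P^{(n-1)} = \left( P^{(n-2)} + x_{n-2} \cdot Y \right) \cdot 2^{-1} = \sum_{j=0}^{n-2} x_j \, 2^{\,j-(n-1)} \cdot Y = 2^{-(n-1)} \cdot Z$$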



Example of Sequential Multiplication Algorithm 

• The previous algorithm works for positive inputs as well as for a positive multiplier X and a negative multiplicand Y. Z = X·Y, with X = 3 and Y = −6:

  Y                          1 0 1 0              −6
  X                      ×   0 0 1 1               3
  P(0) = 0                   0 0 0 0
  x_0 = 1 ⇒ add Y        +   1 0 1 0
                             1 0 1 0
  shift right                1 1 0 1 0
  x_1 = 1 ⇒ add Y        +   1 0 1 0
                           1 0 1 1 1 0
  shift right                1 0 1 1 1 0
  x_2 = 0 ⇒ shift only       1 1 0 1 1 1 0        −18



Architecture of Sequential Multiplication Example 

• Compact hardware design using one n bit register, one 

2n−1 bit register and one n−1 bit adder. 

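[Figure: block diagram of the sequential multiplier with a register Reg holding Y, an adder (+), and a shift register ShiftReg holding the partial product P and the multiplier X.]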



Sequential Multiplication Algorithm for negative Operands 

• Multiplication $Z = X \cdot Y$.

• For both operands being negative (i.e. $x_{n-1} = y_{n-1} = 1$), the previous algorithm does not work anymore.

• The 2's complement number X can be written as follows:

$$X = -x_{n-1} \cdot 2^{n-1} + \tilde{X}, \quad \text{where } \tilde{X} = \sum_{j=0}^{n-2} x_j \cdot 2^j$$

• If we had ignored X being negative, we would have obtained:

$$\tilde{Z} = \tilde{X} \cdot Y = (X + x_{n-1} \cdot 2^{n-1}) \cdot Y = X \cdot Y + Y \cdot x_{n-1} \cdot 2^{n-1}$$

• Since the term $X \cdot Y$ is what we need for the result, we get:

$$X \cdot Y = \tilde{Z} - Y \cdot x_{n-1} \cdot 2^{n-1}$$

• This means we must subtract Y, weighted by $2^{n-1}$, from $\tilde{Z}$ in case $x_{n-1} = 1$.



Example of Multiplication with Two Negative Operands 

• For two negative operands the adapted algorithm developed above is needed.

Ex 6: Z = X·Y, with X = −3 and Y = −6

  Y                          1 0 1 0              −6
  X                      ×   1 1 0 1              −3
  x_0 = 1 ⇒ add Y        +   1 0 1 0
  shift right                1 1 0 1 0
  x_1 = 0 ⇒ shift only       1 1 1 0 1 0
  x_2 = 1 ⇒ add Y        +   1 0 1 0
                             1 0 0 0 1 0
  shift right                1 1 0 0 0 1 0
  x_3 = 1 ⇒ correct      +   0 1 1 0
                             0 0 1 0 0 1 0        18

The carry-out of the correction step is discarded.
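The add-and-shift iteration, including the correction step for a negative multiplier, can be sketched as follows. This is an illustrative model, not the register-level design: the name `seq_mul` is made up, and the partial product is kept scaled by 2^(n−1) in a Python integer so that every right shift is exact.

```python
def seq_mul(x, y, n):
    """Multiply two n-bit two's complement integers by add-and-shift."""
    xb = x & ((1 << n) - 1)  # bit pattern of the multiplier X
    p = 0                    # holds P(j) * 2**(n-1)
    for j in range(n - 1):   # the n-1 magnitude bits of X
        if (xb >> j) & 1:
            p += y << (n - 1)  # add Y at the top of the product register
        p >>= 1                # shift right (exact, p is even here)
    if (xb >> (n - 1)) & 1:    # sign bit set: correction step
        p -= y << (n - 1)      # subtract Y * 2**(n-1)
    return p


print(seq_mul(3, -6, 4))   # positive multiplier, negative multiplicand
print(seq_mul(-3, -6, 4))  # both operands negative (Ex 6)
```

Without the final `if`, the second call would return −30, the uncorrected value that appears in the example before the correction row.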



4.3 Sequential Algorithm for Division 

Sequential Restoring Division Algorithm 

• Division Q = X/D is the most complex of the four basic arithmetic operations.

• Given dividend X, divisor D, quotient Q and remainder R we get:

$$X = Q \cdot D + R \quad \text{with } R < D$$

• We assume that X, D and Q are fractions, thus X < D, and D ≠ 0.

• Assuming positive operands, we have fractional variables like:

$$Q = 0.q_1 \cdots q_m \quad \text{where } m = n-1$$

• We perform the division as a sequence of subtractions and shifts:

$$r_i = 2 r_{i-1} - q_i \cdot D \quad \text{with } r_0 = X$$

• If the shifted remainder $2r_{i-1}$ is larger than or equal to the divisor D then $q_i = 1$, else $q_i = 0$.



Example of Sequential Restoring Division 

• The previous algorithm works for positive inputs, for fractions as well as for integers, as can be shown.

Q = X/D, with X = 5/8 = 0.101₂ and D = 3/4 = 0.11₂

  r_0 = X               0.101000      2r_0 ≥ D
  2r_0                 01.01000       set q_1 = 1
  add −D             + 11.010
  r_1 = 2r_0 − D       00.10000       2r_1 ≥ D
  2r_1                 01.0000        set q_2 = 1
  add −D             + 11.010
  r_2 = 2r_1 − D       00.0100        2r_2 < D
  2r_2                 00.100         set q_3 = 0
  r_3 = 2r_2           00.100         2r_3 ≥ D
  2r_3                 01.00          set q_4 = 1

• The result is Q = 0.1101; continuing the calculation for higher precision we would obtain Q = 0.11010
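The restoring iteration is easy to replay with exact rational arithmetic. The following sketch uses Python's `fractions.Fraction`; the function name `restoring_div` and the choice to return the bit list together with the final remainder are assumptions of this illustration.

```python
from fractions import Fraction


def restoring_div(x, d, m):
    """Restoring division of fractions 0 <= x < d; returns (bits q_1..q_m, r_m)."""
    r = Fraction(x)
    d = Fraction(d)
    q = []
    for _ in range(m):
        r *= 2           # shift the remainder
        if r >= d:       # trial subtraction succeeds
            q.append(1)
            r -= d
        else:            # "restore": keep the shifted remainder
            q.append(0)
    return q, r


bits, rem = restoring_div(Fraction(5, 8), Fraction(3, 4), 4)
print(bits, rem)
```

The pair satisfies X = Q·D + r_m·2^(−m), which is a convenient invariant to check against a hardware model.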



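Robertson Diagram for Restoring Division

• With $r_{i-1} < D$, the quotient bit $q_i$ should be selected such that $r_i < D$ and $r_i \ge 0$, resulting in the condition

$$r_i = 2 \cdot r_{i-1} - D$$

[Figure: Robertson diagram. Vertical axis $r_i$ (marked D), horizontal axis $2r_{i-1}$ (marked D and 2D), with the two line segments labelled $q_i = 0$ and $q_i = 1$.]

• In the restoring division algorithm, the old remainder $2r_{i-1}$ is restored if the subtraction $2r_{i-1} - D$ was negative; it is then shifted, and after a subsequent subtraction we obtain $4r_{i-1} - D$.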



Sequential Non-Restoring Division Algorithm 

• The corrections at negative remainders are postponed to later steps, thus we keep $2r_{i-1} - D < 0$.

• Shifting it and adding D we get the new remainder $2(2r_{i-1} - D) + D = 4r_{i-1} - D$.

• These steps are repeated as long as the remainder remains negative.

• The quotient digit is determined as follows:

$$q_i = \begin{cases} 1 & \text{if } 2r_{i-1} \ge 0 \\ \bar{1} & \text{if } 2r_{i-1} < 0 \end{cases}$$

• With the above decision, the remainder is computed with

$$r_i = 2r_{i-1} - q_i \cdot D$$

• The correction of a "wrong" selection of the quotient digit (the remainder becomes negative) is done by using 1̄ instead of 0. Consider $q_i q_{i-1}$ = 10 (too large); it would be corrected by $q_i q_{i-1}$ = 11̄.

• If the final remainder and the dividend have opposite signs, a final correction is needed: $Q_{cor} = Q - 2^{-(n-1)}$



Example of Sequential Non-Restoring Division 

Q = X/D, with X = −3/8 and D = 3/4 = 0.11₂

  r_0 = X           1.101       2r_0 < 0
  2r_0             11.010       set q_1 = 1̄
  add D          + 00.110
  r_1              00.000       2r_1 ≥ 0
  2r_1             00.000       set q_2 = 1
  add −D         + 11.010
  r_2              11.010       2r_2 < 0
  2r_2             10.100       set q_3 = 1̄
  add D          + 00.110
  r_3              11.010       2r_3 < 0
  2r_3             10.100       correct or set q_4 = 1̄

• The result before correction is Q = 0.1̄11̄1̄ = 0.1̄001; after the correction we get Q = 0.1̄000
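The non-restoring recurrence can be replayed in the same exact-arithmetic style. In this sketch the quotient digits 1 and 1̄ are returned as +1 and −1; the function name `nonrestoring_div` is illustrative, and the final correction step is deliberately left out.

```python
from fractions import Fraction


def nonrestoring_div(x, d, m):
    """Non-restoring division; returns (SD digits q_1..q_m in {+1, -1}, r_m)."""
    r = Fraction(x)
    d = Fraction(d)
    q = []
    for _ in range(m):
        r *= 2
        if r >= 0:
            q.append(1)
            r -= d   # subtract the divisor
        else:
            q.append(-1)
            r += d   # add the divisor instead of restoring
    return q, r


digits, rem = nonrestoring_div(Fraction(-3, 8), Fraction(3, 4), 4)
print(digits, rem)
```

Every step performs exactly one addition or subtraction, which is the advantage over the restoring scheme.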



Robertson Diagram for Non-Restoring Division

• The non-restoring division can be represented graphically.

• Horizontal lines represent ±D

• Diagonal lines represent multiplication by 2

• Vertical lines represent function selection (1 vs. 1̄)

• Example: X = 0.5, D = 0.75, thus Q = 0.111̄ = 0.101₂

[Figure: Robertson diagram of this example, $r_i$ plotted against $2r_{i-1}$, with D = 0.110, −D = 11.01 and 2D = 01.10. The path is $r_0$ = X = 0.100, $2r_0$ = 01.00 ≥ 0, so $q_1$ = 1; $r_1 = 2r_0 - D$ = 0.010, $2r_1$ = 0.100 ≥ 0, so $q_2$ = 1; $r_2$ = 11.110, $2r_2$ = 11.100 < 0, so $q_3$ = 1̄.]



Converting SD Quotient to Two's Complement

• The non-restoring division generates a quotient representation that is incompatible with 2's complement, thus a conversion is needed for further processing.

• SD conversion algorithms start with the LSB, thus they need the complete quotient before starting.

• A new on-the-fly algorithm is needed to reduce the total execution time.

• It can be shown that the following equations and results are valid:

$$p_i = \tfrac{1}{2}(q_i + 1)$$

$$Q = (1 - p_1).p_2 p_3 \cdots p_m 1$$

Example: Q = X/D, with X = −3/8 and D = 3/4:

  non-corrected value                                       Q = 0.1̄11̄1̄
  step 1: replace all 1̄ by 0                                Q = 0.0100
  step 2: shift the given number left by 1 bit position     Q = 0.100
  step 3: complement the most significant bit               Q = 1.100
  step 4: shift a 1 into the LSB position                   Q = 1.1001
  correct: subtract 0.0001 from the non-corrected result    Q = 1.1000 = −0.5
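The conversion rule can be written down directly. In this sketch the SD quotient digits are given as a list of +1/−1 values (q_1 first) and the result is a bit list whose first element is the sign bit; the names `sd_quotient_to_twos` and `frac_value` are made up for the illustration.

```python
from fractions import Fraction


def sd_quotient_to_twos(q):
    """Map SD quotient digits q_1..q_m (+1/-1) to bits (1-p_1).p_2...p_m 1."""
    p = [(qi + 1) // 2 for qi in q]    # p_i = (q_i + 1) / 2
    return [1 - p[0]] + p[1:] + [1]    # sign bit, fraction bits, appended 1


def frac_value(bits):
    """Value of a two's complement fraction s.b_1 b_2 ... (sign bit first)."""
    v = Fraction(-bits[0])
    for k, b in enumerate(bits[1:], start=1):
        v += Fraction(b, 2 ** k)
    return v


z = sd_quotient_to_twos([-1, 1, -1, -1])
print(z, frac_value(z))
```

Each output bit depends on a single quotient digit, so the bits can be emitted while the division is still running.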



4.4 Square Root Extraction 

Simple SQRT Algorithm 

• Square root extraction is similar to restoring division.

• Assume $Q = (0.q_1 q_2 \cdots q_m)$ is a fraction that denotes the square root of X.

• $Q_i$ is the partially developed root at step i with the remainder $r_i$:

$$Q_i = \sum_{k=1}^{i} q_k 2^{-k}$$

• leading to

$$r_i 2^{-i} = X - Q_i^2$$

• With some calculations we find

$$r_i = 2r_{i-1} - q_i \cdot (2Q_{i-1} + q_i 2^{-i})$$

• This can be viewed as a division with the changing divisor $D = (2Q_{i-1} + q_i 2^{-i})$; see the sequential restoring division algorithm.

• For m → ∞ we get

$$X - Q^2 = \lim_{m \to \infty} 2^{-m} r_m = 0$$



Example of Simple SQRT Algorithm 

Q² = X, with X = 10/16

  r_0 = X                  0.1010
  2r_0                    01.0100
  −(0 + 2^−1)           − 00.1000
  r_1                     00.1100      set q_1 = 1, Q_1 = 0.1
  2r_1                    01.1000
  −(2Q_1 + 2^−2)        − 01.0100
  r_2                     00.0100      set q_2 = 1, Q_2 = 0.11
  2r_2                    00.1000
  −(2Q_2 + 2^−3)        − 01.1010
  r_3                     10.1110      set q_3 = 0, Q_3 = 0.110
  r_3 = 2r_2              00.1000
  2r_3                    01.0000
  −(2Q_3 + 2^−4)        − 01.1001
  r_4                     11.0111      set q_4 = 0, Q_4 = 0.1100
  r_4 = 2r_3              01.0000

• The result is Q = 0.1100 and the final remainder is $2^{-4} r_4$ = 16/256 = X − Q² = (160 − 144)/256
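The square root recurrence can be replayed with exact fractions in the same way as the restoring division. The function name `restoring_sqrt` and its return tuple are assumptions of this sketch.

```python
from fractions import Fraction


def restoring_sqrt(x, m):
    """Restoring square root of a fraction 0 <= x < 1 with m result bits."""
    r = Fraction(x)   # remainder r_i
    q = Fraction(0)   # partially developed root Q_i
    bits = []
    for i in range(1, m + 1):
        w = Fraction(1, 2 ** i)
        trial = 2 * r - (2 * q + w)  # r_i for the tentative choice q_i = 1
        if trial >= 0:
            bits.append(1)
            r = trial
            q += w
        else:                        # restore: q_i = 0, r_i = 2 r_{i-1}
            bits.append(0)
            r = 2 * r
    return bits, q, r


bits, root, rem = restoring_sqrt(Fraction(10, 16), 4)
print(bits, root, rem)
```

The returned values satisfy X − Q² = r_m·2^(−m), the relation used in the last bullet of the example.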



4.5 Division Through Multiplication 

In the previously described division algorithms the number of iteration steps grows linearly with the number of result bits n. In the algorithms described next, the number of steps is proportional to log₂ n. However, these algorithms rely on fast parallel multipliers.

Division by Convergence 

• Numerator and denominator are both multiplied by the same factors:

$$Q = \frac{N}{D} = \frac{N \cdot R_0 \cdot R_1 \cdots R_{m-1}}{D \cdot R_0 \cdot R_1 \cdots R_{m-1}} \to \frac{Q}{1}$$

• Only the quotient needs to be calculated; the algorithm is suitable for floating-point computations.

• The essential step is to select the factors $R_i$ correctly.

prep: let D be a normalized binary fraction D = 0.1xxxx

• therefore 1/2 ≤ D < 1 and D = 1 − y where y ≤ 1/2

iter 1: we select $R_0 = 1 + y$, then

$$D_1 = D \cdot R_0 = (1-y) \cdot (1+y) = 1 - y^2$$

• since $y^2 \le 1/4$, $D_1$ satisfies $D_1 \ge 3/4$ and is therefore closer to 1 than D; in binary $D_1$ = 0.11xxxx

iter 2: we select $R_1 = 1 + y^2$ and obtain

$$D_2 = D_1 \cdot R_1 = (1-y^2) \cdot (1+y^2) = 1 - y^4$$

• now $y^4 \le 1/16$, so $D_2$ satisfies $D_2 \ge 15/16$ and is therefore closer to 1 than $D_1$; in binary $D_2$ = 0.1111xxxx

It can easily be shown that the denominator converges to 1.



Division by Convergence: Proof 

• It can easily be shown that the denominator converges to 1:

$$D \cdot R_0 \cdot R_1 \cdots R_{m-1} = (1-y)\,[(1+y)(1+y^2)(1+y^4)\cdots] = (1+y)\,[(1-y)(1+y^2)(1+y^4)\cdots]$$

• The expression in the last brackets is the series expansion of $\frac{1}{1+y}$ for 0 ≤ y ≤ 1/2, so

$$\lim_{i \to \infty} D_i = (1+y) \cdot \frac{1}{1+y} = 1$$

step 1: $D_{i+1} = D_i \cdot R_i$ and $N_{i+1} = N_i \cdot R_i$

step 2: Two's complement operation $R_{i+1} = 2 - D_{i+1}$, with $D_i = 1 - y^{2^i}$



Example of Division by Convergence 

Ex11: Q = N/D, with N = 0.101101₂ = 0.703125 and D = 0.1101₂ = 0.8125

  D_0                 0.1101
  R_0 = 2 − D_0       1.0011
  N_1 = N·R_0         0.1101010111
  D_1 = D_0·R_0       0.11110111
  R_1 = 2 − D_1       1.00001001
  N_2 = N_1·R_1       0.110111010100001...
  D_2 = D_1·R_1       0.111111111010111...
  R_2 = 2 − D_2       1.000000000101000...
  N_3 = N_2·R_2       0.110111011000100...
  D_3 = D_2·R_2       0.111111111111111...

• To speed up the algorithm further, look-up tables or ROMs are used for the initial value of $R_0$.
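The two steps of the convergence scheme fit in a short loop. The sketch below uses exact fractions so that the intermediate values of Ex11 can be compared digit for digit; the function name `div_by_convergence` is illustrative.

```python
from fractions import Fraction


def div_by_convergence(n, d, iters):
    """Multiply N and D by R_i = 2 - D_i until D is close to 1; returns (N, D)."""
    for _ in range(iters):
        r = 2 - d   # step 2: two's complement of the current denominator
        n = n * r   # step 1: scale the numerator ...
        d = d * r   # ... and the denominator by the same factor
    return n, d


N, D = Fraction(45, 64), Fraction(13, 16)   # 0.101101 and 0.1101 in binary
n1, d1 = div_by_convergence(N, D, 1)
n3, d3 = div_by_convergence(N, D, 3)
print(n1, d1)
print(float(n3), float(d3), float(N / D))
```

After three iterations the denominator equals 1 − y⁸ with y = 3/16, so roughly 19 leading fraction bits of the numerator already agree with N/D.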

4.6 Division by Newton-Raphson

Based on the Newton-Raphson algorithm, a division by reciprocation can be performed. In this case the reciprocal is first calculated using the Newton-Raphson iteration algorithm, followed by a multiplication. The Newton-Raphson algorithm is a method for finding a zero of a given function f(x), where a zero is a solution of f(x) = 0.



Division by Reciprocation using Newton-Raphson Algorithm 

• First the reciprocal 1/D is calculated using the Newton-Raphson iteration algorithm.

• In a second step a multiplication is performed to calculate Q = N/D = N · (1/D)

• Newton-Raphson in general is used to calculate the zeros of a function f(x), i.e. the solutions x of f(x) = 0.

[Figure: graph of f(x) over x with its zero crossing marked on the x axis.]



Newton-Raphson Algorithm for Reciprocation 

• Let $x_0$ be the first approximation and $x_i$ the estimate at the ith step; then $x_{i+1}$ is:

$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$$

• Substituting f(x) by our reciprocal-finding function $f(x) = 1/x - D$, with $f'(x) = -1/x^2$, we find:

$$x_{i+1} = x_i (2 - D x_i)$$

[Figure: graph of f(x) = 1/x − D; the tangent at $x_i$, with slope dy/dx, intersects the x axis at $x_{i+1}$.]



Iterations: Newton-Raphson Algorithm for Reciprocation 

• Example D = 0.13, choose initial value: x 0 = 4.0 

• Calculate iteration step 3: x 3 = 7.67 

• The convergence is quadratic. 

• Again, initial values can be generated by ROMs or other methods to reduce the number of iterations.

[Figure: graph of f(x) = 1/x − D with its zero at x = 7.69, showing the iterates of $x_{i+1} = x_i \cdot (2 - D \cdot x_i)$: $x_0$ = 4, $x_1$ = 5.92, $x_2$ = 5.92 · (2 − 0.13 · 5.92) = 7.28.]
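The iterates of the example are easy to reproduce. In the sketch below `reciprocal_nr` is an illustrative name; the iteration only converges for a starting value with 0 < x_0 < 2/D, which the caller is assumed to guarantee (for example through a look-up table).

```python
def reciprocal_nr(d, x0, iters):
    """Newton-Raphson iteration x <- x * (2 - d * x) for 1/d; returns all iterates."""
    x = x0
    history = [x]
    for _ in range(iters):
        x = x * (2 - d * x)  # one multiply, one subtract, one multiply
        history.append(x)
    return history


h = reciprocal_nr(0.13, 4.0, 5)
for i, x in enumerate(h):
    print(i, round(x, 4))
```

The error 1 − D·x_i is squared in every step, so the number of correct bits roughly doubles per iteration once the estimate is close.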



5 Elementary Functions 

5.1 Basics 

In a seminar at microLab a new architecture has been found for reciprocal calculation based on the Newton-Raphson iteration and a second-order polynomial for the initial value calculation. The architecture is efficient because both steps, initial value calculation and Newton-Raphson iteration, can be executed on the identical hardware unit [HSGJ10].

Elementary Functions 

• Elementary functions can be realized in hardware by different 

methods. Some elementary functions: 

e x , ln(x), sin(x), cos(x), tanh(x), arcsin(x), etc 

• ROM look-up tables might be used, but are large: for n ≥ 20 the memory size is ≥ 2.6 MB.

• The CORDIC (coordinate rotation digital computer) algorithm can be used.

• Taylor series expansion can be used, but sometimes shows bad convergence; e.g.:

$$e^x = \sum_{i=0}^{\infty} \frac{x^i}{i!}$$


• Polynomial approximations can be used, etc. 


For different elementary functions there exist pre-calculated 

constants for the polynomial approximation. Tables with such 

constants can be found in [?]. 



Polynomial Approximation 

• Approximation based on two degree-5 polynomials. Example: $e^x$

• We can express the elementary function as:

$$e^x = 2^{x \log_2 e}$$

• and partition the exponent $x \log_2 e = I + f$ into its integer and fraction parts, such that

$$e^x = 2^I \cdot 2^f$$

• Implementing $2^I$ is straightforward; evaluating $2^f$ can be done by a rational approximation, such as the ratio of two degree-5 polynomials:

$$2^f = \frac{((((a_5 f + a_4) f + a_3) f + a_2) f + a_1) f + a_0}{((((b_5 f + b_4) f + b_3) f + b_2) f + b_1) f + 1}$$

• $a_i$ and $b_i$ are known constants that depend on the target elementary function.

5.2 Additive Normalization 

There are alternative methods for calculating such elementary functions which are better suited to hardware implementation. Many of these known algorithms for evaluating elementary functions are based on the division-by-convergence algorithm discussed earlier.



Additive Normalization: Exponential Algorithm 

• The algorithm is based on the "division-by-convergence" idea: when one formula is forced to a constant, the other yields the result.

• To evaluate $y = e^{x_0}$ for a fractional argument $x_0$, we use

$$x_{i+1} = x_i - \ln(b_i)$$

$$y_{i+1} = y_i \cdot b_i$$

• The $b_i$'s are selected in such a way that the sequence of $x_i$ approaches 0, i.e., $x_m = 0$.

• Note that the product $y_i \cdot e^{x_i}$ does not change, so for m → ∞, where $x_m = 0$, we have:

$$y_{i+1} \cdot e^{x_{i+1}} = y_i \cdot b_i \cdot e^{x_i - \ln(b_i)} = y_i \cdot e^{x_i}$$

$$y_m \cdot e^{x_m} = y_0 \cdot e^{x_0} \quad \Rightarrow \quad y_m = y_0 \cdot e^{x_0}$$

• The similarity to "division-by-convergence" is now apparent: instead of keeping $N_i / D_i$ constant, we now keep $y_i \cdot e^{x_i}$ constant.



Iteration in Exponential Algorithm 

• To simplify the multiplication, the $b_i$'s are given the form:

$$b_i = (1 + s_i \cdot 2^{-i}) \quad \text{where } s_i \in \{-1, 0, 1\}$$

• The terms $\ln(1 \pm 2^{-i})$ have to be pre-calculated and stored in a look-up table.

• Substituting into the equations from the previous slide we get:

$$x_{i+1} = x_i - \ln(1 + s_i \cdot 2^{-i})$$

$$y_{i+1} = y_i \cdot (1 + s_i \cdot 2^{-i})$$

• To calculate the exponential function $e^{x_0}$ we have to find the vector $s = \{s_0, s_1, \cdots, s_{m-1}\}$.

• Restricting $x_0$ to positive fractions, we get a simpler scheme for selecting $s_i \in \{0, 1\}$.

• In step (i+1) we compute $D = x_i - \ln(1 + 2^{-i})$ and set

  if D ≥ 0 then $s_i = 1$, $x_{i+1} = D$
  else $s_i = 0$, $x_{i+1} = x_i$

• The convergence is linear: with n steps we get n bits.



Example: Exponential Algorithm 

  i   (1+2^−i)           ln(1+2^−i)   (1−2^−i)           ln(1−2^−i)
  0   10.0000000000₂     0.693147     0                  -
  1   1.1000000000₂      0.405465     0.1000000000₂      -0.693147
  2   1.0100000000₂      0.223144     0.1100000000₂      -0.287682
  3   1.0010000000₂      0.117783     0.1110000000₂      -0.133531
  4   1.0001000000₂      0.060625     0.1111000000₂      -0.064539
  5   1.0000100000₂      0.030772     0.1111100000₂      -0.031749

Calculate e^0.375 with 6 bit precision:

• Initial values: i = 0, x_0 = 0.375, y_0 = 1; result: y_6 = 1.450

  iter 1:  D = x_0 − ln(1+2^−0) = -0.318     s_0 = 0
           b_0 = (1+s_0·2^−0) = 1.000
           x_1 = x_0 = 0.375                 y_1 = y_0·b_0 = 1.000

  iter 2:  D = x_1 − ln(1+2^−1) = -0.030     s_1 = 0
           b_1 = (1+s_1·2^−1) = 1.000
           x_2 = x_1 = 0.375                 y_2 = y_1·b_1 = 1.000

  iter 3:  D = x_2 − ln(1+2^−2) = +0.152     s_2 = 1
           b_2 = (1+s_2·2^−2) = 1.250
           x_3 = D = 0.152                   y_3 = y_2·b_2 = 1.250

  iter 4:  D = x_3 − ln(1+2^−3) = +0.034     s_3 = 1
           b_3 = (1+s_3·2^−3) = 1.125
           x_4 = D = 0.034                   y_4 = y_3·b_3 = 1.406

  iter 5:  D = x_4 − ln(1+2^−4) = -0.027     s_4 = 0
           b_4 = (1+s_4·2^−4) = 1.000
           x_5 = x_4 = 0.034                 y_5 = y_4·b_4 = 1.406

  iter 6:  D = x_5 − ln(1+2^−5) = +0.003     s_5 = 1
           b_5 = (1+s_5·2^−5) = 1.031
           x_6 = D = 0.003                   y_6 = y_5·b_5 = 1.450
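The six iterations can be reproduced with a small floating-point sketch. Here `math.log` stands in for the pre-calculated look-up table of ln(1 + 2^−i), and the name `exp_additive` is illustrative; in hardware the multiplication by (1 + 2^−i) would be a shift and an add.

```python
import math


def exp_additive(x0, m):
    """Additive normalization for e**x0 with s_i in {0, 1}; returns (y_m, x_m)."""
    x, y = x0, 1.0
    for i in range(m):
        d = x - math.log(1 + 2.0 ** -i)  # table value ln(1 + 2^-i)
        if d >= 0:                       # s_i = 1: accept the step
            x = d
            y = y * (1 + 2.0 ** -i)      # shift-and-add in hardware
        # s_i = 0: x and y stay unchanged
    return y, x


y6, x6 = exp_additive(0.375, 6)
print(round(y6, 3), round(x6, 3), round(math.exp(0.375), 3))
```

The remaining argument x_6 ≈ 0.003 is the part of the exponent that has not been absorbed yet, which explains the gap between y_6 and the exact value of e^0.375.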



5.3 Multiplicative Normalization 

Multiplicative Normalization: Exponential Algorithm 

• To calculate $e^x$ we did a continued summation of the terms $\ln(1 + s_i \cdot 2^{-i})$.

• This procedure is called additive normalization.

• In a similar way we may define a multiplicative normalization, where $x_i$ is forced to 1 by continued multiplication with precalculated factors. We now approximate y = ln x.

• Selecting $b_i$ such that $x_{i+1}$ approaches 1, we thus have:

$$x_{i+1} = x_i \cdot b_i$$

$$y_{i+1} = y_i - g(b_i)$$

$$x_{i+1} = x_0 \prod_{l=0}^{i} b_l \to 1$$

• If we select $g(b_l) = \ln(b_l)$, then the product of the $b_l$ approaches $1/x_0$ and we finally have:

$$y_m = y_0 - \ln\frac{1}{x_0} = y_0 + \ln x_0$$

• Similar approaches can be used for the other elementary functions, like trigonometric, inverse trigonometric, hyperbolic, etc.
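A possible reading of the multiplicative scheme for ln x is sketched below. Several choices are assumptions that the slides do not fix: the factors are restricted to b_i = 1 + 2^−i with s_i ∈ {0, 1}, the index starts at i = 1, a factor is accepted whenever the product stays at or below 1, and the argument is assumed to lie in 1/2 ≤ x_0 ≤ 1. `math.log` again stands in for the look-up table.

```python
import math


def ln_multiplicative(x0, m):
    """Multiplicative normalization for ln(x0), 0.5 <= x0 <= 1; returns (y_m, x_m)."""
    x, y = x0, 0.0
    for i in range(1, m + 1):
        b = 1 + 2.0 ** -i
        if x * b <= 1.0:      # accept b_i only if x stays at or below 1
            x *= b            # shift-and-add in hardware
            y -= math.log(b)  # table value g(b_i) = ln(1 + 2^-i)
    return y, x


y, x = ln_multiplicative(0.75, 20)
print(y, math.log(0.75), x)
```

With y_0 = 0 the accumulated sum approaches ln x_0, and the residual error equals |ln x_m|, which shrinks by about one bit per step.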


References 


[HSGJ10] Andreas Habegger, Andreas Stahel, Josef Goette, 

and Marcel Jacomet. An efficient hardware implementation 

for a reciprocal unit. In submitted to: The 

5th IEEE International Symposium on Electronic Design, 

Test and Applications, Ho Chi Minh City, January 

13-15, 2010. 

[Kor02] Israel Koren. Computer Arithmetic Algorithms. A K 

Peters, Natick, Massachusetts, 2nd edition, 2002.
