IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, VOL. 31, NO. 5, MAY 1990 629

Fig. 3. A modulo sum adder.

III. THE PROPOSED MODULO ADDER

The carry-save adder (CSA) [13] is known to be fast in multioperand addition. A CSA does not complete the addition at each stage but postpones it to the final stage. In the intermediate stages, numbers are represented as a sum word and a carry word, which avoids carry propagation.

The idea of representing a number as a carry and a sum can be used in modulo addition to obtain a scheme whose delay is constant and does not depend on the number of bits. The modulo adder adds two numbers A and B modulo m. Fig. 3 shows that A is represented as a pair of numbers $(A_s, A_c)$, B is represented as $(B_s, B_c)$, and the output C is represented as $(C_s, C_c)$. Each number is represented as a group of sum bits and a group of carry bits. The representation $(A_s, A_c)$ is not unique. The condition that needs to be satisfied is

$|A_s + A_c|_m = |A|_m$.

One possible representation is

$A_s = |A|_m, \quad A_c = 0$.

The choice of representation has no effect on the complexity of the design. With such a representation, four numbers $(A_s, A_c, B_s, B_c)$ need to be added, and two CSA steps are required. After the addition we need to detect whether $-M$ or $2 \cdot (-M)$ is required to adjust the result. The adjusting process takes at most three steps. Since the number of steps is bounded by five regardless of the length of A and B, the adder can be used in a multioperand pipelined addition scheme [14].

3.1. The Modulo Addition Algorithm

The proposed algorithm for modulo m addition of two numbers can be described as follows.

Algorithm modulo add (A, B, Result)

Input: Two variables A and B in modulo m. A is represented as A_s and A_c. B is represented as B_s and B_c. All variables are n-bit numbers ($2^{n-1} < m \le 2^n$).

Output: Variable Result represented as Result_s and Result_c. The relation between A, B, and Result is: Result = $|A + B|_m$.

Procedure:
begin
  Do in parallel
  begin
    Call Sum(temp1, A_s, A_c, B_s)
    Call Carry(temp2, A_s, A_c, B_s)
  end
  Do in parallel
  begin
    Call Sum(temp3, temp1, temp2, B_c)
    Call Carry(temp4, temp1, temp2, B_c)
  end
  Case (temp2[n + 1] + temp4[n + 1]) of
    0: Do in parallel
       begin
         Result_s := temp3
         Result_c := temp4
       end
       exit
    1: Do in parallel
       begin
         Call Sum(temp5, temp3, temp4, (2^n - m))
         Call Carry(temp6, temp3, temp4, (2^n - m))
       end
    2: Do in parallel
       begin
         Call Sum(temp5, temp3, temp4, 2*(2^n - m))
         Call Carry(temp6, temp3, temp4, 2*(2^n - m))
       end
  end case
  Case (temp6[n + 1]) of
    0: Do in parallel
       begin
         Result_s := temp5
         Result_c := temp6
       end
       exit
    1: Do in parallel
       begin
         Call Sum(temp7, temp5, temp6, (2^n - m))
         Call Carry(temp8, temp5, temp6, (2^n - m))
       end
  end case
  Case (temp8[n + 1]) of
    0: Do in parallel
       begin
         Result_s := temp7
         Result_c := temp8
       end
    1: Do in parallel
       begin
         Call Sum(temp9, temp7, temp8, (2^n - m))
         Call Carry(temp10, temp7, temp8, (2^n - m))
       end
       Do in parallel
       begin
         Result_s := temp9
         Result_c := temp10
       end
  end case
end.

Sum(A, B, C, D)
begin
  Do in parallel (1 <= i <= n)
    A[i] := B[i] XOR C[i] XOR D[i]
end

Carry(A, B, C, D)
begin
  A[1] := 0
  Do in parallel (1 <= i <= n)
    A[i + 1] := (B[i] AND C[i]) OR (B[i] AND D[i]) OR (C[i] AND D[i])
end

An implementation of the algorithm is shown in Fig. 4.

Authorized licensed use limited to: University of Bridgeport. Downloaded on February 24,2010 at 11:38:37 EST from IEEE Xplore. Restrictions apply.
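The procedure above can be sketched in software as a cross-check. The following Python sketch is not the hardware implementation of Fig. 4: the names `csa_step` and `modulo_add` are hypothetical, words are modelled as Python integers whose least significant bit is bit 1, and the three explicit Case blocks are written as one loop that adds $2^n - m$ once for every carry of weight $2^n$ that was discarded.

```python
def csa_step(x, y, z, n):
    """One carry-save step on n-bit words.

    Returns (sum_word, carry_word, carry_out): the bitwise XOR, the
    majority bits shifted left by one and truncated to n bits, and the
    bit shifted out of the n-bit word (bit n + 1, weight 2**n).
    """
    mask = (1 << n) - 1
    sum_word = (x ^ y ^ z) & mask
    carry_full = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_full & mask, carry_full >> n


def modulo_add(a_s, a_c, b_s, b_c, m, n):
    """Add (a_s, a_c) and (b_s, b_c) modulo m in carry-save form.

    Returns (result_s, result_c, steps) such that
    (result_s + result_c) % m == (a_s + a_c + b_s + b_c) % m.
    """
    if not (1 << (n - 1)) < m <= (1 << n):
        raise ValueError("need 2**(n-1) < m <= 2**n")
    if any(w < 0 or w >> n for w in (a_s, a_c, b_s, b_c)):
        raise ValueError("operands must be n-bit words")
    correction = (1 << n) - m  # 2**n is congruent to this value modulo m

    # Steps 1 and 2: reduce the four operand words to two.
    t1, t2, out1 = csa_step(a_s, a_c, b_s, n)
    s, c, out2 = csa_step(t1, t2, b_c, n)
    dropped = out1 + out2  # discarded carries of weight 2**n (0, 1 or 2)
    steps = 2

    # Correction steps: add (2**n - m) once per discarded carry.
    while dropped:
        s, c, dropped = csa_step(s, c, dropped * correction, n)
        steps += 1
    return s, c, steps


# Operand words of the worked example in Fig. 5 (m = 2050, n = 12).
s, c, steps = modulo_add(0b101111110111, 0b110011101101,
                         0b111100010101, 0b101010110011, 2050, 12)
print(format(s, "012b"), format(c, "012b"), steps)
```

With the operand words of Fig. 5, this sketch returns the words 101010101110 and 111111111000 after four steps. The loop form is a convenience of the sketch; unrolling it three times gives the Case structure of the procedure.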
Fig. 4. Different stages of the modulo adder.

Initial: A_s = 101111110111, A_c = 110011101101, B_s = 111100010101, B_c = 101010110011, M = 2050, N = 12.
(In the carry words below, the separated leading bit is bit 13, the carry out.)

Step 1.  A_s          = 101111110111
         A_c          = 110011101101
         B_s          = 111100010101
         temp1        = 100000001111
         temp2        = 1 111111101010

Step 2.  temp1        = 100000001111
         temp2        = 111111101010
         B_c          = 101010110011
         temp3        = 110101010110
         temp4        = 1 010101010110

Step 3.  temp3        = 110101010110
         temp4        = 010101010110
         2(2^n - M)   = 111111111100
         temp5        = 011111111100
         temp6        = 1 101010101100

Step 4.  temp5        = 011111111100
         temp6        = 101010101100
         2^n - M      = 011111111110
         temp7        = 101010101110
         temp8        = 0 111111111000

Result_s = 101010101110
Result_c = 111111111000

Fig. 5. A detailed example for the modulo addition.

Theorem 1: The modulo adder scheme for adding two n-bit numbers in modulo m has an asymptotic time complexity O(1).

Proof: To prove that the number of steps is constant (five), we need to prove that the last carry is equal to zero in five or fewer steps.
Induction on the number of bits n is used to prove the theorem.

1) Basis step: for n = 0, no numbers are added, and the required number of steps is zero.

2) Induction hypothesis: assume for a fixed arbitrary n > 0 that the maximum number of steps is five.

3) Induction step: for numbers with n + 1 bits let

$\gamma$ = temp2[n + 1] + temp4[n + 2].

Then we have the following cases.

(a) $\gamma = 0$: the carry propagation stopped at bit n, and it ends after five steps at most according to the induction hypothesis.

(b) $\gamma = 1$: the correction is $2^{n+1} - m$ in step 3. Since $m > 2^n$, we have $2^{n+1} - m < 2^n$, which means that $(2^{n+1} - m)[n + 1] = 0$. In the worst case the most significant bits of the sum word and the carry word entering step 3 are both equal to one. This means that temp5[n + 1] = 0 and temp6[n + 2] = 1, and then temp8[n + 2] = 0. In this case the correction is done in two steps (step 3 and step 4).

(c) $\gamma = 2$: the correction is $2(2^{n+1} - m)$ in step 3. In the worst case the most significant bits of the sum word, the carry word, and $2(2^{n+1} - m)$ are all equal to one. Then temp5[n + 1] = 1, temp6[n + 1] = 1, and $(2^{n+1} - m)[n + 1] = 0$. At step 4, temp7[n + 1] = 0 and temp8[n + 2] = 1. At step 5, temp9[n + 1] = 1 and temp10[n + 2] = 0. In this case the correction is done in three steps (steps 3-5).

As an example, the modulo addition of A = 1272 and B = 450 for m = 2050 is shown in Fig. 5. The representation of A and B is not unique; one valid representation is shown in the figure, together with the detailed modulo addition operation. In step 1 we get temp2[13] = 1, and in step 2 we get temp4[13] = 1, which means that at step 3 we have to add $2(2^n - M)$. At step 3 we get temp6[13] = 1, which means that at step 4 we have to add $2^n - M$. At step 4 we get temp8[13] = 0, which means that the addition process stops at step 4.
The result of step 4 is the final result.

IV. MODULO ADDER EVALUATION

Using the VLSI model of computation for asymptotic complexity [15], the proposed adder is compared with adders I and II. For adder I (Fig. 1), using the binary adder of Brent and Kung [16], the complexity measures are as follows:

$A = O(\log m \, \log\log m) = O(n \log n)$
$T = O(\log\log m) = O(\log n)$
$AT^2 = O(n \log^3 n)$.

For adder II (Fig. 2), using the complexity analysis of the correlation table of [17]:

$A = O(\log m \, \log\log m + m \log m) = O(n \log n + 2^n n) = O(n 2^n)$
$T = O(\log\log m + \log m) = O(\log n + n) = O(n)$
$AT^2 = O(n^3 2^n)$.

For the proposed adder,

$A = O(n)$
$T = O(1)$
$AT^2 = O(n)$.

V. CONCLUSIONS

The modulo adder introduced in this paper has a total time delay complexity of O(1) for adding two n-bit numbers in modulo m. Based on the analysis of Section IV, this adder is the