A theta (1) algorithm for modulo addition ... - khaledelleithy.org

628IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, VOL. 37, NO. 5, MAY 1990A O(l) Algorithm for Modulo AdditionKHALED M. ELLEITHY AND MAGDY A. BAYOUMIAbsfrad -A O(l) algorithm for large modulo addition for residuenumber system (RNS) based archictectures is proposed. The addition isdone in a fixed number of stages which does not depend on the size ofthe modulus. The proposed modulo adder is much faster than theprevious adders and more area efficient. The implementation of theadder is modular and is based on simple cells which leads to efficientVLSI realization.I. INTRODUC~IONRecently, the residue member system (RNS) is receiving increasedattention due to its ability to support high-speed concurrentarithmetic [ 11. Applications such as fast Fourier transform,digital filtering, and image processing utilize the high-speedRNS arithmetic operations; addition and multiplication, do notrequire the difficult RNS operations such as division and magnitudecomparison. The technological advantages offered by VLSIhave added a new dimension in the implementation of RNSbasedarchitectures [2]. Several high-speed VLSI special purposedigital signal processors have been successfully implemented[31-[51.Modulo addition represents the computational kernel forRNS-based architectures. Subtraction is performed by addersusing the additive inverse property [6]. Multiplication can betransformed into addition by several techniques [7]. Also, moduloaddition is the basic element in the conversion from RNS tobinary using the Chinese remainder theorem (CRT) [6]. Banerji[8] analyzed modulo addition in MSI technology. A VLSI analysisof modulo addition has been reported in [9]-[11]. In general,lookup tables and PLAs have been the main logical modulesused when the data granularity is the word. It has been foundthat such structure is only efficient for small size moduli. Formedium size and large moduli, bit-level structures are moreefficient, where the data granularity is the bit [12].In this paper, we present a modulo adder for medium sizeand large moduli. It is based on using a two-dimensional arrayof very simple cells (full adders). The modulo addition is performedin a fixed time delay independent of the size of themoduli.11. RESIDUE NUMBER SYSTEM (RNS)In RNS, an integer X can be represented by N-tuple ofresidue digits= (rl 1 r2 1 ' ' ' 7 rN)where r, = IXI,,,, with respect to a set of N moduli(rn1,m2;. .,rnN]. In order to have a unique residue representation,the moduli must be painvise relatively prime, that is:GCD(rn,,m,) = 1, for i # j .Then it is shown that there is a unique representation for eachnumber in the range of 0 Q X < nElm, = A4 where N is thenumber of moduli.Manuscript received April 12, 1989. This work was supported in part by theNational Science Foundation under Grant MIP-8809811. This letter wasrecommended by Associate Editor T. R. Viswanathan.The authors are with The Center for Advanced Computer Studies, Universityof Southwestern Louisiana, Lafayette, LA 70504.IEEE Log Number 9034410.h+ tl,Fig. 1. Modulo addition using two adders.Fig. 2./;Modulo addition using a lookup table.2m-1 rn-1 0m+ 1 1Zm-Z m-2The arithmetic operation on two integers A and B is equivalentto the arithmetic operation on its residue representation,that is:1A.BI.M = (IIAlrnI. IBl~lIml>IIAl~2~ IBb2lm2,. . . 3I I A I ~ N . I B I ~ N I ~ N )where ''." can be addition, subtraction, or multiplication. It isdesirable to convert binary arithmetic on large integers to residuearithmetic on smaller residue digits in which the operations canbe parallelly executed, and there is no carry chain betweenresidue digits.2.1. The Modulo AdditionGenerally, addition modulo rn has 2" - rn (n = [log rnl) incorrectresidue states. These states are in the range [m,2" - 11which may be called overflow states. The corrected residuenumbers can be obtained by two methods; employing a binaryadder or a correction table. In the first method, a constant(2" - rn) is added to correct the overflow residue states (generalizedend-round carry) as shown in Fig. 1. The addition isperformed as follows:y=lxl +xZlm= { z: 1 z:: m,if xl + x2 < mif + x2 rn.Two n-bit adders are used; the first computes x1 + x2, while thesecond computes x1 + xg - rn. The carry bit generated from thesecond adder indicates whether or not xl + x2 is greater thanrn. A multiplexer, controlled by the carry, selects the correctoutput. In the second method, a lookup table is used to correctthe incorrect residue states (2" - m), Fig. 2. The first algorithmof modulo addition has a time complexity of O(logn), and thesecond algorithm is not suitable for medium and large moduli.0098-4094/90/0500-062SS01.00 01990 IEEEAuthorized licensed use limited to: University of Bridgeport. Downloaded on February 24,2010 at 11:38:37 EST from IEEE Xplore. Restrictions apply.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, VOL. 31, NO. 5, MAY 1990 629Fig. 3. A modulo sum adder.111. THE PROPOSED MODULO ADDERCarry save adder (CSA) [13] has been proved to have highspeedin multioperand addition. Basically CSA depends on theidea of not completing the addition process at a certain stage,but postponing it to the final stage. In the intermediate stagesnumbers are represented as sum and carry to avoid the completeaddition process.The idea of representing a number as a carry and a sum canbe used in the modulo addition to obtain a scheme that has aconstant speed which does not depend on the number of bits.The modulo adder is used to add two numbers A and B inmodulo m. Fig. 3 shows that A is represented as a pair ofnumbers (A,,A,), B is also represented as (Bs,B,), and theoutput C is represented as (C,,C,). Each number is representedas a group of sum bits and carry bits. There is no uniquerepresentation for A, and A,. The condition that needs to besatisfied isIA, + ACIm = 14,.One possible representation isA, = JAl, A, = 0.The choice of a representation has no implication on the complexityof the design. With such representation, four numbers(A,, A,, B,, B,) need to be added, and two steps of CSA arerequired. After the addition process we need to detect if - Mor 2*(- M) is required to adjust the result. The adjustingprocess takes at most three steps. Since the adder has a fixednumber of steps-five-no matter how long A and B are, itcan be used in a multioperand pipelined addition scheme [141.3.1. The Modulo Addition AlgorithmThe proposed algorithm for modulo m addition of two numberscan be described as follows.Algorithm modulo add (A, B, Result)Znput: Two variables A and B in modulo m,A is representedas A, and A,. B is represented as B, and B,. Allvariables are n bit numbers (2"-' < m Q 2").Output: Variable Result represented as Result. and Result,.The relation between A, B, and Result is: Result =IA + BIm.Procedure:beginDo in parallelbeginCall Sum(temp,, A,, A,, B,)Call Carry(temp,, A,, A,, B,)endDo in parallelbeginCall Carry(temp,, temp,, temp,, B,)Call Carry(temp,, temp,, temp,, B,)endCase (temp, [n + 11 -temp, [n + 11) of0: Do in parallelbeginResult, := temp,Result, := temp,endexit1: do in parallelbeginCall Sum(temp,, temp,, temp,, rn))Call Carry(temp,, temp,, temp,, (2"- rn))end2: Do in parallelbeginCall sum(temp,, temp,, temp,, 2*(2" - rn))Call CarryItemp,, temp,, temp,, 2*(2" - rn))endend caseCase (temp, [n + 11) of0: do in parallelbeginResult, temp,Result ,:= temp,endexit1: Do in parallelbeginCall Sum(temp,, temp,, temp,, (2"- rn))Call Carry(temp,, temp,, temp,, (2"- rn))endend caseCase (temp, [n + 11) of0: do in parallelbeginResult, := temp,Result, := temp,end1: Do in parallelbeginCall Sum(temp,, temp,, temp,, (2"- m))Call Carry(temp,,, tempo,, temp,, (2" - m))endDo in parallelbeginResult, := temp,Result, := temp,,endend caseend.Sum(A,B,C,D)beginDo in parallel (1 algorithm is shown in Fig. 4.Authorized licensed use limited to: University of Bridgeport. Downloaded on February 24,2010 at 11:38:37 EST from IEEE Xplore. Restrictions apply.

630 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, VOL. 31, NO. 5, MAY 19902'- M2( 2'-2'- M2( 2'- M )2n- M2( 2'- M 10 - JJ-J$&jF*..." .............."....................Initial: As= 101 1 1 11 101 1 1Ac. 11001 I I01 101Bp 11 1100010101Bp 1010101 1001 1M-2050 , N I 12Step 1. As= 101 11 11101 11Ac* 1100111011018.; 111100010101______-____---temp, = 10000000 I I 1 1temp,;llI! 11 1101010Step 2. temp,. 100000001 11 Itemp,: 11 11 11101010&. 10101011001 1______-___----temp,. l10101010110temp,=Ij010101010 1 10step 3. tamp,. 110101010110temp,. 0101010101102(2"- M) = 11 11 11 111100temp,= 01 I 1 11 11 1100temp6:ij101010101 100pia ....................................+-Ti!/0 Resultln] s Result[nl c#i!Fig. 4. Different stages of the modulo adder.ResultLl 1 Result[l ISCFig. 5.Step 4. temp,. 01 1 I I 1 11 1100temp6= 101010101 100____________--2"-n = 011111111110temp,= 1010101011 IOternpp.nl111t 1 111000Resu/fs= li?/i?/O/i?/ Ill)RPs/lIt0 : I / I I I I I I lC?i?C?A detailed example for the modulo additionTheorem 1: The modulo adder scheme for adding two n-bitnumbers in modulo rn has an asymptotic time complexity O(1).Proof: To prove that the number of steps is constant (five)we need to prove that the last carry is equal to zero in five orless steps. Induction is used to prove the correctness of thetheorem on the number of bits n.1) Basis step: for n=0, it means that we do not add anynumbers and in this case the required number of steps is zero.2) Induction hypothesis: assume for a fixed arbitrary n > 0that the maximum number of steps is five.3) Induction step: for numbers with n + 1 bits let:7 = temp, [ n + 11 +temp, [n +2].Then we have the following cases.(a) 7 = 0: then the carry propagation stopped at bit n, and itends after five steps at most according to the induction hypothesis.(b) 7 = 1: then the correction is 2"+' - rn in step 3. Sincern > 2", then 2"+' - rn < 2", which means that (2"+' - rn) [n]=0. The worst case we get to have temp,[ n + 11 and temp,[ n + 21to be equal to one. This means that temp,[n + 11 = 0 andtemp, [ n + 21 = 1, then temp, [ n + 21 = 0. In this case the correctionis done in two steps (step 3 and step 4).(c) 7 = 2: then the correction is 2*(2"+l- rn) in step 3. Theworst case we get to have temp, [ n + 11, temp, [ n + 21, and2*(2"+' - rn) to be equal to one. Then temp,[n + 11 = 1,temp6[n+1]=1, and 2"+'-M=0. At step 4 temp,[n+l]=Oand temp,[n +2]= 1. At step 5 temp,[n +1]= 1 and templ0[n+ 21 = 0. In this case the correction is done in three steps (steps3-5).As an example, the modulo addition of A = 1272 and B = 450for rn = 2050 is shown in Fig. 5. There is no unique representationfor A and B. One valid representation is shown in thisfigure. The detailed modulo addition operation is shown in thisexample. In step 1 we get temp,[l3]= 1, and in step 2 we gettemp,[l3]= 1, which means that at step 3 we have to add2(2" - M). At step 3 we get temp6[13] = 1, which means that atstep 4 we have to add 2" - M. At step 4 we get temp,[l3] = 0,which means that the addition process stops at step 4. The resultof step 4 is the final result.IV. MODULO ADDER EVALUATIONUsing the VLSI model of computation for asymptotic complexity[15], a comparative study for the proposed adder isanalyzed. For adder I (Fig. 11, using the binary adder of Brentand Kung [16], the complexity measures will be as follows:A = O(log rn log log rn) = O( n log n)T=O(loglogrn) =O(logn)AT, = O( n(10g .I,).For adder I1 (Fig. 21, using the complexity analysis of thecorrelation table of [17]:A = O(Iogrn IogIogrn + rnlog rn) = O(nI0gn +2"n) = O(n2")T=O(loglogrn+ logrn) =O(logn+n)=O(n)AT, = qn32n).For the proposed adder,A = O(n)T = O(1)AT,= O(n).V. CONCLUSIONSThe modulo adder introduced in this paper has a total timedelaycomplexity of O(1) for adding two n-bit numbers in modulorn. Based on the analysis of Section IV, this adder is theAuthorized licensed use limited to: University of Bridgeport. Downloaded on February 24,2010 at 11:38:37 EST from IEEE Xplore. Restrictions apply.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, VOL. 37, NO. 5, MAY 1990 63 1fastest and the most area efficient for large moduli. The proposeddesign has the following advantages.1) It does not have any limitation on the size of the modulus.2) It is quite modular, and it is a two-dimensional array ofone type cell (full-adder).3) It is easy to pipeline.4) It is very efficient architecture for the implementation ofthe CRT decoding [14].REF ER EN c E sF. J. Taylor, “Residue arithmetic: A tutorial with examples,” IEEEComput. Mag., pp. 50-62, May 1984.M. A. Bayoumi, G. A. Jullien, and W. C. Miller, “A look-up tableVLSI design methodology for RNS structures used in DSP applications,”IEEE Trans. Circuits Syst., vol. CAS-34, pp. 604-616, June 1987.M. A. Bayoumi, “A high speed VLSI complex digital signal processorbased on quadratic residue number system,” in VLSI Signal ProcessingII. New York IEEE Press, 1986, pp. 200-211.-, “Digital filter VLSI systolic arrays over finite fields for DSPapplications,” in Proc. 6th IEEE Ann. Phoenix Conf. on Computers andCommunications, pp. 194-199, Feb. 1987.W. Jenkins and E. Davidson, “A custom-designed integrated circuit forthe realization of residue number digital filters,” in Proc. ICASSP1985, pp. 220-223, Mar. 1985.N. S. Szabo and R. I. Tanaka, Residue Arithmatic and Its Applications toComputer Technology. New York McGraw-Hill, 1967.M. A. Soderstrand and E. L. Fields, “Multipliers and residue numberarithmetic digital filters,” Electron. Lett., vol. 13, no. 6, pp. 164-166,Mar. 1977.D. K. Banerji, “A novel implementation method for addition andsubtraction in residue number systems,” IEEE Trans. Comput., vol.C-23, pp. 106-109, Jan. 1974.M. A. Bayoumi, G. A. Jullien, and W. C. Miller, “A VLSI implementationof residue adders,” IEEE Trans. Circuits Syst., vol. CAS-34, pp.284-288, Mar. 1987.M. A. Bayoumi, “VLSI PLA structures for residue number systemsarithmetic implementations,” in Proc. ISCAS 1987, 1987.C. L. Chiang and L. Johnsson, “Residue arithmetic and VLSI,” inProc. ICCD 83, pp. 80-83, Oct. 1983.K. M. Elleithy, “On hit-parallel processing for modulo arithmetic,”VLSI Tech. Rep. TR86-8-1, Ctr. Advanced Computer Studies, Univ. ofSouthwestern Louisiana, 1986.K. Hwang, Computer Arithmetic: Principles, Architecture, and Design.New York: Wiley, 1978.K. M. Elleithy, “On the bit-parallel implementation for the Chineseremainder theorem,’’ VLSI Tech. Rep. TR87-8-1, Ctr. Advanced ComputerStudies, Univ. of Southwestern Louisiana, 1987.G. Alia and E. Materinelli, “A VLSI algorithm for direct and reverseconversion from weighted binary number system to residue numbersystem,” IEEE Trans. Circuits Syst., vol. CAS-31, pp. 1033-1039, 1984.R. P. Brent and H. T. Kung, “A regular layout for parallel adders,”IEEE Trans. Comput. vol. C-31, pp. 260-264, Mar. 1982.M. A. Bayoumi, “Lower bounds for VLSI implementation of residuenumber system architectures,” Integration, The VLSI J., vol. 4, no. 4, pp.263-269, Dec. 1986.Adjusting the Parameters in Elliptic-Function FiltersH. J. ORCHARDAbstract -When designing elliptic-function filters there is usuallysome margin in performance to be distributed over the defining parameters.A recent paper offered some comparatively complicated formulasfor use in this stage of the design. However, a simpler method, originallydue to Darlington, is available and is described briefly.I. INTRODUCTIONThe solution given by Vlcek and Unbehauen [l] to what theyrefer to as the “degree equation” for elliptic-function filters isexact, but rather complicated for the intended use in adjustingthe parameters at the beginning of a design, and involvescomputing several elliptic functions of rational fractions of aquarterperiod. High precision is unnecessary at this stage in thedesign, and a much simpler formula which allows one to achievethe same end result with a pocket calculator was in fact given byDarlington [2] just 50 years ago, but seems to have been overlooked.The purpose of this note is to explain his formula inmore detail than appeared in [2] and to expand on it slightly.In the design of an elliptic-function filter one is given aspecification for: The passband ripple, up, the minimum stopbandloss, a,, (both in decibels) and the elliptic modulus k =wp/os, where op is the passband-edge frequency and os is thestopband-edge frequency. From these one must first find thesmallest integral value of the degree n that can be used. Thischoice for n will normally cause the filter to be slightly betterthan is called for in the specification, so one can adjust thevalues of up, a, and k that one uses in the design to allow somemargin inside the specification. The relation between these fourparameters, up, a,, k and n is given by the “degree equation.”11. DEFINITION OF THE ELLIPTIC-FUNCTION FILTERThe power ratio for an elliptic-function filter is most convenientlydefined by a pair of equations involving a parametricvariable, exactly analogous to those for the well-known Chebyshevfilter. The latter is defined by10~’’~ =I+ e2cos2nun = COSU(la)(1b)where U is the parametric variable and n is the degree. Thepassband edge is normalized to = 1 and the passband ripple isup = 10 log (1 + E‘) dB.The equations for the elliptic-function filter are obtained byreplacing the cosines in (1) by the Jacobian cd elliptic functionsand take the form’10n/’O = 1 + c2cd2(nuK, /K;k,)fl=cd(u;k)(3a)As in (l), this definition holds for both odd and even values ofthe degree n. K, and K are the real quarterperiods belongingto the elliptic functions with moduli k, and k in (3a) and (3b)respectively. The fre uency scale is still normalized to R = 1 atwp, rather than to & as in [l] and [2]; the latter normalization,though nicer for theoretical work, is a nuisance in practicaldesign. passband ripple is again given by (2).In order for (3) to define a rational function, the periodOf cd(nuKl iK; in the plane, must fit exactly ?Itimes into the period rectangle of cd(u; k), just as the periodstrip of cos nu fits n times into the period strip of cos U. ThisManuscript received April 28, 1989. This letter was recommended byAssociate Editor T. R. Viswanathan.The author is with the Electrical Engineering Department, University ofCalifornia Los Angeles, Los Angeles, Ca 90024.IEEE Log Number 8930187.‘The cd function is the same as the sn function by a real quarter period,just as the cosine is the same as the sine shifted by ~/2. Using the cdfunction rather than the sn function as in [2], allows one to describe both oddand even degree cases with one common formula.OO98-4094/90/0500-oS31~$01 .OO 01990 IEEEAuthorized licensed use limited to: University of Bridgeport. Downloaded on February 24,2010 at 11:38:37 EST from IEEE Xplore. Restrictions apply.

A theta (1) algorithm for modulo addition ... - khaledelleithy.org

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?