Efficient canonic signed digit recoding - Universidad de Cantabria

grupos.unican.es

Efficient canonic signed digit recoding - Universidad de Cantabria

Microelectronics Journal 42 (2011) 1090–1097Contents lists available at ScienceDirectMicroelectronics Journaljournal homepage: www.elsevier.com/locate/mejoEfficient canonic signed digit recodingGustavo A. Ruiz n , Mercedes GrandaDepartamento de Electrónica and Computadores, Facultad de Ciencias, Avda. de Los Castros s/n, Universidad de Cantabria, 39005 Santander, Spainarticle infoArticle history:Received 14 December 2010Received in revised form14 June 2011Accepted 17 June 2011Available online 6 July 2011Keywords:Canonic signed digit (CSD)Digital arithmeticMinimal signed digitSigned digit representationabstractIn this work novel-efficient implementations to convert a two’s complement binary number into itscanonic signed digit (CSD) representation are presented. In these CSD recoding circuits two signals, Hand K, functionally equivalent to two carries are described. They are computed in parallel reducing thecritical path and they possess some properties that lead to a simplification of the algebraic expressionsminimizing the overall hardware implementation. As a result, the proposed circuits are highly efficientin terms of speed and area in comparison with other counterpart previous architectures. Simulations ofdifferent configurations made over standard-cell implementations show an average reduction of about55% in the delay and 29% in the area for a ripple-carry scheme, 47% in the delay and 17% the area in acarry look-ahead scheme, and 36% in the delay and 31% the area in a parallel prefix scheme.& 2011 Elsevier Ltd. All rights reserved.1. IntroductionThe canonical signed digit (CSD) representation is one of theexisting signed digit (SD) representations with unique featureswhich make it useful in certain DSP applications focusing on lowpower,efficient-area and high-speed arithmetic [1]. The CSD codeis a ternary number system with the digit set {1¯ 0 1}, where 1¯stands for 1. Given a constant, the corresponding CSD representationis unique and has two main properties: (1) the number ofnonzero digits is minimal, and (2) no two consecutive digits areboth nonzero, that is, two nonzero digits are not adjacent. Thefirst property implies a minimal Hamming weight, which leads toa reduction in the number of additions in arithmetic operations.The second property provides its uniqueness characteristic. However,if this property is relaxed, this representation is called theminimal signed digit (MSD) representation, which has as manynonzeros as the CSD representation, but which provides multiplerepresentations for a constant [2,3].CSD representation has proven to be useful for the design andimplementation of digital filters such as the area-efficient programmableFIR digital filter architecture in Ref. [4], Chebyshev FIR filterdesign with some constrains in terms of hardware and frequencydomain in Ref. [5], low-complexity algorithms for design filters inRef. [6], 2D FIR and IIR filter design in Ref. [7] and the digit-serialCSD filter FPGA architecture for image conversion proposed inRef. [8]. It has also been used in the reduction of the complexityn Corresponding author.E-mail addresses: ruizrg@unican.es (G.A. Ruiz),grandam@unican.es (M. Granda).of digital filters [9–11] or matrix multipliers [12] applying sharedsubexpression methods. CSD code has been largely exploited toimplement efficient multipliers [13–15]. It enables the reduction ofthe number of partial products that must be calculated fast, andalso low-power consumption and low area structure of a multiplierfor DSP applications [16] or self-timed circuits [17]. In fixed-widthmultipliers, CSD succeeded in reducing the mean square error [18]or the compensation error using efficient sign extension [19].Finally, other applications of CSD coding in reversible image colortransforms [20], Montgomery exponentiation [21,22] or vectorrotational CORDIC [23] have been proposed.Many researchers have addressed the question of CSD recodingto convert two’s complement into CSD code. Already in 1960,Reitwiesner proposed an algorithm for converting two’s complementnumbers to a minimum weight radix-2 (binary) signed digitrepresentation [24]. From the practical point of view, the traditionalapproach to generate the CSD representation uses look-up table[25,26]. Here, there is a great similarity in the carry definition usedin CSD recoding and in conventional adders, which suggest that theimplementation of fast CSD converters should be based on wellknownstructures of classical adders. However, some algorithms toconvert two’s complement into CSD numbers try to reduce thecomputational complexity [27,28],butarenotsuitableforhardwareimplementation. Other hardware approaches propose the bypassmethod [29], fast carry look-ahead circuits [30] or parallelprefix schemes [31,32] to reduce hardware but they only focus oncarry optimization without considering the overall CSD recoding.All of these algorithms generate the CSD code recursively from theleast significant bit (LSB) to the most significant bit (MSB). However,in some applications, such as the computation of exponentiation,the conversion from MSB to LSB brings some advantages.0026-2692/$ - see front matter & 2011 Elsevier Ltd. All rights reserved.doi:10.1016/j.mejo.2011.06.006


G.A. Ruiz, M. Granda / Microelectronics Journal 42 (2011) 1090–1097 1091Okeya et al. [33] proposed an algorithm to carry out this conversionbut it requires additional memory to store some precomputedelements. Indeed, efficient MSB-to-LSB algorithms yield a MSDrepresentation [34,35] but not a CSD one because of havingconsecutive nonzero digits.This paper presents novel, efficient standard cell-based implementationsto convert a two’s complement binary number into itsCSD representation based on two signals, H and K, functionallyequivalent to two carries. These signals were already defined inthe implementation of 3X terms for radix-8 encoding [36], buthere they are used for the CSD recoding in a more resourcefulway. Implementations in a 130 nm standard cell CMOS technologyof previously published architectures have been comparedwith the proposed scheme demonstrating that the circuits arehighly efficient in area and speed. The remainder of this paper isorganized as follows. In Section 2, a brief introduction anddefinitions related to CSD recoding are presented. New compactalgebraic expressions for optimal CSD recoding in terms of signalsH and K and their application in the efficient implementation ofCSD recoders for different configurations are described in Section3. Simulations and comparisons are listed in Section 4. Finally, theconclusions are stated in Section 5.2. CSD recodingThe CSD representation of an integer number is a signed andunique digit representation that contains no adjacent nonzerodigits. Given an n-digit binary unsigned number X¼{x 0 , x 1 , y,x n 1 } expressed asX ¼ Xn 1x i U2 i , x i Af0,1g ð1Þi ¼ 0then the (nþ1)-digit CSD representation Y¼{y 0 , y 1 ,y, y n }ofX isgiven byY ¼ Xn 1x i U2 i ¼ Xny i U2 i , y i Af1,0,1g ð2Þi ¼ 0i ¼ 0The condition that all nonzero digits in a CSD number areseparated by zeros implies thaty i þ 1 Uy i ¼ 0, 0rirn 1 ð3ÞFrom this property, the probability that a CSD n-digit has anonzero value [13,26] is given byPð9y i 9 ¼ 1Þ¼ 1 3 þ 1 9n 1 1 n ð4Þ2As n becomes large, this probability tends to 1/3 while thisprobability becomes 1/2 in a binary code. Using this property, thenumber of additions/subtractions is reduced to a minimum inmultipliers [14–17] and, as a result, an overall speed-up can beachieved.The adoption of a ternary number system adds some flexibilityto the CSD representation, since it allows the number of nonzerodigits to be minimized, but it requires that each digit y i must beencoded over two bits {y s i , yd i}. Table 1 shows the two mostfrequently used encodings in practice [1]. Encoding 1 can beviewed as a two’s representation. However, encoding 2 is preferablesince it satisfies the following relationy i ¼ y d iy s ið5Þwhere y s irepresents the sign bit and y d ithe data bit. This encodingalso allows an additional valid representation of 0 when y s i¼1 andy d i¼1, which is useful in some arithmetic implementations. In thewhole paper, this encoding is used.Table 1Two most encodings used in the binary representation of a CSD digit (1¯ stands for1). In this paper, encoding 2 is used.Encoding 1 Encoding 2y i y s i yd i y s i yd i0 00 001 01 01¯1 11 10Hashemian [27] presented the binary coded CSD (BCSD)number to avoid extra data word representation. The BCSDrecoding is based on a simple binary representation B¼{b 0 ,b 1 ,y,b n 1 } of a CSD code, which uses the same number of bitsas the original two’s complement representation. It takes advantageof the CSD property, in which no two adjacent digits can bothbe nonzero, to assign the next position of each nonzero CSD digitas sign bit while maintaining the same size data word. Thus, if thebit i in a CSD code is nonzero, y i a0 (which means that y i þ 1 ¼0),then the bit i is nonzero in the BCSD code, b i a0, and thefollowing bit b i þ 1 acts as a sign bit: b i þ 1 ¼0 means y i is positiveand b i þ1 ¼1 means y i is negative. As a result, y i ¼0 is encoded asb i ¼0, y i ¼1 as b i þ 1 b i ¼01 and y i ¼1¯ as b i þ 1 b i ¼11. A simpleconversion between a CSD coding and BCSD coding isb i ¼ y d i þys i 1b i þ 1 ¼ y d i þ 1 þys i0 0 1 0 1 0 1 1 10 0 1 0 1 1 0 0 10 0 1 1 0 1 0 0 10 1 0 1 0 1 0 0 1Fig. 1. Conversion process from binary to CSD code.Since this conversion can be obtained by a simple operation, inthis paper the two-bit encoding representation is used to make itsreading and understanding easier, and, without loss of generality,it can be straightforwardly extended to other representations.The conversion from a binary representation to CSD representationis mostly based on the following identity2 i þ j 1 þ2 i þ j 2 þ...þ2 i ¼ 2 i þ j 2 i ð7ÞThis means that a string of 1s can be replaced by a 1, followedby 0s, followed by a 1¯ . Isolated 1s are left unchanged, but isolated0s are re-examined in such a way that, after applying Eq. (7), pairsof type 11¯ are changed to 01. For example, the binary number(001010111) 2 is equivalent in a CSD representation to 0101¯ 01¯ 001¯ ;the encoding process is graphically shown in Fig. 1. Traditionally,this encoding is performed from LSB to MSB using two adjacentbits and a carry signal according to the recoding algorithm shownin Table 2 [25,26]. Here, the carry-out c i ¼1 if and only if there aretwo of three 1s among the three inputs x iþ1 , x i and c i 1 ,thatisc i ¼ x i x i þ 1 þðx i þx i þ 1 Þc i 1 , being c 1 ¼ 0 ð8ÞThe variable D¼{d 0 , d 1 ,y,d n 1 }, which flags all nonzero digitsin the CSD representation, is defined asd i ¼ x i c i 1 ð9Þð6Þ


1092G.A. Ruiz, M. Granda / Microelectronics Journal 42 (2011) 1090–1097Since y i takes one of the three values {0, 1, 1¯ }, two bits {y s i , yd i }are necessary to encode it, which are defined from Table 2 as(y d i¼ x i þ 1 ðx i c i 1 Þ¼x i þ 1 d iy s i ¼ x ð10Þi þ 1ðx i c i 1 Þ¼x i þ 1 d iFor the sake of clarity, the following example describes thedifferent signals used in the CSD recoding:X ¼ 001010111Carry ¼ 011111110D ¼ 010101001Y ¼ 010101001In the case of a n-bit two’s complement binary number X, theCSD representation is given byY ¼ x n 1 U2 n 1 þ Xn 2x i U2 i ¼ Xn 1y i U2 ið11Þi ¼ 0i ¼ 0here, only n CSD digits are necessary as the value of the binarynumberislimitedto[ 2 n ,2 n 1 ]. Negative integers in CSD can beobtained trivially from their positive counterpart by changing thesigns of all nonzero digits. For example, the CSD code 0101represents the decimal number 3, while 0101 represents thenumber 3. Therefore, the conversion of a negative n-digit two’scomplement binary number X into its CSD representation can beperformed from the well-known property X ¼ X þ1. Then reformulatingthe former equations for the case of a binary number X ina two’s complement representation, the conversion into its CSDrepresentation is given byt i ¼ x i x n 1 , ion 1 ð12Þc 0 i ¼ t i þ 1t i þðt i þ 1 þt i Þc 0 i 1 , being c0 1 ¼ x n 1 ð13Þwhere x n 1 represents the sign of X. From (13), c’ i propagates c i orits complement depending on sign of the X because of all inputs arecomputed according to (12). Therefore, another way to express c’ i isc 0 i ¼ x n 1 c i ð14ÞTable 2CSD coding.Applying these definitions, we getd i ¼ t i c 0 i 1 ¼ðx i x n 1 Þðx n 1 c i 1 Þ¼x i c i 1 ð15ÞThis means that the definition of D is independent of sign X. Ina similar way, the expression of Y is the same as that in (10) as(y d i¼ t i þ 1 d i ¼ t i þ 1 ðt i c 0 i 1 Þ¼ðx i þ 1 x n 1 Þððx i x n 1 Þc 0 i 1 Þ¼x i þ 1d iy s i ¼ t i þ 1d i ¼ t i þ 1 ðt i c 0 i 1 Þ¼ðx i þ 1 x n 1 Þððx i x n 1 Þc 0 i 1 Þ¼x i þ 1d ið16ÞFig. 2 shows the circuit to convert a n¼6 digit binary numberinto its CSD representation according to Eqs. (8)–(10), valid forboth unsigned and two’s complement binary numbers. The onlydifference arises in the last CSD digit. By introducing an extra signextension, x n ¼0 for unsigned numbers and x n ¼x n 1 for two’scomplement numbers, the last section changes depending on thesign of X in the following general expression:8y d n 1 ¼ d n 1>< y s n 1 ¼ 0For an unsigned number Xy d n ¼ d ð17Þn ¼ c n 1 ¼ x n c n 2>: y s n ¼ 0For a signed number X yd n 1 ¼ x nd i ¼ x n 1 c n 2y s n 1 ¼ x nd i ¼ x n 1 c n 2(ð18ÞFor unsigned X, nþ1 CSD digits are necessary to represent thatbinary number. However, as it is a positive number the two MSBdigits of CSD are also positives and, indeed, only their positivedata parts are necessary as shown in Eq. (17); the negative part isalways zero. For signed numbers, only n CSD digits are used andthe last digit is computed according to Eq. (18). As can be seen inFig. 2, the critical path of the circuit is fixed by the propagation ofthe carry signal, in a similar way to a conventional ripple-carryadder. There is a clear similarity between the carry definition inconventional adders and the definition of that carry in Eq. (8). Thissuggests that implementation of fast CSD converters should bebased on fast well-known carry look-ahead structures used inaddition.x i þ1 x i c i 1 y i c i Comments0 0 0 0 0 String of 0s0 0 1 1 0 End of 1s0 1 0 1 0 A single 10 1 1 0 1 String of 1s1 0 0 0 0 String of 0s1 0 1 1¯ 1 A single 01 1 0 1¯ 1 Beginning of 1s1 1 1 0 1 String of 1s3. New CSD recodingIn the definition of carry in Eq. (8), two adjoining carries sharethe same input variable. This means that the same input x i is usedin the generation of both carries c i 1 and c i . This characteristicallows the algebraic expressions of carry generation to be simplifiedin order to obtain efficient circuits.Fig. 2. Schematic circuit for the conversion of a binary number into its CSD representation (n¼6).


G.A. Ruiz, M. Granda / Microelectronics Journal 42 (2011) 1090–1097 1093Let X be a n-digit binary number. If we define two signals,H¼{h 0 ,h 1 ,y,h n 1 } and K¼{k 0 ,k 1 ,y,k n 1 },as(h i ¼ x ih i 1 , for i oddð19Þx i þh i 1 , for i even(k i ¼ x i þk i 1 , for i oddx i k i 1 , for i evenð20Þwith h 1 ¼0 and k 1 ¼0. Then c i can be formally expressed interm of odd or even index i by means of the following recursiverelation:(c i ¼ h i þx i þ 1 k i ¼ h i þk i þ 1 , for i oddð21Þx i þ 1 h i þk i ¼ h i þ 1 þk i , for i evenA demonstration by induction of this equation can be found inappendix A of Ref. [36]. Moreover, h i and k i have the followingproperties:8aÞ h i k i ¼ h i>< bÞ h i k i ¼ 0For i oddð22ÞcÞ h i þk i ¼ 1>: dÞ h i þk i ¼ k i8aÞ h i k i ¼ k i>< bÞ h i k i ¼ 0For i evencÞ h i þk i ¼ 1>: dÞ h i þk i ¼ h ið23ÞDemonstrations of these properties can be found in appendix Bof Ref. [36]. However, using these properties, Eq. (21) can betransformed into another compact form as(c i ¼h i þx i þ 1 k i ¼ h i þ 1 k i , for i oddð24Þx i þ 1 h i þk i ¼ h i k i þ 1 , for i evenFor the sake of clarify, the definition of the carry based on Hand K signals for n¼4 are described. From Eq. (8), we getc 0 ¼ x 1 x 0c 1 ¼ x 1 x 2 þðx 1 þx 2 Þc 0 ¼ x 1 x 0 þx 2 x 1c 2 ¼ x 2 x 3 þðx 2 þx 3 Þc 1 ¼ x 3 ðx 2 þx 1 x 0 Þþx 2 x 1c 3 ¼ x 3 x 4 þðx 3 þx 4 Þc 2 ¼ x 3 ðx 2 þx 1 x 0 Þþx 4 ðx 3 þx 2 x 1 ÞFrom the recursive property in the definition of signals H andK of Eqs. (19) and (20), it is straightforward to obtain thefollowing expressionsh 1 ¼ 0h 0 ¼ x 0 þh 1 ¼ x 0h 1 ¼ x 1 h 0 ¼ x 1 x 0h 2 ¼ x 2 þh 1 ¼ x 2 þx 1 x 0h 3 ¼ x 3 h 2 ¼ x 3 ðx 2 þx 1 x 0 Þh 4 ¼ x 4 þh 3 ¼ x 4 þx 3 ðx 2 þx 1 x 0 Þk 1 ¼ 0k 0 ¼ x 0 k 1 ¼ 0k 1 ¼ x 1 þk 0 ¼ x 1k 2 ¼ x 2 k 1 ¼ x 2 x 1k 3 ¼ x 3 þk 2 ¼ x 3 þx 2 x 1k 4 ¼ x 4 k 3 ¼ x 4 ðx 3 þx 2 x 1 ÞTherefore, from Eqs. (21) and (24), the carry can be expressedin terms of those signals asc 0 ¼ h 1 þk 0 ¼ x 1 x 0 þ0 ¼ h 0 k 1 ¼ x 0 x 1c 1 ¼ h 1 þk 2 ¼ x 1 x 0 þx 2 x 1 ¼ h 2 k 1 ¼ðx 2 þx 1 x 0 Þx 1c 2 ¼ h 3 þk 2 ¼ x 3 ðx 2 þx 1 x 0 Þþx 2 x 1 ¼ h 2 k 3c 3 ¼ h 3 þk 4 ¼ x 3 ðx 2 þx 1 x 0 Þþx 4 ðx 3 þx 2 x 1 Þ¼h 4 k 3The variable D in Eq. (9) can be directly obtained from H and Kwithout it being necessary to generate c i .Ifi is odd, this meansthat i 1 is even, we obtaind i ¼ x i c i 1 ¼ x i c i 1 þx i c i 1 ¼ x i Ux i h i 1 þk i 1 þx i ðx i h i 1 þk i 1 Þ¼ x i ðh i 1 þk i 1 Þþx i k i 1 ð25ÞIn Eq. (23), we get h i 1 þk i 1 ¼h i 1 and h i 1 k i 1 ¼ 0. Then,applying these properties to Eq. (25), it can be transformed asd i ¼ x i h i 1 þx i k i 1 ¼ðx i þh i 1 Þðx i þk i 1 Þ¼x i h i 1 ðx i þk i 1 Þ¼h i k ið26ÞFor i even, and likewise, we obtaind i ¼ h i k ið27ÞHowever, the variable Y for the CSD recoding in terms ofsignals H and K can be expressed as(y d i¼ x i þ 1 d i ¼ x i þ 1h i k i ¼ h i þ 1 k i , for i oddð28Þx i þ 1 h i k i ¼ h i k i þ 1 , for i even(y s i ¼ x i þ 1d i ¼ x i þ 1h i k i ¼ h i k i þ 1 , for i oddð29Þx i þ 1 h i k i ¼ h i þ 1 k i , for i evenThe proposed equations based on signals H and K, which arefunctionally equivalent to two carries, lead to an efficient hardwareimplementation of CSD recoders. Fig. 3 shows the proposedimplementation for the conversion of a 5-bit binary number Xinto its equivalent CSD representation. These signals can becomputed by two parallel and independent paths of simple ANDand OR gates in a ripple configuration. As a result, the critical pathis reduced and the hardware implementation is minimized incomparison with other structures derived from conventionaladders. The main body of this circuit is made up of alternativeFig. 3. Proposed circuit for the conversion of a binary number into its CSD representation (n¼6).


1094G.A. Ruiz, M. Granda / Microelectronics Journal 42 (2011) 1090–1097odd and even sections, which implement the Eqs. (19), (20), (28)and (29). The first section is simplified as8< y d 0 ¼ h 0k 1 ¼ x 0 x 1y s 0 ¼ h ð30Þ: 1k 0 ¼ h 1 ¼ x 0 x 1Only the last section is changed depending on sign of X in thefollowing general expressionFor a signed number8 (y d n 1 ¼ hn 2 k n 1 , for n 1 odd>< h n 1 k n 2 , for n 1 evenX (y s n 1 ¼ h ð31Þn 1k n 2 , for n 1 odd>: h n 2 k n 1 , for n 1 evenFor an unsigned number8 (y d n 1 ¼ hn 1 k n 1 , for n 1 odd>< h n 1 k n 1 , for n 1 evenX (y d n ¼ h n 1, for n 1 odd>: k n 1 , for n 1 evenð32ÞA more efficient implementation, suitable for a CMOS technology,is shown in Fig. 4. In this circuit, a straightforward manipulationof the Boolean algebra enables the implementation of thesignals H and K in terms of NAND and NOR gates, which are thefastest gates in CMOS circuits. Moreover, these signals, and alternativelytheir complementary ones, facilitate the generation of Y,thus reducing the complexity of the overall circuit in comparisonwith that of Fig. 3 as additional inverters are unnecessary.3.1. Carry look-ahead CSD recodingThe carry look-ahead principle is one of the most widely usedmethods to implement fast adders by introducing some kind ofparallelism in order to reduce the critical path of the circuit. Theexpressions of H and K defined in Eqs. (19) and (20) are of asimilar form to those used in conventional carry look-aheadadders [1,39]. For example, for i¼6 we geth 6 ¼ x 6 þx 5 ðx 4 þx 3 ðx 2 þx 1 x 0 ÞÞk 6 ¼ x 6 ðx 5 þx 4 ðx 3 þx 2 x 1 ÞÞð33ÞThese expressions are a simplified version of those used incarry look-ahead structures where the classical propagate andgenerate signals are directly removed by input signals. However,new generation and propagation signals associated with H and Kenable a significant reduction of the critical path. Using theexpressions in Eq. (33), h 6 can be rewritten ash 6 ¼ x 6 þx 5 x 4 þx 5 x 3 ðx 2 þx 1 x 0 Þ¼gh 1 þph 0 h 2ð34Þwhere gh j and ph j are, respectively, the generation and propagationsignals of H. In this case, gh 1 ¼x 6 þx 5 x 4 , ph 0 ¼x 5 x 3 andh 2 ¼x 2 þx 1 x 0 . In a general wayh 4ðj þ 1Þþ2 ¼ gh j þ 1 þph j h 4j þ 2 , j ¼ 0,1,2,... ð35Þwhere(gh j ¼ x 4j þ 2 þx 4j þ 1 x 4jph j ¼ x 4ðj þ 1Þþ1 x 4ðj þ 1Þ 1ð36ÞNote that these signals are defined for a group of four inputsand h 2 ¼gh 0 . Similarly, we can rewrite k 6 ask 6 ¼ x 6 ðx 5 þx 4 Þðx 5 þx 3 þx 2 x 1 Þ¼gk 1 ðpk 0 þk 2 Þð37Þwhere gk j and pk j are, respectively the generation and propagationsignals of K. In this case, gk 1 ¼x 6 (x 5 þx 4 ), pk 0 ¼x 5 þx 3 andk 2 ¼x 2 x 1 . In a general way, we getk 4ðj þ 1Þþ2 ¼ gk j þ 1 ðpk j þk 4j þ 2 Þ, j ¼ 0,1,2,... ð38Þwhere(gk j ¼ x 4j þ 2 ðx 4j þ 1 þx 4j Þ, j40pk j ¼ x 4ðj þ 1Þþ1 þx 4ðj þ 1Þ 1ð39ÞThe recurrence in Eqs. (35) and (38) based on propagation andgeneration signals obtained from a group of 4 input signals leadsto fast implementations of CSD recoders. Fig. 5a depicts aschematic of a circuit for performing the conversion of a signedbinary number (n¼16) into its CSD representation. Here, X o 3:54stands for {x 3 , x 4 , x 5 } and Y o2:5 4 stands for {{y s 2 , yd 2 }, {ys 3 , yd 3 }, {ys 4 ,y d 4 }, {ys 5 , yd 5}}. Two parallel circuits generate the signal H and K for4-bit groups in such a way that the biggest bits are expressed ash 14 ¼ gh 3 þph 2 ðgh 2 þph 1 ðgh 1 þph 0 h 2 ÞÞk 14 ¼ gk 3 ðpk 2 þgk 2 ðpk 1 þgk 1 ðpk 0 þh 2 ÞÞÞð40ÞFrom these signals, the CSD representation Y is computed bythree different kinds of compact modules: a first module (Fig. 5b),4-bit CSD modules (Fig. 5c) and a last module (Fig. 5d). Only thislast module changes depending on whether X is an unsigned orsigned number. However, when the number of modules is large, itis necessary to apply multi-level carry look-ahead schemes toreduce the delay; any interested reader can find more informationin Section 2.6 of Ref. [37].3.2. Parallel prefix CSD recodingAnother alternative for fast implementations of a CSD recodingis derived from the binary associative operator ‘‘J’’ developed forprefix adders [38]. As long as the recurrence relations of H and Kdefined in Eqs. (35) and (38) have identical properties to those ofconventional adders, new associative operators ‘‘J’’ and ‘‘K’’ ,Fig. 4. Efficient implementation for the circuit in Fig. 3.


G.A. Ruiz, M. Granda / Microelectronics Journal 42 (2011) 1090–1097 1095Fig. 5. Carry look-ahead scheme of a CSD recoder for n¼16. (a) Schematic of full circuit, (b) schematic of first CSD module, (c) schematic of 4-bit CSD module (i¼2,6,10)and (d) schematic of Last CSD module.Fig. 6. Parallel prefix scheme of a CSD recoder for n¼16.dependent on H and K can be defined as(ðgh, phÞ3ðgh n , ph n Þ¼ðghþph gh n , phUph n Þðgk,pkÞðgk n ,pk n Þ¼ðgkðpkþgk n Þ, pkþpk n Þð41ÞBased on these operators, parallel prefix circuits for computingsignals H and K can be implemented in an efficient way. Fig. 6shows an example of a parallel prefix scheme for CSD recoding(n¼16). A first stage computes the individual generation andpropagation signals. The remaining stages constitute a parallelprefix circuit with an organization based on the operators ‘‘J’’and ‘‘K’’, which generates the 4-bit group signals. This circuituses the same modules specified in Fig. 5 for computing the CSDrepresentation Y. The proposed implementation has a totalnumber of stages of log 2 (n/4)þ1 in comparison with log 2 (n)þ1used in a conventional scheme.


1096G.A. Ruiz, M. Granda / Microelectronics Journal 42 (2011) 1090–1097Table 3Comparison of CSD recoders using different configurations.Configuration 8-bit 16-bit 32-bit 64-bit Averagereductiont p(ns)Area(lm 2 )NANDEquiv.t p(ns)Area(lm 2 )NANDEquiv.t p(ns)Area(lm 2 )NANDEquiv.t p(ns)Area(lm 2 )NANDEquivt pAreaRipple-carry scheme[25,26] 0.96 282 35 2.07 605 75 4.28 1250 155 8.71 2542 315Fig. 4 0.46 205 25 0.93 431 53 1.88 883 109 3.79 1767 221Reduction (%) 52.1 27.3 55.6 28.8 57.0 29.4 57.6 29.7 55 29Carry look-ahead scheme[30] 0.49 327 41 0.96 685 85 1.94 1403 174 3.92 2840 414Fig. 5 0.31 252 31 0.53 567 70 0.90 1196 148 1.77 2487 352Reduction (%) 36.7 22.9 44.8 17.2 53.6 14.8 54.8 12.4 47 17Parallel prefix scheme[32] 0.51 387 48 0.67 922 114 0.92 2234 277 1.70 5459 677Fig. 6 0.34 256 32 0.53 647 80 0.62 1575 195 0.72 3721 461Reduction (%) 33.3 33.9 20.9 29.8 32.6 29.5 57.6 31.8 36 314. Simulation and comparisonsFor the purposes of comparison, different standard cell-basedimplementations and configurations of CSD recoders based onpreviously published circuits have been compared and analyzedwith those proposed in this paper. All of them were synthesizedwith Synopsys s design compiler release 2010.03 under theHCMOS9 STMicroelectronics 130 nm standard cell technologywith a 1.2 V power supply. Table 3 lists the synthesis results interms of worst-case propagation time (t p in ns) including a sizecircuit dependent wire-load model, and total area expressed interms of mm 2 or in an equivalent number of two input NANDgates (8.0688 mm 2 for this technology). In all cases, a digit two’scomplement binary number X with different values of n¼8, 16,32 and 64 has been used as input.To obtain representative results, we have selected the configurationsbased on the ripple-carry principle derived from theexpressions in Refs. [25,26], the optimized carry look-aheadscheme derived from the expressions presented in Ref. [30] andthe parallel prefix scheme presented in Ref. [32]. All of theseconfigurations have been implemented in terms of standard cellsfor comparison purposes and they have their counterpart in theproposed circuits shown in Figs. 4, 5 and 6, respectively, where thesignals H and K are computed in a similar way to those carries inthe former schemes. The results in Table 3 highlight the significantadvantages of the circuits proposed in terms of speed and area. Theimplementation in Fig. 4 reduces the delay by roughly 55 and thearea by 29% with respect to the counterpart derived from Refs.[25,26]. This is possible because of parallelism in the signals H andK and the simplification in the complexity of the circuit. Anotherimportant improvement is the decrease by up to 47% (average) indelay in the carry look-ahead scheme. Here, the generate andpropagate signals associated with H and K significantly reduce thecritical path in comparison with the scheme presented in Ref. [30],which computes the carries directly from input signals by cascadingof 4-bit look-ahead circuits. This comparison has been made interms of standard cells because a full CMOS complex gate toimplement the carry look-ahead circuit is proposed in Ref. [30].However, the reduction in area is limited, varying from 22.9% forthe 8-bit configuration to 12.4% for the 64-bit. The reason is thesimilar area that both 4-bit look-ahead circuits have decreasing theratio when the size of encoder increases. Finally, the fastestimplementation corresponds to the parallel prefix scheme.Although for 8-bit, the carry look-ahead scheme is slightly betterboth in terms of area and in speed, the delay of the parallel prefixscheme is only 0.72 ns for 64-bit, which is suitable for high-speedcircuits, and, moreover, without a significant increase in area. As aresult, Table 3 demonstrates that the expressions developed toimplement the CSD recoding lead to circuits, whose area and speedare notably improved in comparison with previously proposedimplementations.5. ConclusionThe conversion of a two’s complement binary number into itscanonic signed digit (CSD) representation can be efficientlyimplemented using two signals, H and K, functionally equivalentto two carries. These signals have two advantageous features:they are computed in parallel reducing the critical path and theysimplify the algebraic expressions minimizing the overall hardwareimplementation. Simulations performed with the proposedcircuits show high efficiency in terms of speed and area incomparison with other previous counterpart architectures. Moreover,other schemes used for accelerating carries based ontransistor structures such as domino carry look-ahead, multilevelcarry look-ahead and carry-skip circuits can be directly used.Finally, the new formulation presented in this work enables theCSD number system to be made available to many DSP applicationsas the CSD recoding can be performed at high speed with alow area cost.AcknowledgmentWe wish to acknowledge the financial help of the SpanishMinistry of Education and Science through TEC2006-12438/TCMreceived to support this work.References[1] I. Koren, Computer Arithmetic Algorithms (Second ed.), A.K. Peters, Ltd. (Ed.),2002.[2] B. Phillips, N. Burgess, Minimal weight digit set conversions, IEEE Transactionson Computers 53 (6) (2004) 666–677.[3] D.S. Phatak, I. Koren, Hybrid signed-digit number systems: a unified frameworkfor redundant number representations with bounded carry propagationchains, IEEE Transactions on Computers 43 (8) (August 1994) 880–891.[4] K.Y. Khoo, A. Kwentus, A.N. Wilson, A programmable FIR digital filter usingCSD coefficients, IEEE Journal of Solid-State Circuits 31 (6) (June 1996)869–874.[5] Y.M. Hasan, L.J. Karam, M. Falkinburg, A. Helwig, M. Ronning, Canonic signeddigit Chebyshev FIR filter design, IEEE Signal Processing Letters 8 (6) (2001)167–169.


G.A. Ruiz, M. Granda / Microelectronics Journal 42 (2011) 1090–1097 1097[6] J. Skaf, S.P. Boyd, Filter design with low complexity coefficients, IEEETransactions on Signal Processing 56 (7) (2008) 3162–3169.[7] T. Williams, M. Ahmadi, W.C. Miller, Design of 2D FIR and IIR digital filterswith canonical signed digit coefficients using singular value decompositionand genetic algorithms, Circuits Systems Signal Processing 26 (1) (2007)69–89.[8] H. Lee, G. Sobelman, FPGA-based digit serial CSD FIR filter for image signalformat conversion, Microelectronic Journal 22 (2002) 501–508.[9] R.I. Hartley, Subexpression sharing in filters using canonic signed digitmultipliers, IEEE Transactions on Circuits and Systems-II: Analog and DigitalSignal Processing 43 (10) (1996) 677–688.[10] C.Y. Yao, H.H. Chen, T.F. Lin, C.J. Chien, C.T. Hsu, A novel common-subexpression-eliminationmethod for synthesizing fixed-point FIR filters, IEEE Transactionson Circuits and Systems-I: Regular Papers 51 (11) (2004) 2215–2221.[11] S.T. Pan, A canonic-signed-digit coded genetic algorithm for designing finiteimpulse response digital filter, Digital Signal Processing 20 (2) (2010)314–327.[12] M.D. Macleod, A.G. Dempster, Common subexpression elimination algorithmfor low-cost multiplierless implementation of matrix multipliers, ElectronicsLetters 40 (11) (2004) 651–652.[13] G.K. Ma, F.J. Taylor, Multiplier policies for digital signal processing, IEEE ASSPMagazine 1 (1990) 6–20.[14] M.A. Soderstrand, CSD multipliers for FPGA DSP applications, in: Proceedingsof the International Symposium on Circuits and Systems (ISCAS), pp. V-469–V-472, 2003.[15] D.L. Iacono and M. Ronchi, Binary canonic signed digit multiplier for highspeeddigital signal processing, in: Proceedings of the 47th IEEE InternationalMidwest Symposium on Circuits and Systems, II, pp. 205–208, 2004.[16] Y. Wang, L.S. DeBrunner, D. Zhou, V.E. DeBrunner, A multiplier structurebased on a novel real-time CSD recoding, IEEE International Symposium onCircuits and Systems (ISCAS) (2007) 3195–3198.[17] G.A. Ruiz., M.A. Manzano, Self-timed multiplier based on canonical signeddigitrecoding, IEE Proceedings-Circuits, Devices and Systems 148 (5) (2001)235–241.[18] N. Petra, D. de Caro, A.G.M. Strollo, V. Garofalo, E. Napoli, M. Coppola,P. Todisco, Fixed-width CSD multipliers with minimum mean square error,IEEE International Symposium on Circuits and Systems (ISCAS) (2010)4149–4152.[19] S.M. Kim, J.G. Chung, K.K. Parhi, Low error fixed-width CSD multiplier withefficient sign extension, IEEE Transactions on Circuits and Systems-II: Analogand Digital Signal Processing 50 (12) (2003) 984–993.[20] S. Yang, B.U. Lee, Efficient transform using canonical signed digit in reversiblecolor transforms, Journal of Electronic Imaging 18 (3) (2009) 033010.[21] D.C. Lou, J.C. Lai, C.L. Wu, T.J. Chang, An efficient Montgomery exponentiationalgorithm by using signed-digit-recoding and folding techniques, AppliedMathematics and Computation 185 (2007) 31–44.[22] D.C. Lou, J.C. Lai, C.L. Wu, T.J. Chang, An efficient Montgomery exponentiationalgorithm by using signed-digit-recoding and folding techniques, AppliedMathematics and Computation 185 (1) (2007) 31–44.[23] A.Y. Wu, C.S. Wu, A unified view for vector rotational CORDIC algorithms andarchitectures based on angle quantization approach, IEEE Transactions onCircuits and Systems-I: Fundamental, Theory and Applications 49 (10) (2002)1442–1456.[24] S.W. Reitwiesner, Binary arithmetic, Advances in Computers 1 (1966)231–308.[25] K. Hwang, Computer Arithmetic, Principles, Architecture and Design, JohnWiley & Sons, New York, NY, 1979.[26] A. Peled, On the hardware implementation of digital signal processors, IEEETransactions on Acoustics, Speech, and Signal Processing ASSP-24 (1) (1976)76–86.[27] R. Hashemian, A new method for conversion of a 2’s complement to canonicsigned digit system and its representation, Proceedings of the AsilomarConference on Signals, Systems and Computers (1997) 904–907.[28] F. Xu, C.H. Chang, C.C. Jong, Hamming weight pyramid—a new insight intocanonical signed digit representation and its application, Computers andElectrical Engineering 33 (2007) 195–207.[29] M. Faust, O. Gustafsson, C.H. Chang, Fast and VLSI efficient binary-to-CSDencoder using bypass signal, Electronics Letters 47 (1) (2011).[30] A. Herrfeld, S. Hentschke, Look-ahead circuit for CSD-code carry determination,Electronics Letters 31 (6) (1995) 434–435.[31] S.K. Das, M.C. Pinotti, Fast VLSI circuits for CSD coding and GNAF coding,Electronics Letters 32 (7) (1996) 632–634.[32] C.K. Koc, Parallel canonical recoding, Electronics Letters 32 (22) (1996)2063–2065.[33] K. Okeya, K. Schmidt-Samoa, C. Spahn, T. Takagi, Signed binary representationsrevisited, Chapter of book ‘‘Advances in Cryptology—CRYPTO 2004,Lecture Notes in Computer Science, vol. 3152, Springer Berlin, Heidelberg,pp. 123–139, 2004.[34] M. Joye, S.M. Yen, Optimal left-to-right binary signed-digit recoding, IEEETransactions on Computers 49 (7) (July 2000) 740–748.[35] E. Backenius, E. Säll, O. Gustafsson, Bidirectional conversion to minimumsigned-digit representation, in: Proceedings of the IEEE International Symposiumon Circuits & Systems (ISCAS 2006). pp. 2413–2416, September 2006.[36] G.A. Ruiz, M. Granda, Efficient implementation of 3X for radix-8 encoding,Microelectronic Journal 39 (1) (2008) 152–159.[37] M.D. Ercegovac, T. Lang, Digital Arithmetic, Morgan Kaufmann Publishers,Elsevier Science, 2004.[38] P.M. Kogge, H.S. Stone, A parallel algorithm for the efficient solution of ageneral class of recurrence equations, IEEE Transactions on Computers C-22(8) (1973) 783–791.[39] M.D. Ercegovac, T. Lang., Digital Arithmetic, Morgan Kaufmann Publishers,2004.

More magazines by this user
Similar magazines