© 2009 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. doi: http://dx.doi.org/10.1109/ICC.2009.5199493


Stochastic Decoding of LDPC Codes over GF(q)

Gabi Sarkis, Shie Mannor and Warren J. Gross
Department of Electrical and Computer Engineering, McGill University, Montreal, Quebec, Canada H3A 2A7
Email: gabi.sarkis@mail.mcgill.ca, shie.mannor@mcgill.ca, warren.gross@mcgill.ca

Abstract: Nonbinary LDPC codes have been shown to outperform currently used codes for magnetic recording and several other channels. Currently proposed nonbinary decoder architectures have very high complexity for high-throughput implementations and sacrifice error-correction performance to maintain realizable complexity. In this paper, we present an alternative decoding algorithm based on stochastic computation that has a very simple implementation and minimal performance loss when compared to the sum-product algorithm. We demonstrate the performance of the algorithm when applied to a GF(16) code and provide details of the hardware resources required for an implementation.

I. INTRODUCTION

Low-Density Parity Check (LDPC) codes are linear block codes that can achieve performance close to the Shannon limit under iterative decoding. Binary LDPC codes have received much interest and are specified in recent wireless and wireline communications standards, for example digital video broadcast (DVB-S2), WiMAX wireless (IEEE 802.16e) and 10 gigabit Ethernet (IEEE 802.3an). Nonbinary LDPC codes defined over q-ary Galois fields (GF(q)) were introduced in [1] and were shown to perform better than equivalent bit-length binary codes for additive white Gaussian noise (AWGN) channels. In [2], Song et al. showed that GF(q) LDPC codes significantly outperform binary LDPC and Reed-Solomon (RS) codes for the magnetic recording channel. Chen et al. [3] demonstrated that LDPC codes over GF(16) perform better than RS codes for general channels with bursts of noise, thus making GF(q) LDPC a candidate to replace RS coding in many storage systems. Djordjevic et al. concluded that nonbinary LDPC codes achieve lower BER than other codes while allowing for higher transmission rates when used with the fiber-optic channel [4].

LDPC codes over GF(q) are defined such that elements of the parity check matrix H are elements of GF(q). As in the binary case, these codes are decoded by the sum-product algorithm (SPA) applied to the Tanner graph representation of the parity-check matrix H. Unfortunately, the nonbinary values of H result in very high complexity for the check node updates in the graph, presenting a significant barrier to practical realization. The only hardware implementation in the literature is fully serial, consisting of only one variable node and one check node [5]. There have been a number of approaches to reduce the complexity of the check node update in the literature. MacKay et al. proposed using the fast Fourier transform (FFT) to convert convolution to multiplication in the check nodes [6]. Song et al. use the log-domain to replace multiplication with addition [2]. Declercq et al. introduced the extended min-sum (EMS) algorithm as an approximation to the SPA, computing likelihood values for only a subset of the field elements, thus reducing the number of computations performed [7]. While these approaches are simpler than a direct implementation of the SPA, there is a need to further reduce the complexity for practical decoder implementations.

Recently, a new approach to decoding binary LDPC codes based on stochastic computation ([8], [9]) was introduced in [10]. Stochastic decoders use random bit-streams to represent probability messages and result in simple node hardware and reduced wiring complexity. Subsequently, area-efficient fully-parallel high-throughput decoders with performance close to the SPA were demonstrated in field-programmable gate arrays (FPGAs) [11], [12].

We realized that the complexity benefits of stochastic decoding might be even greater for nonbinary LDPC codes and could result in a practical decoder implementation.
In this paper, we present a generalization of stochastic decoding to LDPC codes over GF(q). The algorithm has significantly lower hardware complexity than other nonbinary decoding algorithms in the literature.

II. SUM-PRODUCT DECODING

A. Notation

Since most digital systems transmit data using $2^p$ symbols, the focus in current research is on codes defined over GF($2^p$). In this section, we describe the SPA for decoding LDPC codes over GF($2^p$). However, it should be noted that the SPA works on any field GF(q) with minor modifications to notation and channel likelihood calculations.

The elements of GF($2^p$) can be represented as powers of the primitive element $\alpha$, or using polynomials; the latter form is used in this section, so that the polynomial $i(x) = \sum_{l=1}^{p} i_l x^{l-1}$, where $i_l$ are binary coefficients, represents an element of GF($2^p$).

The notation used in this section for representing internode messages is similar to that of [7]; namely that U and V represent messages heading in the direction of check and variable nodes respectively. The subscripts represent the source and destination nodes. For example, $U_{xy}$ is a message from node x to node y. All the messages are probability mass function (PMF) vectors indexed using GF($2^p$) elements. Fig. 1a shows this notation applied to a Tanner graph.

B. Algorithm

While nonbinary codes can also be decoded using SPA on Tanner graphs, the check node update is modified because the
elements of H are nonbinary. Therefore the check constraint for a check node of degree $d_c$ is:

$$\sum_{k=1}^{d_c} h_k i_k(x) = 0, \tag{1}$$

where $h_k$ is the element of H with indices corresponding to the check and variable nodes of interest. This is different from the binary case where the check constraint is $\sum_{k=1}^{d_c} i_k(x) = 0$. To accommodate this change, Davey et al. [1] assigned values from H as labels to the edges connecting variable and check nodes and integrated the multiplication into the check node functionality. Declercq et al. [7] introduced a third node type called the permutation node which connects variable and check nodes and performs multiplication as shown in Fig. 1a, thereby reverting the check node constraint to $\sum_{k=1}^{d_c} j_k(x) = 0$. While the two approaches are functionally equivalent, the one in [7] results in simpler equations and implementation since all check nodes of the same degree are identical.

Fig. 1: Stochastic decoder graphs with $X$ and $X^{-1}$ denoting forward and inverse permutation operations. (a) message labels, (b) message propagation with EMs added to the decoder.

The first step in the SPA is computing the channel likelihood vector $L_v[i(x)]$ for each variable node v, which is computed based on the channel model and modulation scheme. The outgoing message from variable node v to permutation node z is given by:

$$U_{vz} = L_v \times \prod_{p=1,\, p \neq z}^{d_v} V_{pv}, \tag{2}$$

where $\times$ is the term-by-term product of vectors and $d_v$ is the variable node degree. Normalization is needed so that $\sum_{a \in \mathrm{GF}(2^p)} U_{vz}[a] = 1$.

Permutation nodes implement multiplication by an element from H when passing messages from the variable to check nodes, and multiplication by the inverse of an element from H in the other direction. As shown in [7] the multiplication and multiplication by inverse can be performed using cyclic shifts of the positions of the values in a message vector except those values indexed by 0.

The parity check constraint does not include multiplication by elements of H anymore; therefore, the check node update equation is the convolution of incoming messages as shown in [7]:

$$V_{ct} = \circledast_{p=1,\, p \neq t}^{d_c} U_{pc}. \tag{3}$$

This convolution represents a significant computational challenge in implementing nonbinary LDPC decoders.

III. STOCHASTIC DECODING

A message in the SPA for LDPC codes over GF(q) is a vector containing the probabilities of each of the q possible symbols. Stochastic decoding uses streams of symbols chosen from GF(q) to represent these messages; the number of occurrences of a symbol in a stream divided by the total number of symbols observed in the stream gives the probability of that symbol. The advantage of utilizing such a method for message passing lies in the simple circuitry required to manipulate the stochastic streams to reflect likelihood changes as presented in Section III-D. Stochastic decoding of binary LDPC codes results in simple hardware structures. The reader is referred to [8], [9], [10], [11], [12] for details on binary stochastic decoding algorithms and their implementation.

Similar notation to the SPA is used when describing the stochastic decoding message updates, the difference being that messages are serial stochastic streams instead of vectors; thus, an index t is used to denote the location of a symbol within a stream and the stream name is overlined, e.g. $\overline{U}_{vp}(t)$.

A. Node Equations

Winstead et al. [13] presented a stochastic decoding algorithm that uses streams of integers instead of the conventional binary streams. In that work, an integer stream encodes the probabilities of the states in a trellis, leading to a demonstration of trellis decoding of a (16,11) Hamming code and a turbo product decoder built from the Hamming component decoders.
However, that work did not interpret the integers as finite field symbols and did not utilize GF(q) arithmetic.

In this section we present the node equations for a stochastic decoder for LDPC codes over GF(q). Taking the view that the nonbinary streams are composed of finite field elements, we present message update rules that are much simpler than those derived from a straightforward application of the rules in [13]. In particular, the trellis representation of the convolution in the check node reduces to Galois field addition. Section III-E demonstrates the performance of the stochastic algorithm when decoding a (256,128)-symbol LDPC code over GF(16).

Variable Node: A stochastic variable node of degree $d_v$ takes as input $d_v$ stochastic streams from permutation nodes in addition to one generated based on channel likelihood values. In [13], the output of a node is updated if its inputs satisfy some constraint; otherwise, the output remains unchanged from the previous iteration. To implement a variable node constraint on an output message stream at time t, we copy the input symbol to the output symbol if the input symbols on all the other incoming edges are equal at time t. For a stochastic variable node with output $\overline{U}_{vp}$ and inputs $\overline{V}_{iv}$, we propose the following update rule:

$$\overline{U}_{vp}(t) = \begin{cases} a & \text{if } \overline{V}_{iv}(t) = a,\ \forall i : i \neq p \\ \overline{U}_{vp}(t-1) & \text{otherwise} \end{cases} \tag{4}$$
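
The update rule in equation (4) can be sketched in software as follows. This is only an illustrative sketch, not the hardware described in this paper: the function names `variable_node_update` and `run_variable_node` are hypothetical, GF(q) symbols are assumed to be encoded as Python integers, and the input list is assumed to contain only the streams on the edges other than the output edge (including the channel stream).

```python
from typing import List, Sequence


def variable_node_update(inputs: Sequence[int], previous_output: int) -> int:
    """Apply equation (4) for one time step.

    `inputs` holds the symbols seen at time t on all incoming edges other
    than the output edge (assumption: symbols are encoded as integers).
    """
    first = inputs[0]
    if all(symbol == first for symbol in inputs):
        # All inputs agree on the same symbol: copy it to the output.
        return first
    # Inputs disagree: hold the output from time t - 1.
    return previous_output


def run_variable_node(streams: Sequence[Sequence[int]], initial_output: int) -> List[int]:
    """Run the update rule over whole streams, one symbol per time step."""
    outputs = []
    previous = initial_output
    for symbols in zip(*streams):
        previous = variable_node_update(symbols, previous)
        outputs.append(previous)
    return outputs


# Two input streams over GF(8), four time steps (made-up symbols).
print(run_variable_node([[3, 5, 7, 7], [3, 6, 7, 2]], initial_output=0))  # [3, 3, 7, 7]
```

In the example, the inputs agree at time steps 0 and 2, so the output follows them there and holds its previous value at steps 1 and 3.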


Using equation (4) and assuming the inputs are independent, the PMF of the output is:

$$P[\overline{U}_{vp}(t) = c] = \prod_{i} P[\overline{V}_{iv}(t) = c] + \Big(1 - \sum_{a \in \mathrm{GF}(q)} \prod_{i} P[\overline{V}_{iv}(t) = a]\Big) P[\overline{U}_{vp}(t-1) = c]. \tag{5}$$

As in [13], if the stochastic streams are assumed to be stationary, then $P[\overline{U}_{vp}(t) = c] = P[\overline{U}_{vp}(t-1) = c]$ and the PMF of $\overline{U}_{vp}(t)$ becomes:

$$P[\overline{U}_{vp}(t) = c] = \frac{\prod_{i} P[\overline{V}_{iv}(t) = c]}{\sum_{a \in \mathrm{GF}(q)} \prod_{i} P[\overline{V}_{iv}(t) = a]}. \tag{6}$$

Equation (6) is identical to the normalized output of a sum-product variable node; therefore, equation (4) is a valid update rule for the stochastic variable node.

Permutation Node: The function of the permutation node is to remove multiplication by elements of H from the check node constraint. In the sum-product algorithm this is achieved by a cyclic shift of the message vector elements as in Section II-B. Here, we demonstrate that multiplying the stochastic stream from a variable to a check node by an element of H accomplishes the same result. Assuming a permutation node p which corresponds to $h = \alpha^i$, the permutation node output message in a SPA decoder is defined such that each element in the message vector is given by:

$$U_{pc}[a] = U_{vp}[a \cdot \alpha^i], \quad \forall a \in \mathrm{GF}(q).$$

When, in a stochastic decoder, the permutation node multiplies all elements of the input stream by h, the output PMF becomes:

$$P[\overline{U}_{pc}(t) = a] = P[\overline{U}_{vp}(t) = a \cdot \alpha^i].$$

The SPA and stochastic output PMFs are identical and since the multiplicative group of GF(q) is cyclic and multiplication is closed on GF(q), the stochastic permutation node operation is equivalent to that of the SPA algorithm.

Similarly, it can be shown that for messages passed from check to variable nodes, the inverse permutation node operation is multiplication by $h^{-1}$. It should be noted that $h \neq 0$, since a value of 0 in H signifies the lack of a connection between a variable and a check node.
Therefore, there are no permutation nodes with a multiplier $h = 0$.

Check Node: When deriving the stochastic update message for a check node, a degree-three node is considered and the result is generalized to a check node of any degree. Let $U_{1c}$ and $U_{2c}$ be the node inputs, which are assumed to be independent, and $V_{cp}$ its output. From equation (3), the output of such a node when using the SPA is given as:

$$P[V_{cp} = z \mid U_{1c}, U_{2c}] = \sum_{x \oplus y = z} P[U_{1c} = x]\, P[U_{2c} = y], \tag{7}$$

where $\oplus$ is GF(q) addition.

In the stochastic node, we define the output as the GF(q) addition of the inputs, i.e., $\overline{V}_{cp}(t) = \overline{U}_{1c}(t) \oplus \overline{U}_{2c}(t)$. The PMF of the output is computed as:

$$P[\overline{V}_{cp}(t) = z] = P[\overline{U}_{1c}(t) \oplus \overline{U}_{2c}(t) = z] = \sum_{x \oplus y = z} P[\overline{U}_{1c}(t) = x]\, P[\overline{U}_{2c}(t) = y]. \tag{8}$$

The PMFs (7) and (8) are identical; therefore it is concluded that GF(q) addition yields a valid update message for a degree-3 stochastic check node.

Since the output of a check node can be computed recursively [7], the previous conclusion can be generalized to a check node of any degree, and the output messages for these nodes are given as:

$$\overline{V}_{cp}(t) = \sum_{i=1,\, i \neq p}^{d_c} \overline{U}_{ic}(t), \tag{9}$$

where the summation is GF(q) addition.

It can be readily shown that the previous node equations reduce to the binary ones presented in [10] for GF(2).

B. Noise-Dependent Scaling and Edge-Memories

In binary stochastic decoding the switching activity can become very low, resulting in poor bit-error-rate performance. This phenomenon is called latch-up and is caused by cycles in the graph that cause the stochastic streams to become correlated, invalidating the independent stream assumption used to derive equations (4) and (9). Two solutions were proposed in [10]: noise-dependent scaling and edge memories. Both of these methods are used to improve the performance of the GF(q) decoder.

Noise-dependent scaling increases switching activity by scaling down the channel likelihood values.
For example, when transmitting data using BPSK modulation over an AWGN channel the scaled likelihood of each received bit $l'(i)$ is calculated by:

$$l'(i) = [l(i)]^{\frac{2\alpha\sigma_n^2}{Y}},$$

where $l(i)$ is the unscaled bit likelihood, $\sigma_n^2$ is the noise variance, and the ratio $\alpha / Y$ is determined offline to yield the best performance in the SNR range of interest. Accordingly the equation for computing the channel likelihood values becomes:

$$L[i(x)] = \prod_{k=1}^{p} [l(i_k)]^{\frac{2\alpha\sigma_n^2}{Y}}. \tag{10}$$

Edge memories (EM) are finite depth buffers that are inserted between variable nodes and permutation nodes and randomly reorder symbols in the output streams of variable nodes; thus, they break correlation between streams without affecting the overall stream statistics. The EM contents are updated with the variable node output when the node update condition is satisfied, and remain intact otherwise. The output of the EM is that of the variable node in the first case, or a randomly selected symbol from its contents in the second. Due to the memory's finite length, older symbols are discarded when new ones are added.

Figure 1b demonstrates the message passing mechanism and the location of edge memories within a stochastic decoder.

TABLE I: The number of operations needed by FFT-SPA, Log-FFT-SPA, and stochastic decoders to compute a single check node output message including the permutation node operations.

Algorithm          Multiplication            Addition                        LUT
FFT-SPA [2]        $2^p (d_c^2 + 4 d_c)$     $p 2^{p+1} d_c + 2^p$           0
Log-FFT-SPA [2]    0                         $(p 2^{p+1} + 2^{p+2}) d_c$     $p 2^{p+1} d_c$
Stoc.              $d_c - 1$                 $d_c - 1$                       0
Stoc.-LUT          0                         $d_c - 1$                       $d_c - 1$

For complexity comparison, Table I provides the number of operations needed to compute a single check node output message in the FFT-SPA and Log-FFT-SPA algorithms as presented in [2]. It should be noted that the operations for the SPA are for real numbers and quantization will degrade the decoder performance, while those for the stochastic decoder are over a finite field GF($2^p$).

C. Algorithm Description

At the beginning of the algorithm the edge memories are initialized using scaled channel likelihood values as PMFs for their content distribution. The following steps describe the stochastic decoding algorithm for each decoding cycle.
1: Variable node messages are computed using equation (4), edge memories are updated where appropriate, and messages are sent from edge memories to permutation nodes.
2: Permutation nodes perform GF(q) multiplication on incoming messages and send the results to check nodes.
3: Check node messages are computed as in equation (9) and are sent to permutation nodes.
4: Permutation nodes perform GF(q) multiplication by inverse and send resulting messages to variable nodes.
5: Each variable node contains counters $C[a]$ corresponding to GF(q) elements. These counters are incremented based on incoming messages and the channel message $L(t)$. A variable node belief is defined as $\arg\max_a C[a]$.
6: Variable node beliefs are updated accordingly.

The streams are processed on a symbol-by-symbol basis, one symbol each cycle (steps 1-5), until the algorithm converges (the variable node beliefs satisfy the check constraints) or a maximum number of iterations is reached. As in the binary algorithm presented in [10] the processing is not packetized.

D. Implementation

While the stochastic decoding algorithm is defined for any finite field, the implementation presented in this section is limited to GF($2^p$) as these are the most utilized fields and they yield the simplest implementation. The polynomial representation of GF($2^p$) is used when implementing the algorithm. This choice greatly simplifies the circuitry needed to perform GF($2^p$) addition. All gate number estimates assume 2-input logic gates in a tree configuration.

Fig. 2: GF(8) stochastic elements. (a) $d_v = 2$ variable node, (b) $d_c = 4$ check node.

Variable Node: To implement the operation specified by equation (4), a GF($2^p$) equality check is needed. XNOR gates and an AND gate are used to perform the check and provide an enable (latch) signal to an edge-memory as shown in Figure 2a.

To extend the circuit for a higher order field, more XNOR gates are used and connected to a larger AND gate. This accommodates the increase in the number of bits required to represent each GF($2^p$) symbol in the stochastic streams. For higher degree nodes, the number of inputs to each XNOR gate is increased. The total number of gates, without counters, required by a variable node is:

$$[\,p(d_v - 1)\,\text{XNOR} + (p - 1)\,\text{AND}\,]\, d_v. \tag{11}$$

Each variable node requires a maximum of $2^p$ counters to track occurrences of each symbol and determine the node belief. The size of EMs associated with a variable node of degree $d_v$ is $d_v l p$ bits, where $l$ is the EM length.

Permutation Node: Permutation nodes can be implemented using GF($2^p$) multipliers. For a particular code, the symbols arriving at a permutation node are always multiplied by the same element of H.
As a result, the multiplier can be designed to multiply by a specific (constant) element of GF($2^p$) instead of a generic GF($2^p$) multiplier, significantly reducing circuit complexity. Alternatively, look-up tables (LUT) can be used since their size would not be large. The multiplication by inverse for messages passed in the other direction is implemented in a similar manner.

If LUTs are used to implement multiplication, each node requires two LUTs: one for multiplication by $h$ and one for multiplication by $h^{-1}$. An operation LUT contains $2^p - 1$ entries each $p$ bits wide.

Check Node: The outgoing messages from check nodes are GF($2^p$) summations of incoming messages. Since the GF($2^p$) symbols are represented using the polynomial form, this operation can be realized utilizing XOR operations between corresponding bit lines of messages. The circuit in Fig. 2b is an example of a degree 4 check node in GF(8).

To implement a higher degree check node, the number of inputs to each XOR gate is increased to account for the extra incoming messages. Extending this circuit to higher order fields can be done by adding more XOR gates. The total number of gates required by a check node is:

$$[\,p(d_c - 1)\,\text{XOR}\,]\, d_c. \tag{12}$$
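
The check node of equation (9) and the gate count of equation (12) can be illustrated with a short software sketch. It assumes that each GF($2^p$) symbol is stored as an integer whose bits are the polynomial coefficients, so that field addition is a bitwise XOR; the function names `check_node_outputs` and `check_node_xor_gates` are hypothetical and are not taken from the paper.

```python
from functools import reduce
from operator import xor
from typing import List, Sequence


def check_node_outputs(inputs: Sequence[int]) -> List[int]:
    """Equation (9): the output on each edge is the GF(2^p) sum of the others.

    Assumption: symbols are integers in polynomial form, so GF(2^p)
    addition is a bitwise XOR.
    """
    total = reduce(xor, inputs, 0)
    # Every element of GF(2^p) is its own additive inverse, so XOR-ing the
    # total with one input removes that input and leaves the sum of the rest.
    return [total ^ symbol for symbol in inputs]


def check_node_xor_gates(p: int, d_c: int) -> int:
    """Evaluate the XOR gate count stated in equation (12)."""
    return p * (d_c - 1) * d_c


# Degree-4 check node over GF(8) with made-up input symbols.
print(check_node_outputs([0b101, 0b011, 0b110, 0b001]))  # [4, 2, 7, 0]
print(check_node_xor_gates(p=3, d_c=4))                  # 36
```

Computing one total and cancelling each input is a software convenience; the circuit in Fig. 2b instead forms each output directly from the other inputs.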


Fig. 3: FER for a (256,128)-symbol (2,4)-regular LDPC code over GF(16). EM length = 50, $\alpha / Y = 0.5$. [Plot: frame error rate versus $E_b/N_0$ (dB), 0 to 4 dB; curves: SP, Stochastic $DC_{max} = 10^6$, Stochastic $DC_{max} = 10^5$.]

TABLE II: Average number of decoding cycles.

SNR (dB)                        2.0     2.5    3.0    3.5    4.0
DCavg ($DC_{max} = 10^6$)     22599    8888   4243   2329   1433
DCavg ($DC_{max} = 10^5$)     17958    8511   4209   2326   1433

E. Performance

Figures 3 and 4 demonstrate the performance of the stochastic decoder compared to that of a SPA decoder when decoding a (256,128)-symbol LDPC code over GF(16) [14], when using an AWGN channel, BPSK, and random codewords. The SPA decoder has a maximum of 1000 iterations, while the stochastic decoder's maximum is $10^6$ decoding cycles (DC). The performance of the two decoders is very similar and the two decoders perform identically for higher SNR values. The change in the slope of the error rate graph was also observed in [14]. We note that the maximum number of decoding cycles is much greater than the average number of decoding cycles as shown in Table II, with DCavg determining the decoder throughput. Figures 3 and 4 demonstrate that, at higher SNRs, DCmax can be reduced with a small performance loss.

It should be noted that the number of iterations in the SPA decoder and decoding cycles in the stochastic decoder are not directly comparable. SPA iterations involve complex operations, for example, the node operations in EMS [15] involve sorting and iterating over incoming message elements, thus requiring many clock cycles. In a stochastic decoder, a decoding cycle is very simple and can be completed within a single clock cycle.
Also, due to the nature of stochastic computation, the proposed implementation lends itself to pipelining (due to the random order of the messages, the feedback loop in the graph is broken allowing pipelining [12]); thus, enabling clock rates faster than those possible with the SPA.

Fig. 4: BER for a (256,128)-symbol (2,4)-regular LDPC code over GF(16). EM length = 50, $\alpha / Y = 0.5$. [Plot: bit error rate versus $E_b/N_0$ (dB).]

IV. CONCLUSION

In this paper we presented a stochastic decoding algorithm which we expect to enable practical high-throughput decoding of LDPC codes over GF($2^p$).

ACKNOWLEDGEMENT

The authors would like to thank Prof. D. Declercq from ENSEA for helpful discussions.

REFERENCES

[1] M. Davey and D. MacKay, “Low-density parity check codes over GF(q),” IEEE Commun. Lett., vol. 2, no. 6, pp. 165–167, 1998.
[2] H. Song and J. Cruz, “Reduced-complexity decoding of Q-ary LDPC codes for magnetic recording,” IEEE Trans. Magn., vol. 39, no. 2, pp. 1081–1087, 2003.
[3] J. Chen, L. Wang, and Y. Li, “Performance comparison between nonbinary LDPC codes and Reed-Solomon codes over noise bursts channels,” in Proc. International Conference on Communications, Circuits and Systems, L. Wang, Ed., vol. 1, 2005, pp. 1–4.
[4] I. Djordjevic and B. Vasic, “Nonbinary LDPC codes for optical communication systems,” IEEE Photonics Technology Letters, vol. 17, no. 10, pp. 2224–2226, 2005.
[5] C. Spagnol, W. Marnane, and E. Popovici, “FPGA implementations of LDPC over GF($2^m$) decoders,” in Proc. IEEE Workshop on Signal Processing Systems, W. Marnane, Ed., 2007, pp. 273–278.
[6] D. MacKay and M. Davey, “Evaluation of Gallager codes for short block length and high rate applications,” in Codes, Systems and Graphical Models. Springer-Verlag, 2000, pp. 113–130.
[7] D. Declercq and M. Fossorier, “Decoding algorithms for nonbinary LDPC codes over GF(q),” IEEE Trans. Commun., vol. 55, no. 4, pp. 633–643, 2007.
[8] B. Gaines, Advances in Information Systems Science. Plenum, New York, 1969, ch. 2, pp. 37–172.
[9] V. Gaudet and A. Rapley, “Iterative decoding using stochastic computation,” Electronics Letters, vol. 39, no. 3, pp. 299–301, Feb. 2003.
[10] S. Sharifi Tehrani, W. Gross, and S. Mannor, “Stochastic decoding of LDPC codes,” IEEE Commun. Lett., vol. 10, no. 10, pp. 716–718, 2006.
[11] S. Sharifi Tehrani, S. Mannor, and W. J. Gross, “An area-efficient FPGA-based architecture for fully-parallel stochastic LDPC decoding,” in Proc. IEEE Workshop on Signal Processing Systems, 17–19 Oct. 2007, pp. 255–260.
[12] S. Sharifi Tehrani, S. Mannor, and W. J. Gross, “Fully parallel stochastic LDPC decoders,” IEEE Trans. Signal Process., vol. 56, no. 11, pp. 5692–5703, Nov. 2008.
[13] C. Winstead, V. Gaudet, A. Rapley, and C. Schlegel, “Stochastic iterative decoders,” in Proc. International Symposium on Information Theory ISIT, 2005, pp. 1116–1120.
[14] C. Poulliat, M. Fossorier, and D. Declercq, “Design of regular (2,$d_c$)-LDPC codes over GF(q) using their binary images,” IEEE Trans. Commun., vol. 56, no. 10, pp. 1626–1635, October 2008.
[15] A. Voicila, F. Verdier, D. Declercq, M. Fossorier, and P. Urard, “Architecture of a low-complexity non-binary LDPC decoder for high order fields,” in Proc. International Symposium on Communications and Information Technologies ISCIT ’07, F. Verdier, Ed., 2007, pp. 1201–1206.
