
© 2009 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. doi: http://dx.doi.org/10.1109/ICC.2009.5199493

Stochastic Decoding of LDPC Codes over GF(q)

Gabi Sarkis, Shie Mannor and Warren J. Gross
Department of Electrical and Computer Engineering, McGill University, Montreal, Quebec, Canada H3A 2A7
Email: gabi.sarkis@mail.mcgill.ca, shie.mannor@mcgill.ca, warren.gross@mcgill.ca

Abstract—Nonbinary LDPC codes have been shown to outperform currently used codes for magnetic recording and several other channels. Currently proposed nonbinary decoder architectures have very high complexity for high-throughput implementations and sacrifice error-correction performance to maintain realizable complexity. In this paper, we present an alternative decoding algorithm based on stochastic computation that has a very simple implementation and minimal performance loss when compared to the sum-product algorithm. We demonstrate the performance of the algorithm when applied to a GF(16) code and provide details of the hardware resources required for an implementation.

I. INTRODUCTION

Low-Density Parity-Check (LDPC) codes are linear block codes that can achieve performance close to the Shannon limit under iterative decoding. Binary LDPC codes have received much interest and are specified in recent wireless and wireline communications standards, for example digital video broadcast (DVB-S2), WiMAX wireless (IEEE 802.16e) and 10 gigabit Ethernet (IEEE 802.3an). Nonbinary LDPC codes defined over q-ary Galois fields (GF(q)) were introduced in [1] and were shown to perform better than equivalent-bit-length binary codes for additive white Gaussian noise (AWGN) channels. In [2], Song et al. showed that GF(q) LDPC codes significantly outperform binary LDPC and Reed-Solomon (RS) codes for the magnetic recording channel. Chen et al. [3] demonstrated that LDPC codes over GF(16) perform better than RS codes for general channels with bursts of noise, making GF(q) LDPC a candidate to replace RS coding in many storage systems.
Djordjevic et al. concluded that nonbinary LDPC codes achieve lower BER than other codes while allowing for higher transmission rates when used with the fiber-optic channel [4].

LDPC codes over GF(q) are defined such that the elements of the parity-check matrix H are elements of GF(q). As in the binary case, these codes are decoded by the sum-product algorithm (SPA) applied to the Tanner graph representation of the parity-check matrix H. Unfortunately, the nonbinary values of H result in very high complexity for the check node updates in the graph, presenting a significant barrier to practical realization. The only hardware implementation in the literature is fully serial, consisting of only one variable node and one check node [5]. There have been a number of approaches in the literature to reduce the complexity of the check node update. MacKay et al. proposed using the fast Fourier transform (FFT) to convert convolution to multiplication in the check nodes [6]. Song et al. use the log domain to replace multiplication with addition [2]. Declercq et al. introduced the extended min-sum (EMS) algorithm as an approximation to the SPA, computing likelihood values for only a subset of the field elements, thus reducing the number of computations performed [7]. While these approaches are simpler than a direct implementation of the SPA, there is a need to further reduce the complexity for practical decoder implementations.

Recently, a new approach to decoding binary LDPC codes based on stochastic computation ([8], [9]) was introduced in [10]. Stochastic decoders use random bit-streams to represent probability messages, resulting in simple node hardware and reduced wiring complexity.
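The idea that a probability can be carried by the statistics of a random stream is easy to demonstrate in software. The sketch below is a minimal illustration of binary stochastic computation under our own naming (it is not code from any of the cited decoders): a probability is encoded as a Bernoulli bit-stream and recovered by counting, and an AND gate applied to two independent streams multiplies their probabilities, which is the kind of trivially simple node hardware the stochastic approach exploits.

```python
import random

def to_stream(p, length, rng):
    """Encode probability p as a Bernoulli bit-stream of `length` bits."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def estimate(stream):
    """Recover the probability as the fraction of 1s observed in the stream."""
    return sum(stream) / len(stream)

# A probability becomes a stream, and counting recovers it.
s = to_stream(0.3, 10000, random.Random(0))

# An AND gate on two independent streams multiplies probabilities:
# P[x AND y = 1] = P[x = 1] * P[y = 1], here roughly 0.5 * 0.4 = 0.2.
a = to_stream(0.5, 10000, random.Random(1))
b = to_stream(0.4, 10000, random.Random(2))
prod = [x & y for x, y in zip(a, b)]
print(round(estimate(s), 2), round(estimate(prod), 2))
```

The estimates converge to the encoded probabilities as the stream length grows, at the cost of observing many stream symbols per result.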
Subsequently, area-efficient fully-parallel high-throughput decoders with performance close to the SPA were demonstrated in field-programmable gate arrays (FPGAs) [11], [12].

We realized that the complexity benefits of stochastic decoding might be even greater for nonbinary LDPC codes and could result in a practical decoder implementation. In this paper, we present a generalization of stochastic decoding to LDPC codes over GF(q). The algorithm has significantly lower hardware complexity than other nonbinary decoding algorithms in the literature.

II. SUM-PRODUCT DECODING

A. Notation

Since most digital systems transmit data using 2^p symbols, the focus in current research is on codes defined over GF(2^p). In this section, we describe the SPA for decoding LDPC codes over GF(2^p). However, it should be noted that the SPA works on any field GF(q) with minor modifications to notation and channel likelihood calculations.

The elements of GF(2^p) can be represented as powers of the primitive element \alpha, or using polynomials; the latter form is used in this section, so that the polynomial i(x) = \sum_{l=1}^{p} i_l x^{l-1}, where the i_l are binary coefficients, represents an element of GF(2^p).

The notation used in this section for representing inter-node messages is similar to that of [7]; namely, U and V represent messages heading in the direction of check and variable nodes, respectively. The subscripts represent the source and destination nodes; for example, U_{xy} is a message from node x to node y. All messages are probability mass function (PMF) vectors indexed by GF(2^p) elements. Fig. 1a shows this notation applied to a Tanner graph.

B. Algorithm

While nonbinary codes can also be decoded using the SPA on Tanner graphs, the check node update is modified because the

elements of H are nonbinary. Therefore, the check constraint for a check node of degree d_c is:

    \sum_{k=1}^{d_c} h_k i_k(x) = 0,    (1)

where h_k is the element of H with indices corresponding to the check and variable nodes of interest. This is different from the binary case, where the check constraint is \sum_{k=1}^{d_c} i_k(x) = 0. To accommodate this change, Davey et al. [1] assigned values from H as labels to the edges connecting variable and check nodes and integrated the multiplication into the check node functionality. Declercq et al. [7] introduced a third node type, called the permutation node, which connects variable and check nodes and performs the multiplication as shown in Fig. 1a, thereby reverting the check node constraint to \sum_{k=1}^{d_c} j_k(x) = 0. While the two approaches are functionally equivalent, the one in [7] results in simpler equations and implementation, since all check nodes of the same degree are identical.

The first step in the SPA is computing the channel likelihood vector L_v[i(x)] for each variable node v, based on the channel model and modulation scheme. The outgoing message from variable node v to permutation node z is given by:

    U_{vz} = L_v \times \prod_{p=1, p \neq z}^{d_v} V_{pv},    (2)

where \times is the term-by-term product of vectors and d_v is the variable node degree. Normalization is needed so that \sum_{a \in GF(2^p)} U_{vz}[a] = 1.

Permutation nodes implement multiplication by an element from H when passing messages from the variable to the check nodes, and multiplication by the inverse of that element in the other direction. As shown in [7], the multiplication and the multiplication by inverse can be performed using cyclic shifts of the positions of the values in a message vector, except the value indexed by 0.

The parity check constraint no longer includes multiplication by elements of H; therefore, the check node update equation is the convolution of the incoming messages, as shown in [7]:

    V_{ct} = \circledast_{p=1, p \neq t}^{d_c} U_{pc}.    (3)

This convolution represents a significant computational challenge in implementing nonbinary LDPC decoders.

III. STOCHASTIC DECODING

A message in the SPA for LDPC codes over GF(q) is a vector containing the probabilities of each of the q possible symbols. Stochastic decoding uses streams of symbols chosen from GF(q) to represent these messages; the number of occurrences of a symbol in a stream divided by the total number of symbols observed in the stream gives the probability of that symbol. The advantage of utilizing such a method for message passing lies in the simple circuitry required to manipulate the stochastic streams to reflect likelihood changes, as presented in Section III-D. Stochastic decoding of binary LDPC codes results in simple hardware structures; the reader is referred to [8], [9], [10], [11], [12] for details on binary stochastic decoding algorithms and their implementation.

Notation similar to that of the SPA is used when describing the stochastic decoding message updates, the difference being that messages are serial stochastic streams instead of vectors; thus, an index t is used to denote the location of a symbol within a stream, and the stream name is overlined, e.g. \overline{U}_{vp}(t).

Fig. 1: Stochastic decoder graphs with X and X^{-1} denoting forward and inverse permutation operations. (a) message labels, (b) message propagation with EMs added to the decoder.

A. Node Equations

Winstead et al. [13] presented a stochastic decoding algorithm that uses streams of integers instead of the conventional binary streams. In that work, an integer stream encodes the probabilities of the states in a trellis, leading to a demonstration of trellis decoding of a (16,11) Hamming code and a turbo product decoder built from the Hamming component decoders. However, that work did not interpret the integers as finite field symbols and did not utilize GF(q) arithmetic. In this section we present the node equations for a stochastic decoder for LDPC codes over GF(q). Taking the view that the nonbinary streams are composed of finite field elements, we present message update rules that are much simpler than those derived from a straightforward application of the rules in [13]. In particular, the trellis representation of the convolution in the check node reduces to Galois field addition. Section III-E demonstrates the performance of the stochastic algorithm when decoding a (256,128)-symbol LDPC code over GF(16).

Variable Node: A stochastic variable node of degree d_v takes as input d_v stochastic streams from permutation nodes in addition to one generated based on the channel likelihood values. In [13], the output of a node is updated if its inputs satisfy some constraint; otherwise, the output remains unchanged from the previous iteration. To implement a variable node constraint on an output message stream at time t, we copy the input symbol to the output symbol if the input symbols on all the other incoming edges are equal at time t. For a stochastic variable node with output \overline{U}_{vp} and inputs \overline{V}_{iv}, we propose the following update rule:

    \overline{U}_{vp}(t) = \begin{cases} a & \text{if } \overline{V}_{iv}(t) = a,\ \forall i : i \neq p \\ \overline{U}_{vp}(t-1) & \text{otherwise} \end{cases}    (4)
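Operationally, update rule (4) says: emit the agreed-upon symbol when all inputs other than the output edge carry the same GF(q) element, and otherwise hold the previous output symbol. A minimal Python sketch of this rule follows; the function and argument names are our own illustration, not the hardware description.

```python
def variable_node_update(inputs, prev_out):
    """Update rule of equation (4): if all inputs on the other edges agree
    on a GF(q) symbol, emit it; otherwise hold the previous output symbol.

    `inputs` holds the symbols V_iv(t) arriving on all edges other than
    the output edge p; `prev_out` is the previous output U_vp(t-1)."""
    first = inputs[0]
    if all(sym == first for sym in inputs):
        return first          # all inputs agree: copy the symbol through
    return prev_out           # disagreement: output stream holds its value

# GF(16) example symbols (integers 0..15 standing in for field elements):
print(variable_node_update([7, 7], prev_out=3))   # -> 7 (inputs agree)
print(variable_node_update([7, 12], prev_out=3))  # -> 3 (hold previous)
```

The hold behaviour is what makes the stream statistics converge to the normalized sum-product output of equation (6) under the stationarity assumption.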

Using equation (4) and assuming the inputs are independent, the PMF of the output is:

    P[\overline{U}_{vp}(t) = c] = \prod_i P[\overline{V}_{iv}(t) = c] + \left(1 - \sum_{a \in GF(q)} \prod_i P[\overline{V}_{iv}(t) = a]\right) P[\overline{U}_{vp}(t-1) = c].    (5)

As in [13], if the stochastic streams are assumed to be stationary, then P[\overline{U}_{vp}(t) = c] = P[\overline{U}_{vp}(t-1) = c] and the PMF of \overline{U}_{vp}(t) becomes:

    P[\overline{U}_{vp}(t) = c] = \frac{\prod_i P[\overline{V}_{iv}(t) = c]}{\sum_{a \in GF(q)} \prod_i P[\overline{V}_{iv}(t) = a]}.    (6)

Equation (6) is identical to the normalized output of a sum-product variable node; therefore, equation (4) is a valid update rule for the stochastic variable node.

Permutation Node: The function of the permutation node is to remove the multiplication by elements of H from the check node constraint. In the sum-product algorithm this is achieved by a cyclic shift of the message vector elements, as in Section II-B. Here, we demonstrate that multiplying the stochastic stream from a variable to a check node by an element of H accomplishes the same result. Assuming a permutation node p which corresponds to h = \alpha^i, the permutation node output message in a SPA decoder is defined such that each element in the message vector is given by:

    U_{pc}[a] = U_{vp}[a \cdot \alpha^i],\quad \forall a \in GF(q).

When, in a stochastic decoder, the permutation node multiplies all elements of the input stream by h, the output PMF becomes:

    P[\overline{U}_{pc}(t) = a] = P[\overline{U}_{vp}(t) = a \cdot \alpha^i].

The SPA and stochastic output PMFs are identical, and since the multiplicative group of GF(q) is cyclic and multiplication is closed on GF(q), the stochastic permutation node operation is equivalent to that of the SPA. Similarly, it can be shown that for messages passed from check to variable nodes, the inverse permutation node operation is multiplication by h^{-1}. It should be noted that h \neq 0, since a value of 0 in H signifies the lack of a connection between a variable and a check node. Therefore, there are no permutation nodes with a multiplier h = 0.

Check Node: When deriving the stochastic update message for a check node, a degree-three node is considered and the result is then generalized to a check node of any degree. Let \overline{U}_{1c} and \overline{U}_{2c} be the node inputs, which are assumed to be independent, and \overline{V}_{cp} its output. From equation (3), the output of such a node when using the SPA is given as:

    P[V_{cp} = z \mid U_{1c}, U_{2c}] = \sum_{x \oplus y = z} P[U_{1c} = x] P[U_{2c} = y],    (7)

where \oplus is GF(q) addition.

In the stochastic node, we define the output as the GF(q) addition of the inputs, i.e. \overline{V}_{cp}(t) = \overline{U}_{1c}(t) \oplus \overline{U}_{2c}(t). The PMF of the output is computed as:

    P[\overline{V}_{cp}(t) = z] = P[\overline{U}_{1c}(t) \oplus \overline{U}_{2c}(t) = z] = \sum_{x \oplus y = z} P[\overline{U}_{1c}(t) = x] P[\overline{U}_{2c}(t) = y].    (8)

The PMFs (7) and (8) are identical; therefore, GF(q) addition is a valid update rule for a degree-3 stochastic check node.

Since the output of a check node can be computed recursively [7], the previous conclusion can be generalized to a check node of any degree, and the output messages for these nodes are given as:

    \overline{V}_{cp}(t) = \sum_{i=1, i \neq p}^{d_c} \overline{U}_{ic}(t),    (9)

where the summation is GF(q) addition.

It can readily be shown that the previous node equations reduce to the binary ones presented in [10] for GF(2).

B. Noise-Dependent Scaling and Edge Memories

In binary stochastic decoding the switching activity can become very low, resulting in poor bit-error-rate performance. This phenomenon is called latch-up and is caused by cycles in the graph that cause the stochastic streams to become correlated, invalidating the independent-stream assumption used to derive equations (4) and (9). Two solutions were proposed in [10]: noise-dependent scaling and edge memories. Both of these methods are used to improve the performance of the GF(q) decoder.

Noise-dependent scaling increases switching activity by scaling down the channel likelihood values. For example, when transmitting data using BPSK modulation over an AWGN channel, the scaled likelihood of each received bit, l'(i), is calculated by:

    l'(i) = [l(i)]^{2 \alpha \sigma_n^2 / Y},

where l(i) is the unscaled bit likelihood, \sigma_n^2 is the noise variance, and the ratio \alpha / Y is determined offline to yield the best performance in the SNR range of interest. Accordingly, the equation for computing the channel likelihood values becomes:

    L[i(x)] = \prod_{k=1}^{p} [l(i_k)]^{2 \alpha \sigma_n^2 / Y}.    (10)

Edge memories (EMs) are finite-depth buffers inserted between variable nodes and permutation nodes that randomly reorder symbols in the output streams of variable nodes; thus, they break the correlation between streams without affecting the overall stream statistics. The EM contents are updated with the variable node output when the node update condition is satisfied, and remain intact otherwise. The output of the EM is that of the variable node in the first case, or a randomly selected symbol from its contents in the second. Due to the

memory's finite length, older symbols are discarded when new ones are added.

TABLE I: The number of operations needed by the FFT-SPA, Log-FFT-SPA, and stochastic decoders to compute a single check node output message, including the permutation node operations.

    Algorithm        | Multiplication      | Addition                    | LUT
    -----------------+---------------------+-----------------------------+---------------
    FFT-SPA [2]      | 2^p (d_c^2 + 4 d_c) | p 2^{p+1} d_c + 2^p         | 0
    Log-FFT-SPA [2]  | 0                   | (p 2^{p+1} + 2^{p+2}) d_c   | p 2^{p+1} d_c
    Stoc.            | d_c - 1             | d_c - 1                     | 0
    Stoc.-LUT        | 0                   | d_c - 1                     | d_c - 1

Figure 1b demonstrates the message passing mechanism and the location of edge memories within a stochastic decoder.

For complexity comparison, Table I provides the number of operations needed to compute a single check node output message in the FFT-SPA and Log-FFT-SPA algorithms as presented in [2]. It should be noted that the operations for the SPA are on real numbers, and quantization will degrade the decoder performance, while those for the stochastic decoder are over a finite field GF(2^p).

C. Algorithm Description

At the beginning of the algorithm, the edge memories are initialized using scaled channel likelihood values as the PMFs for their content distribution. The following steps describe the stochastic decoding algorithm for each decoding cycle.
1: Variable node messages are computed using equation (4), edge memories are updated where appropriate, and messages are sent from the edge memories to the permutation nodes.
2: Permutation nodes perform GF(q) multiplication on incoming messages and send the results to the check nodes.
3: Check node messages are computed as in equation (9) and are sent to the permutation nodes.
4: Permutation nodes perform GF(q) multiplication by the inverse and send the resulting messages to the variable nodes.
5: Each variable node contains counters C[a] corresponding to the GF(q) elements. These counters are incremented based on the incoming messages and the channel message \overline{L}(t). A variable node belief is defined as arg max_a C[a].
6: Variable node beliefs are updated accordingly.

The streams are processed on a symbol-by-symbol basis, one symbol each decoding cycle (steps 1–5), until the algorithm converges (the variable node beliefs satisfy the check constraints) or a maximum number of decoding cycles is reached. As in the binary algorithm presented in [10], the processing is not packetized.

D. Implementation

While the stochastic decoding algorithm is defined for any finite field, the implementation presented in this section is limited to GF(2^p), as these are the most utilized fields and they yield the simplest implementation. The polynomial representation of GF(2^p) is used when implementing the algorithm. This choice greatly simplifies the circuitry needed to perform GF(2^p) addition. All gate count estimates assume 2-input logic gates in a tree configuration.

Fig. 2: GF(8) stochastic elements. (a) d_v = 2 variable node; (b) d_c = 4 check node.

Variable Node: To implement the operation specified by equation (4), a GF(2^p) equality check is needed. XNOR gates and an AND gate are used to perform the check and provide an enable (latch) signal to an edge memory, as shown in Fig. 2a. To extend the circuit to a higher-order field, more XNOR gates are used and connected to a larger AND gate; this accommodates the increase in the number of bits required to represent each GF(2^p) symbol in the stochastic streams. For higher-degree nodes, the number of inputs to each XNOR gate is increased. The total number of gates, without counters, required by a variable node is:

    [p(d_v - 1)\ \text{XNOR} + (p - 1)\ \text{AND}]\, d_v.    (11)

Each variable node requires a maximum of 2^p counters to track occurrences of each symbol and determine the node belief. The size of the EMs associated with a variable node of degree d_v is d_v l p bits, where l is the EM length.

Permutation Node: Permutation nodes can be implemented using GF(2^p) multipliers.
For a particular code, the symbols arriving at a permutation node are always multiplied by the same element of H. As a result, the multiplier can be designed to multiply by a specific (constant) element of GF(2^p) instead of being a generic GF(2^p) multiplier, significantly reducing circuit complexity. Alternatively, look-up tables (LUTs) can be used, since their size would not be large. The multiplication by the inverse for messages passed in the other direction is implemented in a similar manner. If LUTs are used to implement the multiplication, each node requires two LUTs: one for multiplication by h and one for multiplication by h^{-1}. Each LUT contains 2^p - 1 entries, each p bits wide.

Check Node: The outgoing messages from check nodes are GF(2^p) summations of the incoming messages. Since the GF(2^p) symbols are represented in polynomial form, this operation can be realized using XOR operations between corresponding bit lines of the messages. The circuit in Fig. 2b is an example of a degree-4 check node in GF(8). To implement a higher-degree check node, the number of inputs to each XOR gate is increased to account for the extra incoming messages; extending the circuit to higher-order fields is done by adding more XOR gates. The total number of gates required by a check node is:

    [p(d_c - 1)\ \text{XOR}]\, d_c.    (12)
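The two node types above can be sketched in software for GF(16). In the snippet below, the naming is ours and the primitive polynomial x^4 + x + 1 is an assumption for illustration (any degree-4 primitive polynomial works): a constant-multiplier LUT stands in for a permutation node, and a check node output is the bitwise XOR of the incoming symbols, since GF(2^p) addition in the polynomial representation is exactly bitwise XOR (equation (9)).

```python
# GF(16) with primitive polynomial x^4 + x + 1, i.e. 0b10011 (an
# illustrative choice, not mandated by the paper).
PRIM = 0b10011

def gf16_mul(a, b):
    """Carry-less multiplication modulo the primitive polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a            # add (XOR) the current shifted copy of a
        a <<= 1
        if a & 0b10000:       # degree reached 4: reduce modulo PRIM
            a ^= PRIM
        b >>= 1
    return r

def perm_node_lut(h):
    """Constant-multiplier LUT for a permutation node labelled h (h != 0)."""
    return [gf16_mul(h, a) for a in range(16)]

def check_node_out(in_symbols):
    """Check node update (9): GF(2^p) addition is bitwise XOR of symbols."""
    out = 0
    for s in in_symbols:
        out ^= s
    return out

lut = perm_node_lut(0b0010)       # multiply by alpha
print(lut[0b1000])                # alpha * alpha^3 = alpha^4 = alpha + 1 -> 3
print(check_node_out([3, 5, 6]))  # 3 ^ 5 ^ 6 -> 0
```

The inverse permutation node is the same construction with h^{-1} in place of h; as in the text, a hardware LUT needs only 2^p - 1 entries because h = 0 never labels an edge.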

Fig. 3: FER for a (256,128)-symbol (2,4)-regular LDPC code over GF(16). EM length = 50, \alpha/Y = 0.5. (Curves: SP; Stochastic, DCmax = 10^6; Stochastic, DCmax = 10^5. Axes: Frame Error Rate vs. E_b/N_0 (dB).)

TABLE II: Average number of decoding cycles.

    SNR (dB)              | 2.0   | 2.5  | 3.0  | 3.5  | 4.0
    ----------------------+-------+------+------+------+------
    DCavg (DCmax = 10^6)  | 22599 | 8888 | 4243 | 2329 | 1433
    DCavg (DCmax = 10^5)  | 17958 | 8511 | 4209 | 2326 | 1433

E. Performance

Figures 3 and 4 demonstrate the performance of the stochastic decoder compared to that of a SPA decoder when decoding a (256,128)-symbol LDPC code over GF(16) [14], using an AWGN channel, BPSK modulation, and random codewords. The SPA decoder has a maximum of 1000 iterations, while the stochastic decoder's maximum is 10^6 decoding cycles (DCs). The performance of the two decoders is very similar, and the two decoders perform identically at higher SNR values. The change in the slope of the error-rate curves was also observed in [14]. We note that the maximum number of decoding cycles is much greater than the average number of decoding cycles, as shown in Table II, with DCavg determining the decoder throughput. Figures 3 and 4 demonstrate that, at higher SNRs, DCmax can be reduced with a small performance loss.

It should be noted that the number of iterations in the SPA decoder and the number of decoding cycles in the stochastic decoder are not directly comparable. SPA iterations involve complex operations; for example, the node operations in EMS [15] involve sorting and iterating over incoming message elements, thus requiring many clock cycles. In a stochastic decoder, a decoding cycle is very simple and can be completed within a single clock cycle.
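The convergence test that ends the decoding cycles — the variable node beliefs satisfying every check constraint \sum_k h_k i_k(x) = 0 — can be sketched as follows. The GF(16) arithmetic, the primitive polynomial, and the toy parity-check matrix below are illustrative assumptions of ours, not the code of [14].

```python
PRIM = 0b10011  # x^4 + x + 1, an illustrative primitive polynomial for GF(16)

def gf16_mul(a, b):
    """Carry-less multiplication modulo the primitive polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b10000:
            a ^= PRIM
        b >>= 1
    return r

def checks_satisfied(H, beliefs):
    """True when sum_k h_k * i_k = 0 over GF(16) (addition = XOR)
    holds for every check, i.e. the hard-decision beliefs form a codeword."""
    for row in H:
        acc = 0
        for h, sym in zip(row, beliefs):
            acc ^= gf16_mul(h, sym)
        if acc:
            return False
    return True

# Toy example in the spirit of a (2,4)-regular code: two checks, four symbols.
H = [[1, 2, 1, 0],
     [0, 1, 3, 1]]
print(checks_satisfied(H, [0, 0, 0, 0]))  # the all-zero word always passes
```

A decoder loop would evaluate this test once per decoding cycle and stop early on success, which is why DCavg in Table II is far below DCmax.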
Also, due to the nature of stochastic computation, the proposed implementation lends itself to pipelining (due to the random order of the messages, the feedback loop in the graph is broken, allowing pipelining [12]), thus enabling clock rates faster than those possible with the SPA.

Fig. 4: BER for a (256,128)-symbol (2,4)-regular LDPC code over GF(16). EM length = 50, \alpha/Y = 0.5.

IV. CONCLUSION

In this paper we presented a stochastic decoding algorithm which we expect to enable practical high-throughput decoding of LDPC codes over GF(2^p).

ACKNOWLEDGEMENT

The authors would like to thank Prof. D. Declercq from ENSEA for helpful discussions.

REFERENCES

[1] M. Davey and D. MacKay, "Low-density parity check codes over GF(q)," IEEE Commun. Lett., vol. 2, no. 6, pp. 165–167, 1998.
[2] H. Song and J. Cruz, "Reduced-complexity decoding of Q-ary LDPC codes for magnetic recording," IEEE Trans. Magn., vol. 39, no. 2, pp. 1081–1087, 2003.
[3] J. Chen, L. Wang, and Y. Li, "Performance comparison between non-binary LDPC codes and Reed-Solomon codes over noise bursts channels," in Proc. International Conference on Communications, Circuits and Systems, vol. 1, 2005, pp. 1–4.
[4] I. Djordjevic and B. Vasic, "Nonbinary LDPC codes for optical communication systems," IEEE Photonics Technology Letters, vol. 17, no. 10, pp. 2224–2226, 2005.
[5] C. Spagnol, W. Marnane, and E. Popovici, "FPGA implementations of LDPC over GF(2^m) decoders," in Proc. IEEE Workshop on Signal Processing Systems, 2007, pp. 273–278.
[6] D. MacKay and M. Davey, "Evaluation of Gallager codes for short block length and high rate applications," in Codes, Systems and Graphical Models. Springer-Verlag, 2000, pp. 113–130.
[7] D. Declercq and M. Fossorier, "Decoding algorithms for nonbinary LDPC codes over GF(q)," IEEE Trans. Commun., vol. 55, no. 4, pp. 633–643, 2007.
[8] B. Gaines, Advances in Information Systems Science. New York: Plenum, 1969, ch. 2, pp. 37–172.
[9] V. Gaudet and A. Rapley, "Iterative decoding using stochastic computation," Electronics Letters, vol. 39, no. 3, pp. 299–301, Feb. 2003.
[10] S. Sharifi Tehrani, W. Gross, and S. Mannor, "Stochastic decoding of LDPC codes," IEEE Commun. Lett., vol. 10, no. 10, pp. 716–718, 2006.
[11] S. Sharifi Tehrani, S. Mannor, and W. J. Gross, "An area-efficient FPGA-based architecture for fully-parallel stochastic LDPC decoding," in Proc. IEEE Workshop on Signal Processing Systems, Oct. 2007, pp. 255–260.
[12] ——, "Fully parallel stochastic LDPC decoders," IEEE Trans. Signal Process., vol. 56, no. 11, pp. 5692–5703, Nov. 2008.
[13] C. Winstead, V. Gaudet, A. Rapley, and C. Schlegel, "Stochastic iterative decoders," in Proc. International Symposium on Information Theory (ISIT), 2005, pp. 1116–1120.
[14] C. Poulliat, M. Fossorier, and D. Declercq, "Design of regular (2,d_c)-LDPC codes over GF(q) using their binary images," IEEE Trans. Commun., vol. 56, no. 10, pp. 1626–1635, Oct. 2008.
[15] A. Voicila, F. Verdier, D. Declercq, M. Fossorier, and P. Urard, "Architecture of a low-complexity non-binary LDPC decoder for high order fields," in Proc. International Symposium on Communications and Information Technologies (ISCIT '07), 2007, pp. 1201–1206.