Design and Implementation of a Structured LDPC Decoder on FPGA*


Jason Kwok-San Lee, Benjamin Lee, Jeremy Thorpe,

Kenneth Andrews, Sam Dolinar, Jon Hamkins†

{kwoklee, leeb, jeremy}@caltech.edu {andrews, sam, hamkins}@shannon.jpl.nasa.gov

Jet Propulsion Laboratory, California Institute of Technology


We present a design framework for the implementation of Low-Density Parity-Check (LDPC) decoders. We employed a scalable decoding architecture for a certain class of structured LDPC codes. The codes are designed using a small (n, r) protograph that is replicated Z times to produce a decoding graph for a (Z × n, Z × r) code. Using this architecture, we have implemented a decoder for a (4096, 2048) LDPC code on a Xilinx Virtex-II 2000 FPGA, and achieved decoding speeds of 31 Mbps with 10 fixed iterations. The implemented message-passing algorithm uses an optimized 3-bit non-uniform quantizer that operates with 0.2 dB implementation loss relative to a floating-point decoder.

1 Introduction

Error-correcting codes are widely used in digital communications to allow effectively error-free communication over noisy channels. Low-Density Parity-Check (LDPC) codes [1] have recently received a lot of attention because of their excellent error-correcting capability and highly parallelizable decoding algorithm. LDPC codes have been shown to perform close to the Shannon limit [2]. They can also achieve very high throughput because of the parallel nature of their decoding algorithms. In the past decade or so, much of the research on LDPC codes has focused on the analysis and improvement of codes under decoding algorithms with floating-point precision. However, to make LDPC codes practical in the real world, the design of an efficient hardware architecture is crucial.

There are several reasons to explore the implementation of LDPC codes using reconfigurable hardware. First, research on constructing a good LDPC code involves empirical testing of various algorithm parameters, and the results must be verified by simulation. Running simulations in software is slow; reconfigurable hardware provides a good way to speed them up. Second, once verified, the same algorithms can be employed on the same reconfigurable hardware for prototypes. In the future, an error-correcting system can easily be upgraded to the same design with a larger block length, and thus better performance, once a newer and larger FPGA is on the market. Third, communication standards involving the use of capacity-approaching codes are still evolving. With an implementation in reconfigurable hardware, communication systems have the flexibility to adopt new standards when they are developed and to comply with the standards in use in different geographic locations. Fourth, reconfigurable hardware allows communication systems to switch adaptively between different codes, adjusting to noise environments, power requirements, data block lengths, and other variable parameters.


LDPC codes are often specified by bipartite graphs consisting of two kinds of processing nodes, called variable nodes and check nodes. Each variable node is connected to channels to receive transmitted bits; each check node represents a parity-check constraint. Each variable node is connected to a few check nodes through a set of


In general, the construction of a good LDPC code requires a large number of nodes with disorganized interconnections, which complicates the implementation due to long global interconnects. In this work, we propose a scalable architecture of a structured LDPC

*The work described was funded by the IND Technology Program and performed at the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration.

†Communications Systems & Research Section, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA.


jth check node (see figure 2).

Figure 2. Our FPGA decoder architecture (FPGA top-level block diagram)

2.3 Computation Scheduling

The cornerstone of our hardware architecture is the scheduling of message updates in space and time. One iteration consists of a check node phase, followed by a variable node phase. In each phase, there are Z computation cycles (see figure 3).

In the check node phase, all check node modules read messages from the edge memory in ascending order, update the messages, and write their results back to the edge memory in ascending order. This computation across all r check node units occurs in parallel.
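The check node phase described above can be sketched in software as follows. This is only an illustrative model, not the hardware design: the min-sum rule stands in for the paper's quantized update (an assumption), and the names `edge_mem`, `check_edges`, and `check_node_update` are hypothetical. Each protograph edge is given its own memory with Z entries, one per copy, and the inner loop over check nodes models units that run in parallel on the FPGA.

```python
from typing import List


def check_node_update(incoming: List[float]) -> List[float]:
    """Min-sum check update (an assumed stand-in for the paper's rule).

    For each edge, the outgoing message combines the signs and the
    minimum magnitude of the messages on all *other* edges.
    Requires a check node of degree at least 2.
    """
    outgoing = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1.0
        for m in others:
            if m < 0:
                sign = -sign
        outgoing.append(sign * min(abs(m) for m in others))
    return outgoing


def check_node_phase(edge_mem: List[List[float]],
                     check_edges: List[List[int]],
                     Z: int) -> None:
    """edge_mem[e][z] is the message on protograph edge e, copy z.

    check_edges[j] lists the protograph edges attached to check node j.
    """
    for z in range(Z):                 # Z computation cycles, ascending addresses
        for edges in check_edges:      # parallel units in hardware, serial here
            incoming = [edge_mem[e][z] for e in edges]
            outgoing = check_node_update(incoming)
            for e, m in zip(edges, outgoing):
                edge_mem[e][z] = m     # write back to the address just read
```

Because every unit reads and writes the same address z in a given cycle, the edge memories need only a simple counter for addressing in this phase.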



Figure 3. Decoder control sequence: check node phase (loop through each of the Z copies), variable node phase (loop through each of the Z copies), and decision phase (loop and output each copy), looping up to the maximum number of iterations.

In the variable node phase, all variable node modules read messages from the edge memory in permuted order, update the messages, and write them back to the edge memory in permuted order. The computation across all n variable node units also occurs in parallel. Decoding stops at the maximum iteration number, or when a stopping rule is satisfied.
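A minimal sketch of the permuted access in the variable node phase is given below. It assumes a standard belief-propagation variable update on unquantized values (the hardware uses a quantized rule), and the names `perms`, `var_edges`, and `channel_llr` are hypothetical. Here `perms[e]` is a permutation of the Z addresses of edge memory e, so in cycle z the variable node reads and writes address `perms[e][z]` rather than z.

```python
from typing import List


def variable_node_phase(edge_mem: List[List[float]],
                        var_edges: List[List[int]],
                        perms: List[List[int]],
                        channel_llr: List[List[float]],
                        Z: int) -> List[List[float]]:
    """Run one variable node phase in place and return the posteriors.

    edge_mem[e][a]    message stored at address a of edge memory e
    var_edges[v]      protograph edges attached to variable node v
    perms[e][z]       address used on edge e during cycle z
    channel_llr[v][z] channel value for variable node v, copy z
    """
    posteriors = [[0.0] * Z for _ in var_edges]
    for z in range(Z):                          # Z computation cycles
        for v, edges in enumerate(var_edges):   # parallel units in hardware
            addrs = [perms[e][z] for e in edges]
            incoming = [edge_mem[e][a] for e, a in zip(edges, addrs)]
            total = channel_llr[v][z] + sum(incoming)
            posteriors[v][z] = total
            for e, a, m in zip(edges, addrs, incoming):
                # Outgoing message excludes the edge's own incoming message.
                edge_mem[e][a] = total - m
    return posteriors
```

Since each `perms[e]` visits every address exactly once per phase, updating the memory in place never overwrites a message that is still waiting to be read. The sign of each returned posterior would give the hard decision used in the decision phase.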

Although this work was underway before the Flarion decoder patent was published, we can now make a useful comparison to that architecture. Flarion's design operates on all Z copies of the template LDPC graph in parallel and processes the individual nodes serially. In this manner, memory and processing can be centralized and a Single-Instruction-stream-Multiple-Data-stream (SIMD) instruction is used to access all Z


In contrast, our system has multiple decentralized processing elements with multiple separate memories. All nodes in the template LDPC graph are operated on simultaneously in parallel, and each of the Z copies is processed serially (see figure 4).
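The difference between the two architectures amounts to swapping which loop is serial and which is parallel. The sketch below makes that concrete by listing, for each time step, the (node, copy) pairs processed together. The function names are hypothetical, and the model ignores memory layout and pipelining.

```python
from typing import List, Tuple

Step = List[Tuple[int, int]]  # (template node index, copy index) pairs


def node_parallel_schedule(num_nodes: int, Z: int) -> List[Step]:
    """Schedule of this paper: one step per copy, all template nodes at once."""
    return [[(node, z) for node in range(num_nodes)] for z in range(Z)]


def copy_parallel_schedule(num_nodes: int, Z: int) -> List[Step]:
    """Schedule in the style attributed to Flarion: one step per node,
    all Z copies at once (SIMD across copies)."""
    return [[(node, z) for z in range(Z)] for node in range(num_nodes)]


if __name__ == "__main__":
    ours = node_parallel_schedule(num_nodes=3, Z=4)
    theirs = copy_parallel_schedule(num_nodes=3, Z=4)
    print(len(ours), "steps of width", len(ours[0]))
    print(len(theirs), "steps of width", len(theirs[0]))
```

Both schedules cover the same set of node updates. The first needs Z steps with one processing element per template node, while the second needs one step per template node with a Z-wide datapath.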





Xilinx Virtex-II 2000 FPGAs provide fully synchronous dual-port Block RAM for memory use. However, using Block RAM in our design led to long routes from the Block RAMs to the processing logic, which increased the maximum delay unacceptably. Instead, we opted to use distributed RAM in our LDPC decoder design. All n variable node units, r check node units, and edge memory units could therefore be placed tightly over the FPGA area (see figures 8, 9, and 10). As a result, routes between adjacent logic and memory are minimized, making it easier for the placement and routing tool to meet timing requirements.


Figure 8. Distributed location of the n variable nodes' logic over the FPGA area



6 Performance

J. Thorpe, "Low-Density Parity-Check (LDPC) Codes Constructed from Protographs," IPN Progress Report 42-154, April-June 2003.

T. Richardson, "Methods and Apparatus for Decoding LDPC Codes," United States Patent No. US 6,633,856 B2, Oct. 14, 2003.

H. Zhong and T. Zhang, "Design of VLSI Implementation-Oriented LDPC Codes," IEEE Semiannual Vehicular Technology Conference (VTC), Oct. 2003.

J. Thorpe, K. Andrews, and S. Dolinar, "Methodologies for Designing LDPC Codes Using Protographs," submitted to 2004 IEEE International Symposium on Information Theory.

X. Hu, E. Eleftheriou, and D. Arnold, "Progressive edge-growth Tanner graphs," IEEE Global Telecommunications Conference (GLOBECOM '01), vol. 2, 25-29 Nov. 2001.

J. Thorpe, "Low-Complexity Approximations to Belief Propagation for LDPC Codes," unpublished.

[10] E. Yeo, B. Nikolic, and V. Anantharam, "Iterative Decoder Architectures," IEEE Communications Magazine, vol. 41, no. 8, Aug. 2003.
