
Hardware: Reconfigurable FPGAs

Design strategies for an FPGA-based 256-channel digital down converter

By Rodger H. Hosking

FPGAs can replace traditional ASIC-based digital down converters in high channel count Software-Defined Radios. Their inherent parallelism allows multiple digital receiver channels per chip, and available COTS IP cores can be used to realize up to 256 independently controlled channels in a Xilinx Virtex family FPGA.

As Software-Defined Radio technology further penetrates large communication systems for battlefield military radio networks, commercial wireless systems, manned and unmanned aerial vehicles, and monitoring facilities for SIGINT and COMINT, the need to accommodate a large number of agile frequency channels for radio receivers is quite apparent. In each of these applications, the same critical metrics apply: size, weight, power, and cost for each receiver channel.

Traditional Digital Down Converter (DDC) ASIC devices feature only one to four channels per chip, and straightforward implementations of DDCs in FPGAs consume a significant percentage of available resources.
A new approach to DDC design takes advantage of the parallelism of FPGAs to create a highly efficient architecture for multichannel receivers.

Basics of digital down converters

DDCs, often called digital receivers, perform the two essential software radio functions: frequency translation and channel filtering. In the basic DDC shown in Figure 1, a mixer and local oscillator perform the frequency translation.

The local oscillator consists of a digital phase accumulator that advances each clock by a programmable increment proportional to the tuning frequency. The phase accumulator is a register whose full-scale value represents 360 degrees of a sinusoid. A sine/cosine lookup table converts the phase angle of the accumulator to the digital voltage value of the sinusoid. The higher the increment, the faster the phase accumulator steps through the sine table. It naturally overflows at the top, preserving any residue left in the register as a phase offset for the first sample of the next cycle. As a result, the frequency of the output sinusoid is directly proportional to the phase increment or frequency setting.
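
The phase accumulator and sine/cosine lookup described above can be sketched in a few lines of Python. This is a behavioral illustration only, not FPGA code: the 32-bit accumulator width, the 10-bit table size, and the names `phase_increment` and `nco` are assumptions chosen for the example.

```python
import math

ACC_BITS = 32                      # assumed accumulator width; full scale = 360 degrees
LUT_BITS = 10                      # assumed sine table address width
LUT_SIZE = 1 << LUT_BITS
SINE_LUT = [math.sin(2 * math.pi * i / LUT_SIZE) for i in range(LUT_SIZE)]


def phase_increment(f_tune, f_clock):
    """Per-clock accumulator step for a given tuning frequency."""
    return round(f_tune / f_clock * (1 << ACC_BITS)) % (1 << ACC_BITS)


def nco(increment, count):
    """Return `count` (cos, sin) pairs from a wrapping phase accumulator."""
    acc = 0
    out = []
    for _ in range(count):
        index = acc >> (ACC_BITS - LUT_BITS)        # top bits address the table
        sin_val = SINE_LUT[index]
        # Cosine is the same table read a quarter cycle (90 degrees) ahead.
        cos_val = SINE_LUT[(index + LUT_SIZE // 4) % LUT_SIZE]
        out.append((cos_val, sin_val))
        # Natural overflow: the residue carries into the next cycle.
        acc = (acc + increment) % (1 << ACC_BITS)
    return out
```

With a 100 MHz clock, a 25 MHz tuning frequency gives an increment of 2^30, so the accumulator wraps every four clocks and the output repeats with a four-sample period.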
This oscillator block is a classic Numerically Controlled Oscillator (NCO), also often called a Direct Digital Synthesizer (DDS).

Figure 1. Basic DDC: frequency translation (digital local oscillator with phase accumulator and sin/cos LUT, complex mixer), channel filter (complex FIR low-pass filter with coefficient LUT), and decimator and formatter.

The mixer consists of two digital multipliers that accept complex sine/cosine outputs from the local oscillator and digital samples of the receiver input signal produced by an A/D converter. Multiplication in the time domain produces a sum and difference signal in the frequency domain. If the local oscillator is set to the frequency of the input signal of interest, the difference term will be that input signal translated down to 0 Hz. Since the mixer is complex, the upper and lower sidebands of the input signal will be translated to positive and negative frequencies centered at 0 Hz.

The filter is a complex low-pass digital filter with two parallel I and Q arms whose coefficients are programmed for a pass band equal to the channel bandwidth. Because the output of the filter is bandlimited, the output decimation stage can drop the sampling rate accordingly.

DDCs are grouped into two main categories. Wideband DDCs have output channel bandwidths typically above 1 MHz and are appropriate for wideband Code Division Multiple Access (CDMA) and radar applications.
Narrowband DDCs with bandwidths below 1 MHz are widely used for Frequency Division Multiplexed (FDM) systems including voice and music channels for telecom and commercial broadcast systems. While the mixer and local oscillator sections are quite similar for all DDCs, the best filter design depends on the filter bandwidth. For wideband channels, a conventional FIR filter is best (as shown in Figure 1). For narrowband channels, a multistage Cascaded Integrator-Comb (CIC) filter followed by an FIR to correct frequency droop is more efficient.

For narrowband applications, both ASIC and FPGA Intellectual Property (IP) cores are available using CIC filter designs.

Reprinted from Military Embedded Systems Summer 2006 Copyright 2006
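
As a rough illustration of the narrowband DDC structure described above, the following Python sketch mixes a real input down to 0 Hz and then applies a CIC decimator. It is a floating-point behavioral model under stated assumptions, not any vendor's core: the function names `nco_mix` and `cic_decimate`, the three-stage CIC, and the decimation by 16 are all hypothetical choices, and the FIR droop-correction stage is omitted.

```python
import cmath
import math


def nco_mix(samples, f_tune, fs):
    """Translate f_tune down to 0 Hz by multiplying with a complex local oscillator."""
    step = -2.0 * math.pi * f_tune / fs
    return [x * cmath.exp(1j * step * n) for n, x in enumerate(samples)]


def cic_decimate(samples, rate, stages):
    """CIC decimator: `stages` integrators, keep every `rate`-th sample, `stages` combs."""
    data = list(samples)
    for _ in range(stages):                 # integrators run at the input rate
        acc = 0
        integrated = []
        for x in data:
            acc += x
            integrated.append(acc)
        data = integrated
    data = data[rate - 1::rate]             # sample rate drops by `rate`
    for _ in range(stages):                 # combs run at the reduced rate
        prev = 0
        combed = []
        for x in data:
            combed.append(x - prev)
            prev = x
        data = combed
    gain = rate ** stages                   # DC gain of the CIC, removed here
    return [x / gain for x in data]
```

For example, a real cosine at 64 Hz sampled at 1,024 Hz, mixed with `f_tune=64.0` and decimated by 16, settles to a constant value near 0.5 at baseband: the difference term lands at 0 Hz, while the sum term at 128 Hz falls on a CIC null.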


Commercial ASICs feature as many as four channels per chip, like the popular Texas Instruments/Graychip GC4016. IP core DDCs, like the LogiCore DDC from Xilinx for its Virtex-II Pro, can be scaled for various levels of Spurious-Free Dynamic Range (SFDR) performance to use more or less of the available resources. For example, a complex DDC with 84 dB SFDR consumes approximately 1,700 slices. In a mid-sized FPGA device with 24,000 available slices, only about 14 DDC channels can be accommodated (24,000 / 1,700 is about 14.1). For applications requiring several dozen or even hundreds of channels, this approach can become impractical.

Channelizers

Because of the extremely fine resolution of its NCO tuning frequency, a true DDC can translate any input frequency component down to 0 Hz, often with 32-bit accuracy. This ability makes DDCs ideal for applications that require precise changes in tuning, such as continuous Doppler correction for satellite tracking systems.

However, in other applications, a channelizer approach may be sufficient. This is a bank of equally spaced, fixed-frequency band-pass filters whose outputs are translated to baseband (0 Hz). One crude example of a channelizer familiar to everyone is a simple FFT. It converts a block of N time samples equally spaced in time into a block of N frequency samples equally spaced in frequency.
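
To make the block-transform view of a channelizer concrete, here is a small Python sketch that follows one frequency bin across successive input blocks. For brevity it evaluates a direct DFT of a single bin rather than a full FFT; the names `bin_center` and `bin_stream` and the non-overlapping block scheme are assumptions made for illustration.

```python
import cmath
import math


def bin_center(k, fs, n_points):
    """Center frequency of bin k for an n_points transform at sample rate fs."""
    return k * fs / n_points


def bin_stream(samples, n_points, k):
    """One complex output per non-overlapping block: the value of DFT bin k.

    The output rate is fs / n_points, and the only tuning choice is which
    bin to select.
    """
    out = []
    for start in range(0, len(samples) - n_points + 1, n_points):
        block = samples[start:start + n_points]
        value = sum(x * cmath.exp(-2j * math.pi * k * n / n_points)
                    for n, x in enumerate(block))
        out.append(value)
    return out
```

With a 16-point transform at a 1,024 Hz sample rate, bin 3 is centered at 192 Hz and the next available tuning step is bin 4 at 256 Hz, which shows how coarse the tuning is.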
For a continuous stream of input time sample blocks, samples at a given point in successive output blocks represent a translated, band-pass frequency signal or bin.

By selecting the output of a particular bin, a channelizer can serve as a primitive DDC, but with extremely coarse tuning resolution determined by the number of points in the FFT, as shown in Figure 2.

Another serious limitation of the FFT as a DDC is the frequency response (pass-band flatness) of the bin, and rejection of energy from adjacent bins (stop-band rejection). Other channelizer designs use various digital filtering techniques to split the bands with better flatness and adjacent channel rejection, but they usually require significantly more hardware than an FFT for a comparable number of bins. Regardless of its design, the tuning resolution of any channelizer is set simply by its number of bins or channel filters, since tuning can only step from one bin center to the next. As a result, channelizers may be useful for spectrum analyzers, scanners, and energy survey equipment, but they are rarely used as substitutes for DDCs in software radio communication systems.

Rethinking the multichannel DDC

The software radio market generates a growing number of requests for DDC solutions with densities higher than the 16 or 32 channels provided per board using ASICs or standard FPGA designs.
Therefore, we embarked upon a mission to develop a signal processing architecture for a narrowband DDC with 64 channels or more, with full tuning resolution, but with much more efficient use of FPGA resources than deploying a farm of conventional DDC cores.

Each conventional DDC requires its own local oscillator (phase accumulator and sine table), mixer (two multipliers), and FIR filter (multipliers and accumulators). All of this hardware must operate at the full input sample clock rate, and clock rates for A/Ds commonly used in software radios range between 100 and 200 MHz. Since this is the same clock range rating for commercial DDC IP cores, all of the hardware resources used for each channel must be dedicated to that channel.

However, imagine that the input data sample rate is reduced by a factor N. By operating the DDC hardware resources required for one channel at the full clock rate, those same resources can then be multiplexed (time shared) across N channels. Of course, provisions must be made for buffering the data for all channels while multiplexing. This is usually done in RAM or in delay memory, a common feature in FPGAs.

One way to achieve this input rate reduction is to split the input signal into a bank of N adjacent frequency bands using a channelizer. Then, the output sample rate for each band can be reduced by a factor of N.
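
The time-sharing argument above can be sketched as follows. This is a behavioral illustration under stated assumptions, not the article's implementation: the class `TimeSharedMixer`, the helper `slots_per_sample`, and the figures used in the usage note (a 100 MHz clock and a rate reduction of 64) are hypothetical. Per-channel state is kept in plain lists, which stand in for the FPGA RAM mentioned in the text.

```python
import cmath
import math


def slots_per_sample(clock_rate, band_sample_rate):
    """Clock cycles available between successive samples of one band."""
    return int(clock_rate // band_sample_rate)


class TimeSharedMixer:
    """One NCO/mixer datapath shared by several channels.

    Only the per-channel state (NCO phase and increment) is replicated;
    the arithmetic is the same code reused in each time slot.
    """

    def __init__(self, tuning_cycles_per_sample):
        self.step = list(tuning_cycles_per_sample)   # per-channel NCO increment
        self.phase = [0.0] * len(self.step)          # per-channel NCO phase, in cycles

    def process_frame(self, frame):
        """Take one reduced-rate sample per channel; return one mixed sample each."""
        out = []
        for ch, x in enumerate(frame):               # one time slot per channel
            lo = cmath.exp(-2j * math.pi * self.phase[ch])
            out.append(x * lo)
            self.phase[ch] = (self.phase[ch] + self.step[ch]) % 1.0
        return out
```

If the hardware runs at 100 MHz and each band arrives at 100 MHz / 64, `slots_per_sample` returns 64, so one datapath has enough clock cycles to serve up to 64 channels in turn.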
The output from the band containing the signal of interest can be selected as the input to any given DDC to fine tune within that band.

The tradeoff question becomes: Are the resources freed up by multiplexing the DDCs more than the resources required for the channelizer? The answer lies in how efficient the channelizer can be.

Figure 2. FFT as a channelizer: an input at sample rate Fs feeds a 1,024-point FFT, producing bins 1 through 1024, each at sample rate Fs/1024; adjacent bins (n, n+1, n+2, n+3) are shown as amplitude versus frequency.

Realizing the design

Figure 3 shows an FPGA-based 256-channel DDC IP core that combines a channelizer stage with a multiplexed DDC stage.

The crucial part of this design is the channelizer stage. It accepts a single wideband input stream and delivers a channel bank of 1,024 output bands equally
