Embedded Computing Design - OpenSystems Media

TIPS

By Eric Cigan

In this article, Eric gives an overview of the benefits of using FPGAs in DSP design and concludes with a list of recommended design rules.

Challenges of FPGA-based DSP design

Not long ago, designers of high-performance digital signal processing (DSP) systems had two alternatives for implementation: general-purpose DSPs or ASICs. General-purpose DSPs, such as those from TI, Agere, Motorola, and Analog Devices, are special-purpose microprocessors optimized for common DSP operations. Their benefit is that they are the fastest way to get an algorithm running, because they offer a comprehensive development environment with tools for code analysis, debugging, and rapid prototyping. Their disadvantage is that they ultimately execute instructions serially, which sets an upper limit on the chip’s throughput.

ASICs offer the ability to break through these performance limitations. Custom ASIC design lets the designer employ the optimal mix of resources on a chip and place them in close physical proximity to minimize delays. Moreover, ASICs are ideal for portable electronics, since the flexibility of ASIC design allows the use of processes and architectures optimized for lower power consumption. The drawbacks of ASICs are considerable:

■ They need to be fabricated.
■ They require more time to design.
■ They require more complex and expensive development tools.
■ A single design flaw could lead to a design respin, causing additional cost and delay.

In the past, given these two choices, most designers avoided ASICs unless absolutely necessary. Two trends have recently changed the landscape. First, the demand for high-performance DSP has increased dramatically, due in part to the growth in multimedia and communications systems.
Products as diverse as 3G wireless base stations, medical diagnostic imaging equipment, and even driver-assist systems that will automatically park a car would be inconceivable without the use of advanced DSP algorithms. The throughput requirements of these systems have strained the abilities of general-purpose DSPs. For example, one leading manufacturer of advanced echo cancellation systems incorporated more than 25 general-purpose DSPs on a single board to meet its performance goals.

Second, a new generation of programmable chips has emerged as an alternative to standard DSPs. Platform FPGAs such as Altera’s Stratix II and Xilinx’s Virtex II incorporate arrays of dedicated multipliers, embedded memory, and high-speed I/O that make them ideal for DSP applications. The silicon resources of an FPGA lead to staggering performance gains: while the fastest general-purpose DSP can deliver up to 5 billion MAC/s (multiply-accumulate operations per second), leading FPGA devices can deliver more than 500 billion MAC/s, which is more than 100x faster. What’s more, channelized applications such as those common in wireless communications naturally lend themselves to parallel implementations in FPGAs. Growth rates in processing speed requirements versus capabilities are shown in Figure 1. A comparison of general-purpose DSPs versus FPGAs is shown in Table 1.

[Figure 1: growth in processing speed requirements versus capabilities]

(Embedded Computing Design, Summer 2004, p. 30)

Tips for DSP design

Let’s now take a close look at how designers navigate the challenges of using FPGAs in design to avoid prolonged design cycles or reduce the component cost of end products. It comes down to the following basic rules.

Rule #1 Start at the beginning.

Complex DSP designs start with an algorithm developer who creates the initial design based on existing designs and experience. According to the DSP market research firm Forward Concepts, the leading tool for algorithm design is MATLAB from MathWorks. Using the MATLAB language, algorithm developers can create designs in a natural and productive form and may tap into an immense wealth of designs, scripts, and engineering know-how available only in the MATLAB language. Though designers can choose from other options, including block-level environments such as Simulink or SPW and languages based on C/C++, these environments are less widely used, and there may not be as many designs available for them. Moreover, many constructs used in DSP designs, such as looping, repeated structures, and two- or three-dimensional data arrays, are much easier to represent in MATLAB than in block-level environments. Once the algorithm is created in MATLAB, it can be readily shared or partitioned across a design team and reused over time.

Rule #2 Avoid recopying your work (or alternatively, “Don’t get lost in translation”).
Once the algorithm is available, the rest of the design team, including hardware designers, software developers, and the system designers who integrate the design components, swings into motion. In the past, the completed algorithm in MATLAB became the executable specification, which meant that the hardware designers and software developers needed to recreate the design. Many embedded systems developers are accustomed to implementing DSP algorithms on general-purpose DSPs in C or assembly language. This puts hardware and software engineers in the role of translating designs from one language to another, creating many opportunities to introduce errors, with the attendant debugging effort. To avoid this process altogether, companies are looking to architectural synthesis tools that use the MATLAB M-file as the golden source for downstream design, automatically synthesizing the design at the Register Transfer Level (RTL). Coupled with traditional RTL synthesis tools that can synthesize RTL to a gate-level implementation, this establishes an unbroken design flow from algorithm creation to hardware implementation. The top-down design process is shown in Figure 2.

[Figure 2: top-down design process]

Table 1

Function | Industry-leading general-purpose DSP processor core | Industry-leading platform FPGA
8x8 multiply-accumulate (MAC) | 4.8 billion MACps (f_clk = 600 MHz) | 1 trillion MACps (f_clk = 300 MHz)
FIR filter, 256 taps, linear phase, 16-bit data/coefficients | 9.3 Msps (f_clk = 600 MHz) | 300 Msps (f_clk = 300 MHz)
Complex FFT, 1024 point, 16-bit data | 10 µs (f_clk = 600 MHz) | 1 µs (f_clk = 150 MHz)
Viterbi decoding throughput | 500 channels at 7.95 Kbps, for a total of 3.9 Mbps | 155 Mbps (OC-3 rates)
Reed-Solomon decoding throughput | 4.1 Mbps (f_clk = 600 MHz) | 10 Gbps (OC-192 rates) (f_clk = 85 MHz)
Turbo convolutional decoder throughput | Six 2 Mbps data streams (6 iterations) | 5.4 Mbps (6 iterations)
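The ratios implied by Table 1 can be worked out with a few lines of arithmetic. The following Python sketch is illustrative only: the row labels, units, and function names (`speedup`, `viterbi_aggregate_bps`, `report`) are assumptions made for this example, and the figures are simply transcribed from the table. The ratios are raw comparisons and are not normalized for the different clock rates listed. The turbo decoder row is left out because its two entries are stated in different terms and do not reduce to a single ratio.

```python
# Figures transcribed from Table 1. Labels and units are shorthand chosen
# for this sketch, not the article's wording.

def viterbi_aggregate_bps(channels=500, rate_bps=7.95e3):
    """Aggregate DSP Viterbi throughput: channels times per-channel bit rate."""
    return channels * rate_bps

# Each row: (label, DSP figure, FPGA figure, higher_is_better)
TABLE_1 = [
    ("8x8 MAC rate, MAC/s", 4.8e9, 1.0e12, True),
    ("256-tap FIR filter, samples/s", 9.3e6, 300e6, True),
    ("1024-point complex FFT, seconds", 10e-6, 1e-6, False),
    ("Viterbi decoding, bit/s", viterbi_aggregate_bps(), 155e6, True),
    ("Reed-Solomon decoding, bit/s", 4.1e6, 10e9, True),
]

def speedup(dsp, fpga, higher_is_better):
    """Return how many times better the FPGA figure is than the DSP figure."""
    return fpga / dsp if higher_is_better else dsp / fpga

def report(rows=TABLE_1):
    """Format one 'label: FPGA ~Nx' line per table row."""
    return [
        f"{label}: FPGA ~{speedup(dsp, fpga, hib):.0f}x"
        for label, dsp, fpga, hib in rows
    ]

if __name__ == "__main__":
    for line in report():
        print(line)
```

With these inputs the ratios come out to roughly 208x for the MAC rate, 32x for the FIR filter, 10x for the FFT, 39x for Viterbi decoding, and 2,439x for Reed-Solomon decoding. The Viterbi row uses the computed aggregate of 500 × 7.95 Kbps = 3.975 Mbps, which Table 1 lists as 3.9 Mbps.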