04.01.2015 Views

Embedded Computing Design - OpenSystems Media

Embedded Computing Design - OpenSystems Media

Embedded Computing Design - OpenSystems Media

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

silicon resources of an FPGA lead to<br />

staggering performance gains – while the<br />

fastest general purpose can deliver up to<br />

5 billion MAC/s (multiply-accumulate<br />

per second), leading FPGA devices can<br />

deliver more than 500 billion MAC/s –<br />

that’s more than 100x faster. What’s more,<br />

channelized applications such as those<br />

common to wireless communications<br />

naturally lend themselves to parallel implementations<br />

in FPGAs. Growth rates<br />

in processing speed requirements versus<br />

capabilities are shown in Figure 1 1 . A<br />

comparison of general purpose DSPs<br />

versus FPGAs is shown in Table 1 2 .<br />

Tips for DSP design<br />

Let’s now take a close look at how designers<br />

navigate the challenges of using FPGAs in<br />

design to avoid prolonged design cycles<br />

or reduce the component cost for end<br />

products. It comes down to the following<br />

basic rules.<br />

1<br />

Figure 1<br />

Rule #1<br />

Start at the beginning.<br />

Complex DSP designs start with an<br />

algorithm developer who creates the initial<br />

design based on existing designs and<br />

experience. According to the DSP market<br />

research firm Forward Concepts, the leading<br />

tool for algorithm design is MATLAB<br />

from MathWorks. Using the MATLAB<br />

language, algorithm developers can create<br />

designs in a natural and productive form<br />

and may tap into an immense wealth of<br />

designs, scripts, and engineering knowhow<br />

available only in the MATLAB<br />

language. Though designers can choose<br />

from other options including blocklevel<br />

environments, such as Simulink<br />

or SPW, or languages based on C/C++,<br />

these environments are less widely used<br />

and there may not be as many designs<br />

available for them. Moreover, many<br />

constructs used in DSP designs – such as<br />

looping, repeated structures, and 2- or 3-<br />

dimensional data arrays – are much easier<br />

to represent in MATLAB than in blocklevel<br />

environments. Once the algorithm<br />

is created in MATLAB, it can be readily<br />

shared or partitioned across a design team<br />

and reused over time.<br />

2<br />

Rule #2<br />

Avoid recopying your work<br />

(or alternatively, “Don’t get<br />

lost in translation”).<br />

Once the algorithm is available, the rest<br />

of the design team, including hardware<br />

designers, software developers, and system<br />

designers who integrated the design<br />

components, swings into motion. In the<br />

past, the completed algorithm in MATLAB<br />

became the executable specification,<br />

which meant that the hardware designer<br />

and software developers needed to recreate<br />

the design.<br />

Many embedded systems developers<br />

are accustomed to implementing DSP<br />

algorithms on general purpose DSPs in C<br />

or assembly language. This puts hardware<br />

and software engineers into the role of<br />

translating designs from one language to<br />

another, creating many opportunities for inserting<br />

errors with the attendant debugging<br />

process. To avoid this process altogether,<br />

companies are looking to architectural<br />

synthesis tools that use the MATLAB<br />

M-file as the golden source for downstream<br />

design, automatically synthesizing the<br />

design at the Register Transfer Level<br />

(RTL). Coupled with traditional RTL<br />

synthesis tools that can synthesize RTL to<br />

gate-level implementation, this establishes<br />

an unbroken design flow from algorithmic<br />

creation to hardware implementation.<br />

The top-down design process is shown in<br />

Figure 2.<br />

Function<br />

Industry leading, general<br />

purpose, DSP processor core<br />

Industry leading<br />

platform FPGA<br />

8x8 Multiply Accumulate 4.8 Billion MACps 1 Trillion MACps<br />

(MAC) f clk = 600 MHz f clk = 300 MHz<br />

FIR Filter<br />

– 256 Taps, Linear phase 9.3 Msps 300 Msps<br />

– 16-bit data/coefficients f clk = 600 MHz f clk = 300 MHz<br />

Complex FFT 10 µs 1 µs<br />

– 1024 point, 16-bit data f clk = 600 MHz f clk = 150 MHz<br />

Viterbi decoding 500 channels at 7.95 Kbps 155 Mbps (OC-3 rates)<br />

throughput<br />

for a total of 3.9 Mbps<br />

Reed-Solomon decoding 4.1 Mbps 10 Gbps (OC-192 rates)<br />

throughput f clk = 600 MHz f clk = 85 MHz<br />

Turbo convolutional Six 2 Mbps data streams 5.4 Mbps<br />

decoder throughput (6 iterations) (6 iterations)<br />

Table 1<br />

Figure 2<br />

<strong>Embedded</strong> <strong>Computing</strong> <strong>Design</strong> Summer 2004 / 31

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!