15.01.2013 Views

U. Glaeser


overflows of intermediate sums. That is, the final result is correct even though intermediate sums may be flawed.
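The wraparound property can be illustrated with a minimal sketch, assuming the passage refers to two's complement accumulation in a fixed-width register. The helper names and the 8-bit word size are illustrative choices, not taken from the text.

```python
def wrap_twos_complement(value, bits=8):
    """Reduce an integer to the range of a `bits`-bit two's complement word."""
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half


def accumulate_wrapped(terms, bits=8):
    """Sum terms, wrapping after every addition like a fixed-width accumulator."""
    acc = 0
    trace = []
    for term in terms:
        acc = wrap_twos_complement(acc + term, bits)
        trace.append(acc)
    return acc, trace


# 100 + 100 overflows an 8-bit word (range -128..127), yet the final
# sum 100 + 100 - 120 = 80 fits, so the wrapped result is still correct.
total, trace = accumulate_wrapped([100, 100, -120])
print(total, trace)
```

The intermediate value in `trace` is wrong as a number, but because every addition is performed modulo 2^bits, the error cancels as long as the true final sum lies within the word's range.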

For cases where a higher dynamic range is needed, floating-point solutions are employed. The floating-point representation of a real number $X$ is given by $X \approx (-1)^S M r^E$, where $M$ is the mantissa, $r$ is the radix, $E$ is the signed exponent, and $S$ is the sign bit. The mantissa is usually normalized to a value $1/r \le M < 1$, and the format is defined by published standards (e.g., IEEE). A variation on the floating-point theme is called block floating point, a format used by a number of DSP chips, especially for FFTs. A block floating-point representation of an array of numbers $\{x[k]\}$ is defined in terms of a maximum exponent $E$, where $|x[k]|_{\max} = r^E$. A block floating-point representation of the number $x[k]$ is given by $x[k] = \pm M[k]\, r^E$, where $E$ is the fixed maximum exponent and $M[k]$ is a fractional mantissa ($M[k] \le 1$). Since the scale factor $r^E$ is known a priori, it need not be explicitly carried in the number system representation.
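The block floating-point idea above can be sketched as follows. This is a minimal illustration under stated assumptions: radix $r = 2$, a 16-bit signed mantissa word, and a block exponent chosen so that the largest magnitude falls in $[1/2, 1)$. The function names are hypothetical.

```python
import math


def block_float_encode(samples, mantissa_bits=15):
    """Return (mantissas, exponent): integer mantissas sharing one exponent.

    Assumes radix 2. Each sample is approximated by
    mantissa / 2**mantissa_bits * 2**exponent.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return [0] * len(samples), 0
    # frexp gives peak = m * 2**exponent with 0.5 <= m < 1, so every
    # scaled sample has magnitude below 1.
    _, exponent = math.frexp(peak)
    scale = 1 << mantissa_bits
    mantissas = []
    for s in samples:
        q = round(s / 2.0 ** exponent * scale)
        # Clamp so rounding can never exceed the signed mantissa word.
        mantissas.append(max(-scale, min(scale - 1, q)))
    return mantissas, exponent


def block_float_decode(mantissas, exponent, mantissa_bits=15):
    """Rebuild approximate sample values from the shared-exponent block."""
    scale = 1 << mantissa_bits
    return [m / scale * 2.0 ** exponent for m in mantissas]


block = [0.5, -3.0, 1.25, 6.0]
mantissas, exponent = block_float_encode(block)
print(mantissas, exponent)
print(block_float_decode(mantissas, exponent))
```

Only the mantissas need to be stored per sample. The single exponent is kept once for the whole block, which is what lets fixed-point hardware process the array.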

The primary DSP arithmetic operation is the signed multiply-accumulate (MAC). Fixed-point multipliers cover a wide range of speed, precision, and complexity tradeoffs. Compact, low-complexity MACs can be designed using ripple adders. When adder area and power dissipation are not an issue, carry-lookahead adders can be used to accelerate wide-wordlength adders and therefore improve MAC speed. Carry-save adders (modified full adders) can also be an important element in implementing fast multipliers. Another fast multiplier architecture is based on Booth’s algorithm, which interprets strings of consecutive “ones” as multiplicative NO-OP operations. Fast multipliers can also be constructed using arrays of small-wordlength multipliers. These architectures are referred to as cellular array multipliers, or simply array multipliers.
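The Booth recoding mentioned above can be sketched in software. This is a minimal radix-2 illustration, assuming the multiplier fits in a `bits`-wide two's complement word. The function name and the returned operation count are illustrative additions, and hardware designs often use higher-radix variants.

```python
def booth_multiply(x, y, bits=8):
    """Multiply x by y using radix-2 Booth recoding of y.

    y must fit in a `bits`-bit two's complement word. Returns the product
    and the number of add/subtract operations actually performed.
    """
    if not -(1 << (bits - 1)) <= y < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the given word size")
    y_word = y & ((1 << bits) - 1)  # two's complement bit pattern of y
    product = 0
    operations = 0
    previous = 0  # implicit bit to the right of the LSB
    for i in range(bits):
        current = (y_word >> i) & 1
        if current == 1 and previous == 0:
            product -= x << i  # a run of ones begins: subtract
            operations += 1
        elif current == 0 and previous == 1:
            product += x << i  # a run of ones ends: add
            operations += 1
        # Inside a run of ones or zeros nothing is done (a NO-OP).
        previous = current
    return product, operations


print(booth_multiply(7, 15))
print(booth_multiply(-5, -3))
```

For the multiplier 15 (`00001111`), the four consecutive ones cost one subtraction and one addition instead of four additions.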

General-purpose programmable DSP µps make use of multipliers that map $X \ast Y \to P$, where $X$ and $Y$ are variables. Most DSP applications, however, are SAXPY ($S = A \ast X + Y$) intensive, which refers to multiplying a variable $X$ by a constant $A$ (e.g., a filter coefficient), followed by an accumulation. Implementing SAXPY algorithms technically does not require general multiplication but rather an operation called scaling. Several techniques have been developed to exploit scaling in the implementation of DSP algorithms. They are particularly useful in implementing fixed-coefficient DSP algorithms with application-specific integrated circuits (ASICs), application-specific standard parts (ASSPs), and field-programmable gate array (FPGA) devices. One scaling technique is called the reduced adder graph (RAG) method. RAG arithmetic is based on the theory of ternary-valued ($\{0, \pm 1\}$) canonical signed-digit (CSD) numbers. For example, the 4-bit unsigned binary representation of the number 15 is $15_{10} \leftrightarrow 1111_2$, while the RAG representation is given by $15_{10} = 16_{10} - 1_{10} \leftrightarrow 1000\bar{1}_{\mathrm{RAG}}$ (where $\bar{1}$ denotes the digit $-1$), which can be implemented using one adder and a shift register. The cost of an RAG multiplier is measured in terms of the number of adders needed to complete a design. Another scaling method is called distributed arithmetic (DA) and is applicable only to the implementation of constant-coefficient DSP algorithms. As a point of reference, an $N$th-order FIR digital filter, having known coefficients $h_r$, $r \in [0, N)$, requires that $N$ MAC operations be performed per filter cycle (i.e., per output sample).
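Returning to the scaling-by-a-constant idea, the signed-digit recoding behind the 15 = 16 − 1 example can be sketched as below. This covers only the CSD recoding step and an adder count for a single constant. A full RAG design also shares intermediate sums across several coefficients, which is not attempted here. The function names are hypothetical, and the constant is assumed to be a non-negative integer.

```python
def to_csd(constant):
    """Return the canonical signed-digit form of a non-negative integer.

    Digits are in {-1, 0, +1}, least significant first, with no two
    adjacent nonzero digits.
    """
    if constant < 0:
        raise ValueError("this sketch assumes a non-negative constant")
    digits = []
    while constant != 0:
        if constant & 1:
            # +1 when the value is 1 mod 4, -1 when it is 3 mod 4.
            digit = 2 - (constant & 3)
            constant -= digit
        else:
            digit = 0
        digits.append(digit)
        constant >>= 1
    return digits


def csd_scale(x, constant):
    """Scale x by a constant using only shifts and adds/subtracts.

    Returns the product and the number of adders (nonzero digits - 1).
    """
    acc = 0
    nonzero = 0
    for shift, digit in enumerate(to_csd(constant)):
        if digit:
            acc += digit * (x << shift)
            nonzero += 1
    return acc, max(nonzero - 1, 0)


print(to_csd(15))        # least significant digit first
print(csd_scale(3, 15))  # 3 * 15 computed as (3 << 4) - 3
```

Plain binary 1111 would need three adders to scale by 15, while the signed-digit form needs one, in line with the adder-count cost measure described above.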

The data is assumed to be coded as an $M$-bit two's complement (2C) word,

$$x[k] = -x[k{:}0] + x[k{:}1]\,2^{-1} + \cdots + x[k{:}M-1]\,2^{-(M-1)},$$

where $x[k{:}i]$ is the $i$th bit of sample $x[k]$. The output $y[k]$ is given by

$$y[k] = -\sum_{r=0}^{N-1} h_r\, x[k-r{:}0] + \sum_{i=1}^{M-1} 2^{-i} \left( \sum_{r=0}^{N-1} h_r\, x[k-r{:}i] \right) = -\theta[\mathbf{x}[k]{:}0] + \sum_{i=1}^{M-1} 2^{-i}\, \theta[\mathbf{x}[k]{:}i],$$

where the mappings $\theta[\mathbf{x}[k]{:}i]$ are implemented using $2^N$-word memory lookups. The lookup table $\theta$ maps an array of binary-valued digits $\mathbf{x}[k{:}i] = \{x[k{:}i], x[k-1{:}i], \ldots, x[k-(N-1){:}i]\}$, taken from the $i$th

© 2001 by CRC Press LLC
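The distributed-arithmetic decomposition of $y[k]$ described above can be sketched in software as a table lookup per bit plane. This is a minimal illustration, assuming samples are stored as integers in `word_bits`-bit two's complement that represent the fractions `int / 2**(word_bits - 1)`. The function names and example coefficients are hypothetical, and real DA hardware would use a shift-accumulator rather than floating-point scaling.

```python
def build_da_table(h):
    """Precompute theta: one entry per N-bit address (2**N words).

    Entry `addr` holds the sum of h[r] over every r whose address bit is 1.
    """
    n = len(h)
    return [sum(h[r] for r in range(n) if (addr >> r) & 1)
            for addr in range(1 << n)]


def da_fir_output(table, window, word_bits):
    """Compute one FIR output by distributed arithmetic.

    window[r] holds x[k - r] as a two's complement integer of `word_bits`
    bits. Bit index i = 0 is the sign bit, as in the text.
    """
    mask = (1 << word_bits) - 1
    words = [w & mask for w in window]
    y = 0.0
    for i in range(word_bits):
        position = word_bits - 1 - i  # integer bit holding text bit i
        addr = 0
        for r, w in enumerate(words):
            addr |= ((w >> position) & 1) << r
        if i == 0:
            y -= table[addr]              # sign-bit plane is subtracted
        else:
            y += table[addr] * 2.0 ** -i  # remaining planes are weighted
    return y


h = [0.25, -0.5, 0.125, 1.0]   # illustrative coefficients, N = 4
window = [64, -128, 37, -1]    # x[k], x[k-1], ... as 8-bit 2C integers
table = build_da_table(h)
y_da = da_fir_output(table, window, word_bits=8)
y_direct = sum(c * w / 128 for c, w in zip(h, window))
print(y_da, y_direct)
```

The multiplications by the coefficients disappear from the inner loop: each output costs `word_bits` table lookups and accumulations, regardless of how the coefficient values were chosen.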
