ZTE Communications
ZTE Communications
ZTE Communications
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
Slice Registers<br />
No. of Resources<br />
600<br />
500<br />
400<br />
300<br />
200<br />
100<br />
14<br />
12<br />
10<br />
8<br />
6<br />
4<br />
2<br />
0<br />
0<br />
: LPA Block RAM<br />
: LPA Dist. RAM<br />
: RA<br />
: RA (DSP48E-Systolic)<br />
Case 1 Case 2 Case 3 Case 4 Case 5<br />
(a)<br />
: LPA Block RAM<br />
: LPA Dist. RAM<br />
: RA<br />
: RA (DSP48E-Systolic)<br />
The MAC architecture for LPA and RA has DSP48E<br />
slice-based multipliers and CLB-based adder trees. All the<br />
cases based on this MAC architecture uses 12 DSP48E<br />
slices. However, the RA with DSP48E systolic-array-based<br />
MAC uses two extra DSP48E slices for the final accumulation<br />
process. Similarly, the filter coefficient banks (which are<br />
based on block RAMs) in both LPA distributed RAM and RAs<br />
uses three dedicated RAMB18x2s [11] resources. However,<br />
the LPA block RAM uses three extra resources of RAMB18x2s<br />
for its block RAM based register bank. Furthermore, the LPAs<br />
use one RAM18x2SDPs [11] for the input FIFO (which is zero<br />
for the RAs because they do not use input FIFO).<br />
Fig. 12 shows the number of LUTs versus (operating clock)<br />
frequency for the four design solutions for the five embedded<br />
resampling cases for the polyphase filter. The RAs for case 1,<br />
case 3, and case 5 require less area and lower processing<br />
clocks than their LPA counterparts. Similarly, the RAs for case<br />
2 and case 4 require more area but use lower processing<br />
clocks than their LPA counterparts. This larger area is due to<br />
the pre-stored indices for addressing data register and filter<br />
coefficient banks, and also due to the use of multiplexers.<br />
Thus, RA is the preferred choice for reduced operating clock<br />
rates, and also for reduced area resources with the exception<br />
of cases 2 and 4.<br />
5 Power Analysis<br />
Here we analyze power consumption, focusing on dynamic<br />
power (CMOS technology is assumed) for the LPA and RA of<br />
Hardware Architecture of Polyphase Filter Banks Performing Embedded Resampling for Software-Defined Radio Front-Ends<br />
Mehmood Awan, Yannick Le Moullec, Peter Koch, and Fred Harris<br />
DSP48Es RAMB18×2s RAMB18×2SDPs<br />
0 30 60 90 120 150<br />
Frequency (MHz)<br />
(c) (d)<br />
DSP48E: dedicated computational resource on Virtex-5 FPGA<br />
LPA: load-process architecture<br />
RA: runtime architecture<br />
▲Figure 11. Resource usage for the five embedded resampling cases under LPA and RA: (a)<br />
slice registers, (b) slice LUTs, (c) dedicated resources, and (d) required operating clock rates.<br />
Slice LUTS<br />
600<br />
500<br />
400<br />
300<br />
200<br />
100<br />
0<br />
Case 5<br />
Case 4<br />
Case 3<br />
Case 2<br />
Case 1<br />
: LPA Block RAM<br />
: LPA Dist. RAM<br />
: RA<br />
: RA (DSP48E-Systolic)<br />
Case 1 Case 2 Case 3<br />
(b)<br />
Case 4 Case 5<br />
: LPA Block RAM<br />
: LPA Dist. RAM<br />
: RA<br />
: RA (DSP48E-Systolic)<br />
180<br />
210<br />
the polyphase filter with five different<br />
resampling factors. The demand for high<br />
clock rates in the LPA is often difficult to<br />
satisfy, and the architectures are power<br />
hungry because the dynamic power is<br />
proportional to the toggle frequency [12]:<br />
where n is the number of nodes being<br />
toggled, C is the load capacitance per node,<br />
V is the supply voltage, and f is the toggle<br />
frequency. C and V are device parameters,<br />
whereas n and f are design parameters. By<br />
keeping almost the same n and lowering f,<br />
power consumption decreases. As reduced<br />
clock operation has the same time constraint<br />
as high clock operation, energy consumption<br />
is reduced as well.<br />
Fig. 13 shows dynamic power<br />
consumption for LPA and RA in the five<br />
embedded resampling cases. Xilinx XPA [13]<br />
tool is used. The input vector is the same for<br />
all the cases and designs, and the activity<br />
rates are calculated using ModelSim [14]<br />
post-route simulations with a run length of<br />
500 μs. The thermal settings for the power<br />
simulations are 25 degrees Celsius—an<br />
ambient temperature, and zero airflow.<br />
Fig. 13 shows that the LPA with block RAM-based register<br />
bank consumes more dynamic power than LPA with<br />
distributed-RAM based register bank and also more dynamic<br />
power than RAs. The maximally decimated system (case 1)<br />
based on RAs consumes 64% less dynamic power than its<br />
LPA counterpart when block RAM-based register banks are<br />
used, and it consumes 49% less dynamic power than its LPA<br />
counterpart when distributed RAM-based register banks are<br />
used. The under-decimated system (case 2) based on RAs<br />
consumes 48% less dynamic power than its LPA counterpart<br />
when block RAM-based register banks are used, and it<br />
Area (LUTs)<br />
700<br />
600<br />
500<br />
400<br />
300<br />
200<br />
100<br />
0<br />
0<br />
▲❋<br />
❋<br />
◆▲<br />
Runtime<br />
Architectures<br />
30<br />
■<br />
■<br />
60<br />
90 120<br />
Frequency (MHz)<br />
LUT: lookup table<br />
Load-Process<br />
Architectures<br />
▲Figure 12. Area versus (operating clock) frequency for the four<br />
design solutions for the five embedded resampling cases.<br />
■<br />
◆<br />
150<br />
◆<br />
■<br />
▲<br />
×<br />
❋<br />
×<br />
◆<br />
×<br />
❋▲<br />
❋<br />
▲<br />
R esearch Papers<br />
P dynamic = nCV 2 f, (1)<br />
■<br />
180<br />
: Case 1<br />
: Case 2<br />
: Case 3<br />
: Case 4<br />
: Case 5<br />
×<br />
210<br />
March 2012 Vol.10 No.1 <strong>ZTE</strong> COMMUNICATIONS 61