21.01.2013 Views

Lecture Notes in Computer Science 4917

Lecture Notes in Computer Science 4917

Lecture Notes in Computer Science 4917

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

72 F. Bouwens et al.<br />

Table 1. Distributed DRFs Names<br />

Orig<strong>in</strong>al Renamed Orig<strong>in</strong>al Renamed<br />

4x4 mesh plus arch 1 4x4 mesh plus pred bus arch 1 pb<br />

4x4 reg con shared 2R 1W arch 2 4x4 reg con shared 2R 1W pred bus arch 2 pb<br />

4x4 reg con shared 4R 2W arch 3 4x4 reg con shared 4R 2W pred bus arch 3 pb<br />

4x4 reg con shared 8R 4W arch 4 4x4 reg con shared 8R 4W pred bus arch 4 pb<br />

4x4 reg con all arch 5 4x4 reg con all pred bus arch 5 pb<br />

Leakage (mW)<br />

2.5<br />

2.0<br />

1.5<br />

1.0<br />

0.5<br />

0.0<br />

np_1x1_reg<br />

arch_1<br />

arch_1_pb<br />

arch_2<br />

arch_2_pb<br />

arch_3<br />

arch_3_pb<br />

arch_4<br />

arch_4_pb<br />

arch_5<br />

Fig. 5. Leakage Power<br />

arch_5_pb<br />

arch_6<br />

arch_6_pb<br />

arch_7<br />

arch_7_pb<br />

arch_8<br />

Intercon.<br />

Misc.<br />

Intercon.<br />

REGs.<br />

Intercon.<br />

MUX<br />

FU_VLIW<br />

PRF_VLIW<br />

DRF_VLIW<br />

FU_CGA<br />

PRF_CGA<br />

DRF_CGA<br />

VLIW_CU<br />

depicted <strong>in</strong> Figure 1. The created architectures are noted <strong>in</strong> Table 1 and the names are<br />

abbreviated for ease of explanation.<br />

All results presented here and <strong>in</strong> Section 4.2 are placed together <strong>in</strong> Figures 5 and 7<br />

- 10. The leakage results are depicted <strong>in</strong> Figure 5 and the power results at 100MHz of<br />

IDCT and FFT are presented <strong>in</strong> Figures 7 and 8, respectively. The energy-delay results<br />

are depicted <strong>in</strong> Figures 9 and 10. We will discuss the results presented here first.<br />

Remov<strong>in</strong>g the local DRFs <strong>in</strong> the first two architectures (arch 1 and arch 1 pb)result<br />

<strong>in</strong> a low energy consumption, but decreased performance as depicted <strong>in</strong> Figures 9 and<br />

10. This <strong>in</strong>dicates the DRFs are beneficial dur<strong>in</strong>g CGA operation. Replac<strong>in</strong>g the PRFs<br />

with a bus <strong>in</strong>creases energy consumption by (FFT: 10 - 46%, IDCT: 5 - 37%) except<br />

for arch 5. This experiment shows that stor<strong>in</strong>g predicate results of previous operations<br />

<strong>in</strong> local PRFs is essential for overall power consumption <strong>in</strong> cases with few local DRFs<br />

or no DRF at all.<br />

The experiment <strong>in</strong> this section showed that register file shar<strong>in</strong>g creates additional<br />

rout<strong>in</strong>g for the DRESC compiler that improves schedul<strong>in</strong>g density of the array. Interconnection<br />

complexity is reduced requir<strong>in</strong>g less configuration bits, hence decreas<strong>in</strong>g the<br />

size of the configuration memories. Replac<strong>in</strong>g four RFs with only one reduced leakage<br />

and IDCT/FFT power consumption of the local DRFs. Local RFs with 1 write and<br />

2 read ports <strong>in</strong> arch 2 outperforms <strong>in</strong> terms of power and energy-delay the fully distributed<br />

RFs architecture arch 5 for both, FFT and IDCT, kernels.<br />

CM

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!