
Lecture Notes in Computer Science 4917



Architecture Enhancements for the ADRES Coarse-Grained Reconfigurable Array

Table 5. Comparing the base instance (100 MHz) with the final instance (312 MHz)

Kernel  Instance  Total power (mW)  Energy (uJ)  MIPS     MIPS/mW  mW/MHz
FFT     Base      73.28             0.619        759      10.35    0.7328
        Final     67.29             0.307        1190     17.68    0.2153
        Improve   8.17%             50.4%        56.78%   70.82%   70.62%
IDCT    Base      80.45             37.72        1409     17.51    0.8045
        Final     81.99             19.14        2318     28.27    0.2624
        Improve   -1.91%            49.25%       64.51%   61.45%   67.38%

Improve is the relative gain of the final instance over the base instance: a reduction for total power, energy and mW/MHz, an increase for MIPS and MIPS/mW.
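The Improve rows can be re-derived from the Base and Final rows. The following Python sketch is an illustrative check, not the authors' evaluation flow. It assumes that a reduction counts as an improvement for total power, energy and mW/MHz, and that an increase counts for MIPS and MIPS/mW. The names `TABLE5`, `improvement` and `improvements` are hypothetical.

```python
# Reported Table 5 values per instance:
# (total power mW, energy uJ, MIPS, MIPS/mW, mW/MHz)
TABLE5 = {
    "FFT": {
        "base": (73.28, 0.619, 759, 10.35, 0.7328),
        "final": (67.29, 0.307, 1190, 17.68, 0.2153),
    },
    "IDCT": {
        "base": (80.45, 37.72, 1409, 17.51, 0.8045),
        "final": (81.99, 19.14, 2318, 28.27, 0.2624),
    },
}

COLUMNS = ("Power", "Energy", "MIPS", "MIPS/mW", "mW/MHz")
# Assumption: True where a smaller value is better (a reduction is a gain).
LOWER_IS_BETTER = (True, True, False, False, True)


def improvement(base, final, lower_is_better):
    """Relative improvement of final over base, in percent (hypothetical helper)."""
    if lower_is_better:
        return 100.0 * (base - final) / base
    return 100.0 * (final - base) / base


def improvements(kernel):
    """Recompute the Improve row of one kernel from its Base and Final rows."""
    rows = TABLE5[kernel]
    return [
        improvement(b, f, lower)
        for b, f, lower in zip(rows["base"], rows["final"], LOWER_IS_BETTER)
    ]


if __name__ == "__main__":
    for kernel in TABLE5:
        cells = ", ".join(
            f"{name} {value:.2f}%"
            for name, value in zip(COLUMNS, improvements(kernel))
        )
        print(f"{kernel}: {cells}")
```

Run as a script, it prints one line per kernel. The computed percentages agree with the Improve rows to within 0.01 percentage points, and the last digit can differ because of rounding. The negative IDCT power value reflects the 1.91% power increase of the final instance.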

5.1 Putting It All Together

Combining the arch 8 architecture with the aforementioned optimizations results in a low-power, high-performance ADRES instance: 4x4 arch 8 4L final. Table 5 compares the proposed architecture with the base architecture (shown in Figure 1). Power improves only moderately, by 8% for the FFT (for the IDCT it rises slightly, by 1.91%), but performance (MIPS) is 56-65% higher owing to the pipelining and routing features. As a result, the energy dissipation of the architecture drops by about 50%. The area of the proposed architecture was reduced from 1.59 mm² (544k gates) to 1.08 mm² (370k gates), a 32% improvement.
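The 32% figure can be checked from the reported area and gate counts. A minimal Python sketch, where `reduction` is a hypothetical helper name:

```python
def reduction(before, after):
    """Relative reduction from before to after, in percent (hypothetical helper)."""
    return 100.0 * (before - after) / before


area_pct = reduction(1.59, 1.08)   # silicon area in mm^2
gates_pct = reduction(544, 370)    # gate count in thousands of gates

print(f"area: {area_pct:.1f}%  gates: {gates_pct:.1f}%")
# prints: area: 32.1%  gates: 32.0%
```

Both ratios round to 32%, so the area reduction and the gate-count reduction are consistent with each other.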

5.2 Final Architecture Power Decomposition

The 4x4 arch 8 4L final architecture was placed and routed using Cadence SOC Encounter v4.2. Figures 12(a) and 12(b) decompose the power and area of the resulting layout, respectively. These figures cover only the ADRES core architecture and exclude the data and instruction memories. Because the final architecture is pipelined, the clock tree contribution (4.67 mW) is included in these figures. The data memory and the instruction cache were not included in the synthesis, so no power estimates are made for them. The synthesis tool removed the multiplexors in the CGA datapath, as this was beneficial for performance.

Comparing Figure 3 with Figure 12(a), we notice that the shared local DRFs combined with clock gating result in lower power consumption. The configuration memories still require a large amount of power and area, although they have also decreased in size. Optimizing the configuration memories further would require advanced power management, e.g. power gating, which was not applied in the final architecture. Interestingly, the CGA FUs consume relatively more power than in Figure 3. This is caused by the higher utilization of the array compared to the base architecture: the array consumes more power but provides higher performance, which increases power efficiency, as can be seen in Table 3. Together, the 16 CGA FUs and the CMs occupy 68.66% of the total area, as depicted in Figure 12(b). The largest single component is the global DRF (labeled drf vliw), which has 8 read and 4 write ports.
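To express the 68.66% share in absolute terms, the sketch below assumes (this is not stated in the text) that Figure 12(b) decomposes the same 1.08 mm² / 370k-gate core reported in Section 5.1, and reads CMs as the configuration memories. If the post-layout area differs from the synthesis area, the absolute values scale accordingly while the split stays the same. All constant names are hypothetical.

```python
# Assumption: the 68.66% share applies to the 1.08 mm^2 / 370k-gate core
# from Section 5.1; Figure 12(b) itself may be based on the post-layout area.
CORE_AREA_MM2 = 1.08
CORE_GATES_K = 370
FU_CM_SHARE = 0.6866  # 16 CGA FUs + CMs (read here as configuration memories)

fu_cm_area = FU_CM_SHARE * CORE_AREA_MM2   # mm^2 taken by FUs and CMs
rest_area = CORE_AREA_MM2 - fu_cm_area     # mm^2 left for all other components
fu_cm_gates = FU_CM_SHARE * CORE_GATES_K   # thousands of gates for FUs and CMs

print(f"FUs + CMs: {fu_cm_area:.3f} mm^2 (~{fu_cm_gates:.0f}k gates)")
print(f"other components: {rest_area:.3f} mm^2")
```

Under that assumption, the FUs and CMs account for roughly 0.74 mm² (about 254k gates), leaving about 0.34 mm² for the remaining components.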
