29.01.2015 Views

Embedded Software for SoC - Grupo de Mecatrônica EESC/USP


Rapid Configuration & Instruction Selection for an ASIP

Table 30-3, and Table 30-4. Finally, we compare the simulation results with the results of our methodology.

The simulation results for all 576 configurations are plotted in Figure 30-6 as execution time versus area. The dark squares are the points obtained using our methodology for differing area constraints. The points selected by our methodology correspond to the Pareto points of the design space graph. As mentioned above, the 576 configurations represent the entire design space, and each Pareto point indicates the fastest execution time under a particular area constraint. Therefore, there are no Pareto points in this design space beyond those found by our methodology.
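As a rough illustration of what "Pareto point" means here, the sketch below filters a set of (area, execution time) pairs down to the non-dominated ones and picks the fastest point under an area budget. It is not the selection methodology of this chapter, only a brute-force check of the kind one could run on simulation results. The nine (area, simulated time) pairs are taken from Table 30-5; the two extra points, the function names `pareto_front` and `fastest_within`, and the 90,000-gate budget are assumptions added purely for illustration.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, float]  # (area in gates, execution time in seconds)

# (area, simulated execution time) pairs as listed in Table 30-5.
TABLE_30_5: List[Point] = [
    (35000, 1.8018), (67000, 1.0164), (72500, 1.0147),
    (87000, 0.2975), (88100, 0.2738), (93600, 0.2670),
    (100400, 0.2632), (103700, 0.2614), (105900, 0.2586),
]

# HYPOTHETICAL dominated points, invented only to show the filter at work.
HYPOTHETICAL: List[Point] = [(80000, 1.2000), (95000, 0.3000)]


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep points for which no other point is both smaller and faster."""
    front: List[Point] = []
    best_time = float("inf")
    # Visit points by increasing area; keep one only if it is strictly
    # faster than every point of smaller or equal area seen so far.
    for area, time in sorted(points):
        if time < best_time:
            front.append((area, time))
            best_time = time
    return front


def fastest_within(points: List[Point], max_area: int) -> Optional[Point]:
    """Return the fastest point whose area fits the budget, if any."""
    feasible = [p for p in points if p[0] <= max_area]
    return min(feasible, key=lambda p: p[1]) if feasible else None


all_points = TABLE_30_5 + HYPOTHETICAL
front = pareto_front(all_points)
choice = fastest_within(all_points, 90000)  # assumed area budget
```

With these inputs the two invented points are filtered out and the front consists of the nine tabulated points, since each tabulated increase in area buys a strictly lower execution time.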

Moreover, Table 30-5 lists the details of each Pareto point: area in gates, simulated execution time in seconds, estimated execution time in seconds, latency, the selected Xtensa processor, and the software function calls replaced. Table 30-5 also shows that the estimated application performance for each configuration is, on average, within 4% of the simulation result. For the fastest configuration (configuration 9 in Table 30-5), the application execution time is reduced by about 85% relative to the original execution time (configuration 1 in Table 30-5), from 1.8018 s to 0.2586 s. The fastest configuration uses Xtensa processor P3 (that is, with a floating-point unit and its associated configurable core options). Seven TIE instructions (LDEXP, FREXP, MOD3, FREXPLN, MANT, LP24, COMB) are also implemented in the fastest configuration to replace four software functions: square root, modulo 3, natural logarithm, and floating-point division. Selecting the core options and instructions takes on the order of a few hours (about 3–4 hrs), while the exhaustive simulation method would take several weeks (about 300 hrs) to complete.
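The headline figures in this paragraph can be rechecked with a few lines of arithmetic. The sketch below uses only numbers quoted in the text and in Table 30-5 (the listed "% diff." column, the simulated times of configurations 1 and 9, and the approximate 3–4 hr and 300 hr search times); the variable names are our own.

```python
# "% diff." values as listed in Table 30-5 for configurations 2, 3 and 5-9
# (configurations 1 and 4 have no estimate listed).
pct_diff = [10.5, 10.2, 0.7, 2.1, 1.9, 1.0, 2.0]
avg_diff = sum(pct_diff) / len(pct_diff)

# Simulated execution times in seconds from Table 30-5.
t_base = 1.8018  # configuration 1 (original)
t_fast = 0.2586  # configuration 9 (fastest)
reduction_pct = (1 - t_fast / t_base) * 100
speedup = t_base / t_fast

# Approximate search times quoted in the text, in hours.
exhaustive_hrs = 300
selection_hrs = (3, 4)
search_ratio = tuple(exhaustive_hrs / h for h in selection_hrs)

print(f"average estimation error: {avg_diff:.1f}%")
print(f"execution-time reduction: {reduction_pct:.1f}%")
print(f"speedup over configuration 1: {speedup:.2f}x")
print(f"search-time ratio: {search_ratio[1]:.0f}x to {search_ratio[0]:.0f}x")
```

The average of the listed differences comes out at roughly 4%, the fastest configuration runs in about one seventh of the original time, and the selection step is roughly 75 to 100 times quicker than exhaustive simulation.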

Table 30-5. Pareto points.

Configuration  Area     Simulated   Estimated   % diff.  Latency  Xtensa    Software function
               (gates)  execution   execution            of the   proc.     replaced
                        time (sec.) time (sec.)          proc.    selected
1              35,000   1.8018      –           –        5.32     P1        –
2              67,000   1.0164      1.1233      10.5     7.1      P1        FP mult
3              72,500   1.0147      1.1188      10.2     7.1      P1        Mod 3, FP mult
4              87,000   0.2975      –           –        6.45     P3        –
5              88,100   0.2738      0.2718      0.7      6.5      P3        Sqrt (2)
6              93,600   0.2670      0.2694      2.1      6.5      P3        Sqrt (1), Log
7              100,400  0.2632      0.2651      1.9      6.8      P3        Sqrt (1), Log, FP div (2)
8              103,700  0.2614      0.2642      1.0      6.9      P3        Sqrt (2), Mod 3, Log, FP div (2)
9              105,900  0.2586      0.2638      2.0      7.0      P3        Sqrt (1), Mod 3, Log, FP div (2)
Avg            –        –           –           4%       –        –
