
Embedded Software for SoC - Grupo de Mecatrônica EESC/USP


Dynamic Parallelization of Array 317

whereas the remaining nests do not. Since the third nest of this application is the one with the largest execution time, it dominates the overall energy-delay product, leading to some gain when the entire application is considered. Hyper, on the other hand, benefits from the history-based training. The last two bars of Figure 23-5 show, for each nest, the normalized energy-delay product due to our strategy without and with past history information. Recall from Section 1.3.0 that when the history information is used, we can eliminate the training overheads in successive visits. The results shown in the figure clearly indicate that all nests in this application take advantage of history information, resulting in a 9% improvement in the energy-delay product as compared to our strategy without history information.
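The history mechanism described above can be sketched in C. This is a minimal illustration, not the authors' implementation: the nest identifiers, table size, and the stand-in `expensive_training` routine (which in the real strategy would time the initial iterations at several processor counts) are all hypothetical.

```c
#define MAX_NESTS 32

static int history[MAX_NESTS];   /* 0 = no entry recorded for this nest yet */
static int training_calls;       /* counts how often training actually runs */

/* Stand-in for the training phase: in the real strategy this would time
   the initial iterations of the nest at several processor counts and
   return the count with the best energy-delay product. */
static int expensive_training(int nest_id) {
    (void)nest_id;
    training_calls++;
    return 4;                    /* hypothetical best processor count */
}

/* On the first visit to a nest, train and record the result; on every
   successive visit, a table lookup replaces the training iterations. */
int procs_for_nest(int nest_id) {
    if (history[nest_id] == 0)
        history[nest_id] = expensive_training(nest_id);
    return history[nest_id];
}
```

Because the second and later visits hit the table, the training overhead is paid once per nest rather than once per visit, which is the source of the improvement reported for the history-based variant.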

5. CONCLUDING REMARKS

On-chip multiprocessing is an attempt to speed up applications by exploiting the inherent parallelism in them. In this study, we have made three major contributions. First, we have presented a runtime loop parallelization strategy for on-chip multiprocessors. This strategy uses the initial iterations of a given loop to determine the best number of processors to employ in executing the remaining iterations. Second, we have quantified the benefits of our strategy using a simulation environment and shown how its behavior can be improved by being more aggressive in determining the best number of processors. Third, we have demonstrated that the overheads associated with our approach can be reduced using past history information about loop executions.
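The first contribution, training on the initial iterations to pick a processor count, can be sketched as follows. This is a hedged illustration under stated assumptions: `run_block` is a stand-in that models the cost of executing a block of iterations on a given number of processors with a made-up cost curve (parallel work that shrinks with more processors plus a synchronization overhead that grows with them); the real strategy would measure actual iteration timings.

```c
#define MAX_PROCS 8

/* Hypothetical cost model: returns an estimated cost (e.g. a proxy for the
   energy-delay product) of running `chunk` iterations on `procs` processors.
   The constants are illustrative, not measured. */
static double run_block(int procs, int chunk) {
    double work = 1000.0 * chunk / procs;   /* parallel portion shrinks */
    double overhead = 50.0 * procs;         /* sync cost grows with procs */
    return work + overhead;
}

/* Training phase: try power-of-two processor counts on an initial block
   of iterations and keep the count with the lowest observed cost. The
   remaining iterations would then run with the returned count. */
int train_best_procs(int chunk) {
    int best = 1;
    double best_cost = run_block(1, chunk);
    for (int p = 2; p <= MAX_PROCS; p *= 2) {
        double c = run_block(p, chunk);
        if (c < best_cost) { best_cost = c; best = p; }
    }
    return best;
}
```

Note how the preferred count depends on the amount of work per block: with little work per iteration the synchronization overhead favors fewer processors, while larger blocks amortize it and favor more, which is why a runtime decision can beat a fixed static choice.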
