
316 Chapter 23

We also measured the overhead breakdown for our approach. The overheads are divided into three portions: the overheads due to sampling the counters; the overheads due to the calculations for computing the objective function, checking constraints, and selecting the best number of processors (denoted Calculations in the figure); and the overheads due to processor reactivation (i.e., transitioning a processor from the inactive state to the active state). We observed that the contributions of these overheads are 4.11%, 81.66%, and 14.23%, respectively. In other words, most of the overhead is due to the (objective function + constraint evaluation) calculations performed at runtime. The last component of our overheads (i.e., the locality-related one) is difficult to isolate and is incorporated as part of the execution time/energy.
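The selection step described above (evaluate the objective, check constraints, pick the best processor count) can be sketched as follows. This is a hypothetical illustration, not the chapter's implementation: the function name, the `samples` mapping, and the use of the energy-delay product as the objective are all assumptions made for the example.

```python
def select_best_processors(samples, max_energy=None):
    """Pick the processor count minimizing the energy-delay product,
    subject to an optional energy constraint.

    samples: dict mapping processor count -> (energy, delay),
             as obtained from sampled counters (hypothetical format).
    """
    best_p, best_obj = None, float("inf")
    for p, (energy, delay) in samples.items():
        if max_energy is not None and energy > max_energy:
            continue  # constraint check: candidate violates the energy bound
        obj = energy * delay  # assumed objective: energy-delay product
        if obj < best_obj:
            best_p, best_obj = p, obj
    return best_p
```

Since this evaluation runs once per candidate processor count at runtime, it is consistent with the observation that the Calculations portion dominates the measured overhead.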

The results reported so far have been obtained by setting the number of iterations used in training to 10% of the total number of iterations in the nest. To study the impact of the size of the training set, we performed a set of experiments in which we varied the number of iterations used in the training period. The results shown in Figure 23-4 indicate that for each benchmark there exists a value (for the training iterations) that generates the best result. Working with a very small training set can magnify small variations between iterations and might prevent us from accurately detecting the best number of processors. At the other extreme, using a very large training set ensures that we find the best number of processors; however, we also waste too much time/energy during the training phase itself.
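The training-phase tradeoff above can be made concrete with a minimal sketch: a fraction of the nest's iterations is split among the candidate processor counts, the per-candidate cost is measured, and the remaining iterations run with the winner. The function name, the even split among candidates, and `cost_fn` (a stand-in for the measured per-iteration energy/time on `p` processors) are assumptions for illustration only.

```python
def train_and_run(total_iters, candidates, cost_fn, train_frac=0.10):
    """Run a nest of total_iters iterations: spend train_frac of them
    training (split evenly among candidate processor counts), then run
    the rest with the candidate that had the lowest measured cost."""
    train_iters = max(len(candidates), int(total_iters * train_frac))
    per_candidate = train_iters // len(candidates)
    # Training phase: measure each candidate on its share of iterations.
    totals = {p: sum(cost_fn(p) for _ in range(per_candidate))
              for p in candidates}
    best = min(totals, key=totals.get)
    # Production phase: remaining iterations use the winning count.
    remaining = total_iters - per_candidate * len(candidates)
    total_cost = sum(totals.values()) + sum(cost_fn(best)
                                            for _ in range(remaining))
    return best, total_cost
```

The sketch exposes both failure modes discussed above: with too few training iterations per candidate, noise in `cost_fn` can pick the wrong winner; with too many, the training phase itself runs suboptimal candidates for longer and inflates `total_cost`.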

To investigate the influence of conservative training (Section 1.3.0) and history-based training (Section 1.3.0), we focus on two applications: Atr and Hyper (both of which have nine nests). In Atr, some nests exhibit significant variations (load imbalances) between loop iterations. Therefore, it is a suitable candidate for conservative training. The first two bars in Figure 23-5 show, for each nest, the normalized energy-delay product due to our strategy without and with conservative training, respectively. We see that the 3rd, 4th, 5th, and 9th nests take advantage of conservative training,
