29.01.2015 Views

Embedded Software for SoC - Grupo de Mecatrônica EESC/USP


328 Chapter 24

banks increases. For example, in Figure 24-2 the optimal number of banks increases from five to six when the processor frequency changes from 100 MHz to 600 MHz.

SA performs poorly when the energy consumption is dominated by the dynamic energy. It cannot exploit idle banks owned by other processors to reduce the number of page-misses. The difference between SA and MA (an approximation of the best-possible assignment) is large (more than 300), indicating that sharing SDRAM memories is an interesting option for heterogeneous multi-processor platforms. Sharing increases the exploration space so that better assignments can be found. When the banks are not too heavily used, there is not even a performance penalty (see below).

We also observe in Figure 24-2 that existing commercial multi-processor memory allocators (RA) perform badly compared to MA. This suggests that large headroom for improvement exists. When only one bank is available, all memory allocation algorithms obviously produce the same results. With an increasing number of banks, the gap between RA and MA first widens as a result of the larger assignment freedom (up to 55% for Rgb2Yuv and Cmp with four banks). However, the performance of RA improves with an increasing number of banks: the chances increase that RA distributes the data structures across the banks, which significantly reduces the energy consumption. Therefore, when the number of banks becomes large, the gap between RA and MA becomes smaller again (50% for Rgb2Yuv and Cmp with six banks).
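The effect of the bank count on an allocator that does not look at the access pattern can be illustrated with a small simulation. This is only a sketch under a loud assumption: it models the allocator as placing each data structure in a uniformly random bank, which the text does not state about RA, and it counts pairs of data structures that share a bank as a rough stand-in for page-miss conflicts. The function name `mean_shared_pairs` and its parameters are hypothetical.

```python
import random
from itertools import combinations


def mean_shared_pairs(num_structures, num_banks, trials=2000, seed=0):
    """Average number of data-structure pairs that end up in the same bank.

    Assumption (not from the text): every data structure is placed in a
    uniformly random bank, independently of all the others.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # One random placement: placement[i] is the bank of structure i.
        placement = [rng.randrange(num_banks) for _ in range(num_structures)]
        # Count the pairs of structures that were placed in the same bank.
        total += sum(1 for a, b in combinations(placement, 2) if a == b)
    return total / trials


if __name__ == "__main__":
    for banks in (1, 2, 4, 6):
        print(banks, mean_shared_pairs(num_structures=4, num_banks=banks))
```

With a single bank every pair shares it, so all strategies coincide. As banks are added the average number of sharing pairs drops, which matches the qualitative argument that RA is more likely to spread the data structures when more banks are available.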

For higher processor frequencies the static energy consumption decreases and the potential gains become larger. For example, for Convolve and Cmp the gap increases from 26% to 34%.

Figure 24-2 shows how BE outperforms RA. Results in Table 24-6 suggest an improvement of up to 50% (see Rgb2Yuv and Cmp with four banks). Moreover, BE often comes close to the MA results. The difference between BE and MA is always less than 23%. When the number of tasks in the application becomes large (see the task-set in Table 24-6, which consists of 10 tasks), we note a small energy loss of BE compared to RA for the first task-set when eight banks are used. In this case 50 data structures are allocated in

Table 24-6. Energy comparison of several allocation strategies.

Tasks (100 MHz)                          Nds    Energy (µJ) for Nbank =
                                                8        16       32
2Quick+2R2Y+2Rin+Cmp+2Lzw (SA)           50     –        48788    48788
2Quick+2R2Y+2Rin+Cmp+2Lzw (RA)           50     49525    41872    45158
2Quick+2R2Y+2Rin+Cmp+2Lzw (MA)           50     47437    37963    41610
2Quick+2R2Y+2Rin+Cmp+2Lzw (GP)           50     –        38856    34236
2Quick+2R2Y+2Rin+Cmp+2Lzw (BE)           50     51215    40783    35371
2Quick+2R2Y+2Rin+Cmp+2Lzw (Estatic)      50     10227    9045     8489
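The comparison between strategies in Table 24-6 can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the energy values are transcribed from Table 24-6 as read here (the assignment of values to rows and columns should be checked against the printed table), and the definition of the relative saving as (baseline − candidate) / baseline is an assumption, since the text does not spell out how its percentages are computed. The names `ENERGY_UJ` and `relative_saving` are hypothetical.

```python
# Energy in µJ for the 10-task set at 100 MHz, transcribed from Table 24-6.
# None marks entries for which the table gives no value.
ENERGY_UJ = {
    "SA": {8: None, 16: 48788, 32: 48788},
    "RA": {8: 49525, 16: 41872, 32: 45158},
    "MA": {8: 47437, 16: 37963, 32: 41610},
    "GP": {8: None, 16: 38856, 32: 34236},
    "BE": {8: 51215, 16: 40783, 32: 35371},
}


def relative_saving(baseline, candidate, banks):
    """Energy saved by `candidate` relative to `baseline`, in percent.

    Assumed definition: 100 * (E_baseline - E_candidate) / E_baseline.
    A negative result means the candidate consumes more energy.
    Returns None when either table entry is missing.
    """
    e_base = ENERGY_UJ[baseline][banks]
    e_cand = ENERGY_UJ[candidate][banks]
    if e_base is None or e_cand is None:
        return None
    return 100.0 * (e_base - e_cand) / e_base


if __name__ == "__main__":
    for banks in (8, 16, 32):
        saving = relative_saving("RA", "BE", banks)
        print(f"{banks} banks: BE vs RA = {saving:.1f}%")
```

Under this definition the saving of BE over RA is negative with eight banks, in line with the small energy loss noted in the text, and positive with 16 and 32 banks.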
