29.01.2015 Views

Embedded Software for SoC - Grupo de Mecatrônica EESC/USP

Control Flow Driven Splitting of Loop Nests

These increases are not serious, since the added energy required for storing these instructions is compensated by the savings achieved by loop nest splitting. Figure 17-6b shows the evolution of memory accesses and energy consumption, using an instruction-level energy model [21] for the ARM7 core that considers bit-toggles and off-chip memories and has an accuracy of 1.7%. Column Instr Read shows reductions of instruction memory reads between 23.5% (CAV) and 56.9% (ME). Moreover, our optimization reduces data memory accesses significantly. Data reads are reduced by up to 65.3% (ME). For QSDPCM, the removal of spill code reduces data writes by 95.4%. In contrast, the insertion of spill code for CAV leads to an increase of 24.5%. The sum of all memory accesses (Mem accesses) drops by between 20.8% (CAV) and 57.2% (ME).

Our optimization leads to large energy savings in both the CPU and its memory. The energy consumed by the ARM core is reduced by 18.4% (CAV) to 57.4% (ME), and the memory consumes between 19.6% and 57.7% less energy. Total energy savings of 19.6% to 57.7% are measured. These results demonstrate that loop nest splitting optimizes the locality of instructions and data accesses simultaneously, as desired by Kim, Irwin et al. [11].

However, if code size increases (up to a rough theoretical bound of 100%) are critical, our algorithms can be changed so that the splitting-if is placed in some inner loop. This way, code duplication is reduced at the expense of lower speed-ups, so that trade-offs between code size and speed-up can be realized.
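The trade-off can be illustrated with a small sketch. The optimization itself operates on C source code; the Python below only counts executed if-statements, and the condition `cond`, the threshold values, and the loop bounds are hypothetical and not taken from the benchmarks. `split_outer` places the splitting-if in the outermost loop and duplicates both inner loops, whereas `split_inner` places it one level deeper and duplicates only the innermost loop.

```python
def cond(x, z):
    # Hypothetical if-condition of the loop body; it does not depend on y.
    return x >= 10 or z >= 12


def split_outer(nx, ny, nz):
    """Splitting-if in the x loop: the y and z loops are both duplicated."""
    ifs = hits = 0
    for x in range(nx):
        ifs += 1                      # splitting-if, tested once per x
        if x >= 10:
            for y in range(ny):
                for z in range(nz):
                    hits += 1         # condition is known to hold here
        else:
            for y in range(ny):
                for z in range(nz):
                    ifs += 1          # original if-statement still executed
                    if cond(x, z):
                        hits += 1
    return hits, ifs


def split_inner(nx, ny, nz):
    """Splitting-if in the y loop: only the z loop is duplicated."""
    ifs = hits = 0
    for x in range(nx):
        for y in range(ny):
            ifs += 1                  # splitting-if, tested once per (x, y)
            if x >= 10:
                for z in range(nz):
                    hits += 1         # condition is known to hold here
            else:
                for z in range(nz):
                    ifs += 1          # original if-statement still executed
                    if cond(x, z):
                        hits += 1
    return hits, ifs
```

With bounds of 20, 4 and 16, the unsplit nest would execute its if-statement 20 · 4 · 16 = 1280 times. `split_outer(20, 4, 16)` executes 660 if-statements and `split_inner(20, 4, 16)` executes 720, while both produce the same result. The inner placement duplicates less code but tests the splitting-if more often.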

5. CONCLUSIONS

We present a novel source code optimization called loop nest splitting, which removes redundancies in the control flow of embedded multimedia applications. Using polytope models, code without effect on the control flow is removed. Genetic algorithms identify ranges of the iteration space where all if-statements are provably satisfied. The source code of an application is rewritten in such a way that the total number of executed if-statements is minimized.
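As a minimal illustration of the rewriting step, the following Python sketch compares a two-level loop nest before and after splitting and counts the executed if-statements. It is not the authors' implementation: the technique targets C source code, the condition `x >= 10 or y >= 14` and the loop bounds are hypothetical, and the range `x >= 10` is simply assumed to have been identified as one where the condition always holds.

```python
def original_nest(nx, ny):
    """Unoptimized nest: the if-statement runs in every iteration."""
    ifs = total = 0
    for x in range(nx):
        for y in range(ny):
            ifs += 1
            if x >= 10 or y >= 14:    # hypothetical condition
                total += 1            # then-branch
            else:
                total += 2            # else-branch
    return total, ifs


def split_nest(nx, ny):
    """Same computation with a splitting-if hoisted into the outer loop."""
    ifs = total = 0
    for x in range(nx):
        ifs += 1                      # splitting-if, tested once per x
        if x >= 10:
            # The condition holds for every y, so no if is needed here.
            for y in range(ny):
                total += 1
        else:
            # Outside the identified range the original code is kept.
            for y in range(ny):
                ifs += 1
                if x >= 10 or y >= 14:
                    total += 1
                else:
                    total += 2
    return total, ifs
```

For a 20 × 16 iteration space, `original_nest(20, 16)` executes 320 if-statements and `split_nest(20, 16)` executes 180, with identical results from both versions.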

A careful study of 3 benchmarks shows significant improvements of branching and pipeline behavior. Caches also benefit from our optimization, since I- and D-cache misses are reduced heavily (by up to 68.5%). Since accesses to instruction and data memory are largely reduced, loop nest splitting leads to large power savings (19.6%–57.7%). An extended benchmarking using 10 different CPUs shows mean speed-ups of the benchmarks of 23.6%–62.1%.

The selection of benchmarks used in this article illustrates that our optimization is a general and powerful technique. Not only is typical real-life embedded code improved, but negative effects of other source code transformations that introduce control flow overheads into an application are also eliminated. In the future, our analytical models will be extended to treat more
