
Embedded Software for SoC - Grupo de Mecatrônica EESC/USP


218 Chapter 17

the parallelization of a loop due to fewer data dependencies [1] and to possibly improve I-cache performance due to smaller loop bodies. In [10] it is shown that loop splitting leads to increased energy consumption of the processor and the memory system. Since the computational complexity of a loop is not reduced, this technique does not solve the problems that are due to the properties discussed previously.

Loop unswitching is applied to loops containing loop-invariant if-statements [19]. The loop is then replicated inside each branch of the if-statement, reducing the branching overhead and decreasing the code sizes of the loops [1]. The goals of loop unswitching and the way the optimization is expressed are equivalent to the topics of the previous section. But since the if-statements must not depend on index variables, loop unswitching cannot be applied to multimedia programs. It is the contribution of the techniques presented in this article that we explicitly focus on loop-variant conditions. Since our analysis techniques go far beyond those required for loop splitting or unswitching and have to deal with entire loop nests and sets of index variables, we call our optimization technique loop nest splitting.

In [11], an evaluation of the effects of four loop transformations (loop unrolling, interchange, fusion and tiling) on memory energy is given. The authors have observed that these techniques reduce the energy consumed by data accesses, but that the energy required for instruction fetches is increased significantly. They draw the conclusion that techniques for the simultaneous optimization of data and instruction locality are needed. This article demonstrates that loop nest splitting is able to achieve these aims.

In [15], classical loop splitting is applied in conjunction with function call insertion at the source code level to improve the I-cache performance. After the application of loop splitting, a large reduction of I-cache misses is reported for one benchmark. All other parameters (instruction and data memory accesses, D-cache misses) are worse after the transformation. All results are generated with cache simulation software, which is known to be imprecise, and the runtimes of the benchmark are not considered at all.

Source code transformations have been studied in the literature for many years. In [9], array and loop transformations for data transfer optimization are presented using a medical image processing algorithm [3]. The authors focus only on illustrating the optimized data flow and thus neglect that the control flow becomes very irregular, since many additional if-statements are inserted. This impaired control flow has not yet been targeted by the authors. As we will show in this article, loop nest splitting applied as a postprocessing stage is able to remove the control flow overhead created by [9] while simultaneously optimizing data transfers further. Other control flow transformations for the acceleration of address-dominated applications are presented in [7].

Genetic algorithms (GAs) have proven to solve complex optimization problems by imitating the natural optimization process (see e.g. [2] for an overview). Since they are able to revise unfavorable decisions made in a previous optimization phase, GAs are adequate for solving non-linear
