
Embedded Software for SoC - Grupo de Mecatrônica EESC/USP



Chapter 29

loops to take advantage of a cache hierarchy. The compiler optimization methodology used in this work is as follows:

Using affine loop and data transformations, we first optimize temporal and spatial locality aggressively.

We then optimize register usage through unroll-and-jam and scalar replacement.
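The two steps above can be illustrated with a minimal C sketch. The kernel (a small matrix-vector product), the size `N`, the function names, and the unroll factor of 2 are all assumptions chosen for illustration; they are not the loop nest or the framework used in this chapter. Step 1 is shown here only as a loop interchange, which is one simple instance of an affine loop transformation; a data transformation (for example, storing the array transposed) would be an alternative way to obtain the same unit-stride accesses.

```c
#include <assert.h>
#include <stddef.h>

#define N 8  /* hypothetical problem size, for illustration only */

/* Fill inputs with small integers so that all sums are exact in double. */
void fill_inputs(double a[N][N], double x[N]) {
    for (size_t i = 0; i < N; i++) {
        x[i] = (double)(i % 5);
        for (size_t j = 0; j < N; j++)
            a[i][j] = (double)((i + 2 * j) % 7);
    }
}

/* Baseline: j is the outer loop, so a[i][j] is walked down a column.
   In row-major C this is a stride-N access with poor spatial locality. */
void mv_baseline(double a[N][N], double x[N], double y[N]) {
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            y[i] += a[i][j] * x[j];
}

/* Step 1 (locality): interchange the loops so that the inner loop
   walks a[i][j] along a row (unit stride) and y[i] is reused across
   the whole inner loop. */
void mv_interchanged(double a[N][N], double x[N], double y[N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            y[i] += a[i][j] * x[j];
}

/* Step 2 (registers): unroll the outer i loop by 2 and jam the two
   copies into one inner loop (unroll-and-jam). Scalar replacement then
   keeps y[i], y[i+1] and x[j] in local scalars, so each x[j] is loaded
   once and used for two rows. */
void mv_unroll_jam(double a[N][N], double x[N], double y[N]) {
    size_t i = 0;
    for (; i + 1 < N; i += 2) {
        double y0 = y[i];
        double y1 = y[i + 1];
        for (size_t j = 0; j < N; j++) {
            double xj = x[j];
            y0 += a[i][j] * xj;
            y1 += a[i + 1][j] * xj;
        }
        y[i] = y0;
        y[i + 1] = y1;
    }
    for (; i < N; i++) {  /* cleanup iteration when N is odd */
        double y0 = y[i];
        for (size_t j = 0; j < N; j++)
            y0 += a[i][j] * x[j];
        y[i] = y0;
    }
}

/* Self-check: all three versions must compute the same result. */
void check_versions_agree(void) {
    double a[N][N], x[N];
    double r0[N] = {0}, r1[N] = {0}, r2[N] = {0};
    fill_inputs(a, x);
    mv_baseline(a, x, r0);
    mv_interchanged(a, x, r1);
    mv_unroll_jam(a, x, r2);
    for (size_t i = 0; i < N; i++)
        assert(r0[i] == r1[i] && r1[i] == r2[i]);
}
```

Calling `check_versions_agree()` from a test driver confirms that the transformed versions preserve the baseline result. In practice a compiler would choose the unroll factor from the number of available registers rather than fixing it at 2 as this sketch does.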

For the first step, we use an extended form of the approach presented in [5]. We have chosen this method for two reasons. First, it uses both loop and data transformations, and is more powerful than pure loop ([13]) and pure data ([12]) transformation techniques. Second, this transformation framework was readily available to us. It should be noted, however, that other locality optimization approaches such as [12] would result in similar output codes for the regular, array-based programs in our experimental suite. The second step is fairly standard and its details can be found in [4]. A brief summary of this compiler optimization strategy follows. Consider the following loop nest:

for(i=1;i
