
Embedded Software for SoC - Grupo de Mecatrônica EESC/USP


Chapter 17

CONTROL FLOW DRIVEN SPLITTING OF LOOP NESTS AT THE SOURCE CODE LEVEL

Heiko Falk¹, Peter Marwedel¹, and Francky Catthoor²

¹ University of Dortmund, Computer Science 12, Germany; ² IMEC, Leuven, Belgium

Abstract. This article presents a novel source code transformation for control flow optimization called loop nest splitting, which minimizes the number of executed if-statements in loop nests of embedded multimedia applications. The goal of the optimization is to reduce runtimes and energy consumption. The analysis techniques are based on precise mathematical models combined with genetic algorithms. Applying our implemented algorithms to three real-life multimedia benchmarks on 10 different processors leads to average speed-ups of 23.6%–62.1% and energy savings of 19.6%–57.7%. Furthermore, our optimization also leads to advantageous pipeline and cache performance.

Key words: cache, code size, energy, genetic algorithm, loop splitting, loop unswitching, multimedia, pipeline, polytope, runtime, source code transformation

1. INTRODUCTION

In recent years, the power efficiency of embedded multimedia applications (e.g. medical image processing, video compression) with simultaneous consideration of timing constraints has become a crucial issue. Many of these applications are data-dominated, using large amounts of data memory. Typically, such applications consist of deeply nested for-loops. Using the loops' index variables, addresses are calculated for data manipulation. The main algorithm is usually located in the innermost loop. Often, such an algorithm treats particular parts of its data specifically, e.g. an image border requires other manipulations than its center. This boundary checking is implemented using if-statements in the innermost loop (see e.g. Figure 17-1).
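As a minimal sketch of the pattern just described (not a reproduction of Figure 17-1), the following C function places a boundary check in the innermost loop of a two-level loop nest. The function name `smooth`, the image size, and the averaging kernel are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { WIDTH = 8, HEIGHT = 6 };  /* hypothetical image size */

/* Hypothetical kernel: border pixels are copied unchanged, interior
 * pixels are replaced by the mean of their four direct neighbours. */
void smooth(unsigned char in[HEIGHT][WIDTH],
            unsigned char out[HEIGHT][WIDTH])
{
    assert(in != NULL && out != NULL);

    for (int y = 0; y < HEIGHT; y++) {
        for (int x = 0; x < WIDTH; x++) {
            /* Boundary check: evaluated on every iteration of the
             * innermost loop, i.e. WIDTH * HEIGHT times in total. */
            if (x == 0 || x == WIDTH - 1 || y == 0 || y == HEIGHT - 1) {
                out[y][x] = in[y][x];
            } else {
                out[y][x] = (unsigned char)((in[y - 1][x] + in[y + 1][x] +
                                             in[y][x - 1] + in[y][x + 1]) / 4);
            }
        }
    }
}
```

The if-statement depends only on the index variables `x` and `y`, so its outcome for each iteration is known before the loop nest runs, yet it is still executed for every pixel.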

Although common subexpression elimination and loop-invariant code motion [19] are already applied, this code is sub-optimal w.r.t. runtime and energy consumption for several reasons. First, the if-statements lead to a very irregular control flow. Any jump instruction in a machine program causes a control hazard for pipelined processors [19]. This means that the pipeline needs to be stalled for some instruction cycles, so as to prevent the execution of incorrectly prefetched instructions.
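To make the cost of such if-statements concrete, the sketch below counts how often a boundary check is executed in an ordinary loop nest and in a hand-restructured variant that tests once per row. This is an illustration under stated assumptions, not the authors' algorithm or its output: the function names, the image size, and the manual restructuring are all hypothetical, and the number of executed if-statements is only a proxy for pipeline stalls, which depend on the actual processor.

```c
#include <assert.h>

enum { WIDTH = 8, HEIGHT = 6 };  /* hypothetical image size */

/* Marks border pixels with 1 and interior pixels with 0.
 * Returns the number of if-statements executed. */
long classify_unsplit(unsigned char is_border[HEIGHT][WIDTH])
{
    long ifs = 0;
    for (int y = 0; y < HEIGHT; y++) {
        for (int x = 0; x < WIDTH; x++) {
            ifs++;  /* one check per pixel */
            if (x == 0 || x == WIDTH - 1 || y == 0 || y == HEIGHT - 1)
                is_border[y][x] = 1;
            else
                is_border[y][x] = 0;
        }
    }
    return ifs;
}

/* Hand-restructured variant (assumption: written by hand for this
 * sketch): one row-level test replaces WIDTH pixel-level tests. */
long classify_split(unsigned char is_border[HEIGHT][WIDTH])
{
    long ifs = 0;
    assert(WIDTH >= 2 && HEIGHT >= 2);
    for (int y = 0; y < HEIGHT; y++) {
        ifs++;  /* one check per row */
        if (y == 0 || y == HEIGHT - 1) {
            /* Top or bottom row: every pixel is a border pixel. */
            for (int x = 0; x < WIDTH; x++)
                is_border[y][x] = 1;
        } else {
            /* Inner row: only the first and last pixel are border. */
            is_border[y][0] = 1;
            for (int x = 1; x < WIDTH - 1; x++)
                is_border[y][x] = 0;
            is_border[y][WIDTH - 1] = 1;
        }
    }
    return ifs;
}
```

Both functions produce the same result, but the first executes `WIDTH * HEIGHT` checks while the second executes `HEIGHT`. The restructured version duplicates the loop body, so the reduction in executed branches comes at the price of larger code.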

Second, the pipeline is also influenced by data references, since it can also be stalled during data memory accesses. In loop nests, the index variables are accessed very frequently, resulting in pipeline stalls if they cannot be


A. Jerraya et al. (eds.), Embedded Software for SOC, 215–229, 2003.
© 2003 Kluwer Academic Publishers. Printed in the Netherlands.
