29.01.2015 Views

Embedded Software for SoC - Grupo de Mecatrônica EESC/USP

Chapter 19

kernels, i.e. a small fraction of the entire code (sometimes as small as 3%) is executed most of the time; see e.g. [16] for an analysis of the MediaBench [5] suite of programs. Moreover, most of the processing time is typically spent executing the two innermost loops of such time-critical loop nests; in [16], for example, this percentage was found to be about 95% for MediaBench programs. Yet another critical observation made in [16] is that there is considerable control complexity within these loops. This strongly suggests that, in order to be effective, ILP extraction targeting such time-critical inner loop bodies must handle control/branching constructs.

The key contribution of this paper is a novel resource-aware algorithm for compiler-directed ILP extraction targeting clustered EPIC machines that integrates three powerful ILP extraction techniques: predication, control speculation and software pipelining/modulo scheduling. An important innovation in our approach is the ability to perform resource-constrained speculation in the context of a complex (phased) optimization process. Specifically, in order to enable maximum utilization of the resources of the clustered processor, and thus maximize performance, our algorithm judiciously speculates operations on the predicated modulo scheduled loop body using a set of effective load-based metrics. In addition to extracting ILP from time-critical loops, our framework schedules and binds the resulting operations, generating actual VLIW code.

1.1. Background<br />

The performance of a loop is defined by the average rate at which new loop iterations can be started; the corresponding number of cycles between the starts of consecutive iterations is denoted the initiation interval (II). Software pipelining is an ILP extraction technique that retimes [20] loop body operations (i.e., overlaps multiple loop iterations), so as to enable the generation of more compact schedules [3, 18]. Modulo scheduling algorithms exploit this technique during scheduling, so as to expose additional ILP to the datapath resources, and thus decrease the initiation interval of a loop [23].
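The effect of the initiation interval on loop execution time can be illustrated with a small calculation. The following is a minimal sketch, not part of the algorithm described in this chapter: the function names and the simple cost model (a fixed per-iteration schedule length and a constant II) are assumptions made for illustration only.

```python
def sequential_cycles(iterations: int, schedule_length: int) -> int:
    """Cycles needed when each iteration completes before the next one starts."""
    return iterations * schedule_length


def pipelined_cycles(iterations: int, schedule_length: int, ii: int) -> int:
    """Cycles needed when a new iteration is started every `ii` cycles.

    Assumed model: the last iteration starts at cycle (iterations - 1) * ii
    and needs `schedule_length` further cycles to complete.
    """
    if iterations == 0:
        return 0
    return (iterations - 1) * ii + schedule_length
```

Under this model, when II equals the schedule length no overlap takes place and the two functions agree; as II decreases, the total cycle count for a long-running loop approaches `iterations * ii`, which is why the II is the quantity a modulo scheduler tries to reduce.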

Predication allows one to concurrently schedule alternative paths of execution, with only the paths corresponding to the realized flow of control being allowed to actually modify the state of the processor. The key idea in predication is to eliminate branches through a process called if-conversion [8]. If-conversion transforms conditional branches into (1) operations that define predicates, and (2) operations guarded by predicates, corresponding to alternative control paths.¹ A guarded operation is committed only if its predicate is true. In this sense, if-conversion is said to convert control dependences into data dependences (i.e., dependence on predicate values), generating what is called a hyperblock [11].
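The two kinds of operations produced by if-conversion can be sketched with a toy interpreter. This is a hedged illustration only: the register-dictionary machine model, the `GuardedOp` tuple layout, and all names (`run_hyperblock`, `ABS_DIFF`, the always-true guard `"true"`) are hypothetical and do not correspond to any particular EPIC instruction set or to the framework described in this chapter.

```python
from typing import Callable, Dict, List, Tuple

Regs = Dict[str, int]
# (guard register, destination register, function computing the result)
GuardedOp = Tuple[str, str, Callable[[Regs], int]]


def abs_diff_branchy(a: int, b: int) -> int:
    """Original control flow: a conditional branch selects one of two paths."""
    if a > b:
        return a - b
    return b - a


def run_hyperblock(ops: List[GuardedOp], regs: Regs) -> Regs:
    """Execute a straight-line list of guarded operations (no branches)."""
    regs = dict(regs)
    regs["true"] = 1  # assumed always-true predicate for unguarded operations
    for guard, dest, compute in ops:
        value = compute(regs)   # every operation is issued ...
        if regs[guard]:         # ... but committed only if its predicate is true
            regs[dest] = value
    return regs


# If-converted form of abs_diff_branchy.
ABS_DIFF: List[GuardedOp] = [
    # (1) operations that define the predicate and its complement
    ("true", "p", lambda r: int(r["a"] > r["b"])),
    ("true", "np", lambda r: int(not r["a"] > r["b"])),
    # (2) operations guarded by predicates, one per original control path
    ("p", "r", lambda r: r["a"] - r["b"]),
    ("np", "r", lambda r: r["b"] - r["a"]),
]
```

For example, `run_hyperblock(ABS_DIFF, {"a": 3, "b": 7})["r"]` yields the same value as `abs_diff_branchy(3, 7)`. Note that the two guarded subtractions no longer depend on a branch; they depend only on the values of `p` and `np`, which is the control-to-data dependence conversion described above.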

Control speculation “breaks” the control dependence between an operation and the conditional statement it is dependent on [6]. By eliminating such dependences, operations can be moved out of conditional branches, and can be executed before their related conditionals are actually evaluated. Compilers
