29.01.2015 Views

Embedded Software for SoC - Grupo de Mecatrônica EESC/USP


Compiler-Directed ILP Extraction

and equals 2 for operation a. Assuming that the probability of scheduling an operation at any time step in its time frame is given by a uniform distribution [22], the contribution to the load of an operation op at time step t in its time frame is given by the reciprocal of the number of time steps in that time frame. In Figure 19-1, the contributions to the load of all the addition operations (labeled a, c, d, f, g and h) are indicated by the shaded region to the right of the operations. To obtain the total load for type fu at a time step t, we sum the contributions to the load of all operations that are executable on resource type fu at time step t. In Figure 19-1, the resulting total adder load profile is shown. The thick vertical line in the load profile plot indicates the capacity of the machine’s datapath to execute instructions at a particular time step. (In the example, we assume a datapath that contains 2 adders.) Accordingly, in step 1 of the load profile, the shaded region to the right of the thick vertical line represents an over-subscription of the adder datapath resource. This indicates that, in the actual VLIW code/schedule, one of the addition operations (i.e., either operation a, d or g) will need to be delayed to a later time step.
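The load computation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the time frames below are hypothetical values chosen so that operations a, d and g contend for the adders in step 1, and they are not read off Figure 19-1. The function names `load_profile` and `oversubscribed_steps` are likewise assumptions introduced for this example.

```python
from fractions import Fraction


def load_profile(time_frames, num_steps):
    """Sum the per-step load contributions of all operations of one type.

    time_frames maps an operation name to its (first_step, last_step)
    time frame, inclusive and 1-based. Under the uniform-distribution
    assumption, an operation contributes 1 / (frame length) to every
    step inside its time frame.
    """
    profile = [Fraction(0)] * (num_steps + 1)  # index 0 is unused
    for first, last in time_frames.values():
        share = Fraction(1, last - first + 1)
        for t in range(first, last + 1):
            profile[t] += share
    return profile[1:]


def oversubscribed_steps(profile, capacity):
    """Return the 1-based time steps whose load exceeds the unit count."""
    return [t for t, load in enumerate(profile, start=1) if load > capacity]


# Hypothetical time frames for the six addition operations (assumed values).
adder_frames = {
    "a": (1, 2),
    "d": (1, 1),
    "g": (1, 1),
    "c": (2, 3),
    "f": (2, 3),
    "h": (3, 4),
}

profile = load_profile(adder_frames, num_steps=4)
hot_steps = oversubscribed_steps(profile, capacity=2)  # datapath with 2 adders
```

With these assumed frames, step 1 carries a load of 5/2 against a capacity of 2 adders, so it is the only over-subscribed step. Exact fractions are used instead of floats so that the loads over all steps sum exactly to the number of operations.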

2.2. Load balancing through speculation

We argue that, in the context of VLIW/EPIC machines, the goal of compiler transformations aimed at speculating operations should be to “redistribute/balance” the load profile of the original/input kernel so as to enable a more effective utilization of the resources available in the datapath. More specifically, the goal should be to judiciously modify the scheduling ranges of the kernel’s operations, via speculation, such that overall resource contention is decreased/minimized and, consequently, code performance is improved. We illustrate this key point using the example CDFG in Figure 19-2(a), assuming a datapath with 2 adders, 2 multipliers and 1 comparator. Figure 19-2(b) shows the adder load profile for the original kernel, while Figure 19-3(a) and Figure 19-3(b) show the load profiles for the resulting CDFGs with one and three
