
Embedded Software for SoC - Grupo de Mecatrônica EESC/USP


Compiler-Directed ILP Extraction

tive phase. First it decides on the next best candidate for speculation. The ranking function used during this phase has already been described in detail in Section 3. The simplest way to speculate the selected operation is to delete the edge from its predecessor predicate-define operation, a transformation known as predicate promotion [6]. In certain cases, speculation additionally requires renaming and creating a new successor predicated move operation for reconciliation, known as SSA-PS [9].
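A minimal sketch of predicate promotion on a toy dependence graph, assuming unit latencies and a hypothetical `Op` representation in which each operation records its data predecessors and, optionally, the predicate-define operation that guards it. Promotion simply deletes that guard edge; legality checks and the renaming/reconciliation step of SSA-PS are not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical DFG node: data predecessors plus an optional guarding predicate define."""
    name: str
    data_preds: set[str] = field(default_factory=set)
    pred_def: str | None = None  # predicate-define op this op is guarded by, if any


def asap(dfg: dict[str, Op], name: str, memo: dict[str, int] | None = None) -> int:
    """Unit-latency ASAP step; the edge from the predicate define counts as a dependence."""
    if memo is None:
        memo = {}
    if name not in memo:
        op = dfg[name]
        deps = set(op.data_preds)
        if op.pred_def is not None:
            deps.add(op.pred_def)
        memo[name] = max((asap(dfg, d, memo) + 1 for d in deps), default=0)
    return memo[name]


def promote(dfg: dict[str, Op], name: str) -> None:
    """Predicate promotion: delete the edge from the guarding predicate define so the
    op may execute speculatively. Legality is assumed to be checked elsewhere."""
    op = dfg[name]
    if op.pred_def is None:
        raise ValueError(f"{name} is not predicated")
    op.pred_def = None


def build_example() -> dict[str, Op]:
    # cmp defines a predicate from x and y; add is guarded by that predicate and uses x.
    ops = [
        Op("x"),
        Op("y"),
        Op("cmp", {"x", "y"}),
        Op("add", {"x"}, pred_def="cmp"),
    ]
    return {op.name: op for op in ops}
```

In the example graph, `add` can start at step 2 only after `cmp` has computed its predicate; once promoted it depends only on `x`, and its ASAP step drops to 1.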

After speculation is done, binding is performed using the binding algorithm described above. The next optimization phase applies the critical transformation of collapsing binding-related move operations with reconciliation-related predicated moves (see [9]). Finally, a modulo scheduler schedules the resulting Data Flow Graph (DFG). The modulo scheduler uses a two-level priority function that ranks operations first by lower ALAP time and then by lower mobility.
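The two-level priority can be sketched as follows, assuming unit latencies, a single class of identical functional units, and an acyclic graph with no loop-carried dependences (a real modulo scheduler must also honor recurrences and per-cluster resources). Mobility is taken as ALAP minus ASAP; the function names `asap_alap`, `priority_order`, and `modulo_schedule`, and the final tie-break on operation name, are illustrative choices rather than the chapter's.

```python
from __future__ import annotations

from graphlib import TopologicalSorter


def asap_alap(preds: dict[str, list[str]]) -> tuple[dict[str, int], dict[str, int]]:
    """Unit-latency ASAP/ALAP times for an acyclic DFG given as op -> predecessors.
    Every op must appear as a key."""
    order = list(TopologicalSorter(preds).static_order())
    asap: dict[str, int] = {}
    for v in order:
        asap[v] = max((asap[p] + 1 for p in preds[v]), default=0)
    length = max(asap.values())
    succs: dict[str, list[str]] = {v: [] for v in preds}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)
    alap: dict[str, int] = {}
    for v in reversed(order):
        alap[v] = min((alap[s] - 1 for s in succs[v]), default=length)
    return asap, alap


def priority_order(preds: dict[str, list[str]]) -> list[str]:
    """Rank by lower ALAP first, then lower mobility (ALAP - ASAP), then name."""
    asap, alap = asap_alap(preds)
    return sorted(preds, key=lambda v: (alap[v], alap[v] - asap[v], v))


def modulo_schedule(preds: dict[str, list[str]], ii: int, units: int) -> dict[str, int]:
    """Place ops in priority order at the earliest step that satisfies data dependences
    and has a free unit in modulo slot (step % ii) of the reservation table."""
    if len(preds) > ii * units:
        raise ValueError("not enough resource slots for this initiation interval")
    mrt = [0] * ii  # modulo reservation table: units in use per slot
    sched: dict[str, int] = {}
    for v in priority_order(preds):
        t = max((sched[p] + 1 for p in preds[v]), default=0)
        while mrt[t % ii] >= units:
            t += 1
        mrt[t % ii] += 1
        sched[v] = t
    return sched
```

Because an operation's ALAP time is strictly greater than that of any of its predecessors under unit latency, sorting by ALAP first also yields a valid topological order, so every predecessor is placed before its successors.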

If execution latency improves relative to the previous best result, the corresponding schedule is saved. Each new iteration produces a different binding function that considers the modified scheduling ranges resulting from the operation speculated in the previous iteration. The process continues iteratively, greedily speculating operations, until the termination condition is satisfied. Currently this condition is simply a threshold on the number of successfully speculated operations, although more sophisticated termination conditions can easily be included.
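The greedy loop can be outlined as below. `rank`, `speculate`, and `bind_and_schedule` are hypothetical callbacks standing in for the ranking function of Section 3, the speculation step (predicate promotion or SSA-PS), and the binding, move-collapsing, and modulo-scheduling phases respectively. The baseline is assumed to be the schedule obtained with no speculation, and the termination condition is the simple threshold on successful speculations described above.

```python
from __future__ import annotations

from typing import Callable, Optional, TypeVar

S = TypeVar("S")  # whatever object represents a schedule


def iterative_speculation(
    rank: Callable[[set[str]], Optional[str]],      # hypothetical: best untried candidate
    speculate: Callable[[str], bool],               # hypothetical: True if speculation succeeded
    bind_and_schedule: Callable[[], tuple[int, S]],  # hypothetical: (latency, schedule)
    max_speculated: int,
) -> tuple[int, S]:
    """Greedily speculate the best-ranked untried op, rebind and reschedule,
    and keep the schedule with the lowest latency seen so far."""
    best_latency, best_schedule = bind_and_schedule()  # assumed baseline: no speculation
    tried: set[str] = set()
    speculated = 0
    while speculated < max_speculated:  # termination: threshold on successful speculations
        op = rank(tried)
        if op is None:  # no candidates left
            break
        tried.add(op)
        if not speculate(op):  # speculation not possible for this op
            continue
        speculated += 1
        # The new binding sees the scheduling ranges modified by this speculation.
        latency, schedule = bind_and_schedule()
        if latency < best_latency:  # save only strict improvements
            best_latency, best_schedule = latency, schedule
    return best_latency, best_schedule
```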

Since both the estimation of cluster loads and the binding algorithm depend on the assumed profile latency, we found experimentally that it was important to search over several profile latencies. Thus, the iterative process is repeated for various profile latency values, starting from the ASAP latency of the original CDFG and incrementing it up to a given percentage of the critical path (not exceeding four steps).
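A sketch of the outer search over profile latencies. The text does not pin down the exact step rule, so this assumes the profile latency is incremented one step at a time from the ASAP latency, by at most `pct` times the critical-path length (rounded down) and never by more than four steps. `run_iterative` is a hypothetical callback that runs the whole iterative speculation process for one profile latency and returns its best (latency, schedule) pair.

```python
from __future__ import annotations

import math
from typing import Callable, Optional, TypeVar

S = TypeVar("S")  # whatever object represents a schedule


def profile_latencies(asap_latency: int, critical_path: int, pct: float,
                      max_extra: int = 4) -> list[int]:
    """Candidate profile latencies: from the ASAP latency upward by at most
    floor(pct * critical_path) extra steps, capped at max_extra steps."""
    extra = min(math.floor(pct * critical_path), max_extra)
    return [asap_latency + k for k in range(extra + 1)]


def search_profile_latencies(
    asap_latency: int,
    critical_path: int,
    pct: float,
    run_iterative: Callable[[int], tuple[int, S]],  # hypothetical: one full iterative run
) -> tuple[int, S, int]:
    """Repeat the iterative process for each profile latency and keep the best result,
    returned as (latency, schedule, profile latency used)."""
    best: Optional[tuple[int, S, int]] = None
    for lp in profile_latencies(asap_latency, critical_path, pct):
        latency, schedule = run_iterative(lp)
        if best is None or latency < best[0]:
            best = (latency, schedule, lp)
    assert best is not None  # the range always contains asap_latency itself
    return best
```

For example, with an ASAP latency and critical path of 10 and `pct = 0.25`, the profile latencies tried are 10, 11, and 12.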
