29.01.2015 Views

Embedded Software for SoC - Grupo de Mecatrônica EESC/USP

Embedded Software for SoC - Grupo de Mecatrônica EESC/USP

Embedded Software for SoC - Grupo de Mecatrônica EESC/USP

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Chapter 19<br />

COMPILER-DIRECTED ILP EXTRACTION FOR<br />

CLUSTERED VLIW/EPIC MACHINES<br />

Satish Pillai and Margarida F. Jacome<br />

The University of Texas at Austin, USA<br />

Abstract. Compiler-directed ILP extraction techniques are critical to effectively exploiting the<br />

significant processing capacity of contemporaneous VLIW/EPIC machines. In this paper we<br />

propose a novel algorithm <strong>for</strong> ILP extraction targeting clustered EPIC machines that integrates<br />

three powerful techniques: predication, speculation and modulo scheduling. In addition, our<br />

framework schedules and binds operations to clusters, generating actual VLIW co<strong>de</strong>. A key innovation<br />

in our approach is the ability to per<strong>for</strong>m resource constrained speculation in the context<br />

of a complex (phased) optimization process. Specifically, in or<strong>de</strong>r to enable maximum utilization<br />

of the resources of the clustered processor, and thus maximize per<strong>for</strong>mance, our algorithm<br />

judiciously speculates operations on the predicated modulo scheduled loop body using a set of<br />

effective load based metrics. Our experimental results show that by jointly consi<strong>de</strong>ring different<br />

extraction techniques in a resource aware context, the proposed algorithm can take<br />

maximum advantage of the resources available on the clustered machine, aggressively improving<br />

the initiation interval of time critical loops.<br />

Key words: ILP extraction, clustered processors, EPIC, VLIW, predication, modulo scheduling,<br />

speculation, optimization<br />

1. INTRODUCTION<br />

Multimedia, communications and security applications exhibit a significant<br />

amount of instruction level parallelism (ILP). In or<strong>de</strong>r to meet the per<strong>for</strong>mance<br />

requirements of these <strong>de</strong>manding applications, it is important to use compilation<br />

techniques that expose/extract such ILP and processor datapaths with<br />

a large number of functional units, e.g., VLIW/EPIC processors.<br />

A basic VLIW datapath might be based on a single register file shared by<br />

all of its functional units (FUs). Un<strong>for</strong>tunately, this simple organization does<br />

not scale well with the number of FUs [12]. Clustered VLIW datapaths address<br />

this poor scaling by restricting the connectivity between FUs and registers,<br />

so that an FU on a cluster can only read/write from/to the cluster’s register<br />

file, see e.g. [12]. Since data may need to be transferred among the machine’s<br />

clusters, possibly resulting in increased latency, it is important to <strong>de</strong>velop<br />

per<strong>for</strong>mance enhancing techniques that take such data transfers into consi<strong>de</strong>ration.<br />

Most of the applications allu<strong>de</strong>d to above have only a few time critical<br />

245<br />

A Jerraya et al. (eds.), <strong>Embed<strong>de</strong>d</strong> <strong>Software</strong> <strong>for</strong> SOC, 245–259, 2003.<br />

© 2003 Kluwer Aca<strong>de</strong>mic Publishers. Printed in the Netherlands.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!