
Lecture Notes in Computer Science 4917

194 P. Raghavan et al.

When optimizing applications for a given architecture, designers try to minimize the energy consumption by improving other metrics, like a reduction in the number of memory accesses. This indirect way is, however, inconclusive for more complex trade-offs, like introducing extra operations, and therefore accesses to the instruction memory, in order to minimize accesses to the data memory. To correctly perform this type of optimization, an integrated energy-aware estimation flow is needed.
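To make this trade-off concrete, the sketch below shows a toy integrated cost model in Python. The per-access energies and the access counts are invented for illustration only; they are not values from this work.

```python
# Toy integrated energy model. The per-access energies (in pJ) are
# invented for illustration; real values depend on the memory design.
E_IMEM = 5.0    # assumed energy per instruction-memory access
E_DMEM = 15.0   # assumed energy per data-memory access

def total_energy(imem_accesses, dmem_accesses):
    """Integrated estimate: both memories are counted together."""
    return imem_accesses * E_IMEM + dmem_accesses * E_DMEM

# A baseline loop vs. a variant that spends extra instructions
# (more instruction-memory accesses) to avoid data-memory accesses.
baseline = total_energy(imem_accesses=1000, dmem_accesses=800)  # 17000.0
variant = total_energy(imem_accesses=1300, dmem_accesses=500)   # 14000.0

# The variant only wins because E_DMEM > E_IMEM here; tracking data
# accesses alone would miss the cost of the extra instructions.
assert variant < baseline
```

An estimator that only counted data-memory accesses would rate the variant as strictly better, regardless of how many instructions it adds; the integrated model exposes when the extra instruction fetches outweigh the saved data accesses.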

Decisions at different levels of abstraction have an impact on the efficiency of the final implementation, from algorithmic-level choices to source-level transformations, all the way down to micro-architectural changes. In this paper, we present an integrated compilation and architecture exploration framework with fast performance and energy estimation. This work enables designers to evaluate the impact of various optimizations in terms of energy and performance. The optimizations explored can be in the source code, in the compiler, or in the architecture.

Current embedded platforms consist of a number of processors, custom hardware and memories. Because of increasing production costs, flexibility is becoming more important, and platforms have to be reused across many different and evolving products. In this work we focus on one of the programmable processors, potentially including special-purpose Functional Units (FUs) that improve the energy efficiency for a certain application domain. The data and instruction memory hierarchies of this processor are taken into account.

In this context, designers face multiple problems. Firstly, given a set of target applications, the Instruction Set Architecture (ISA, which determines the number and type of FUs), the processor style (the right mix of instruction-level parallelism (ILP) and data-level parallelism (DLP)), the use of application-specific accelerator units, the sizes of memories and register files, and the connectivity between these components all have to be fixed. To reach the required performance and energy efficiency, the retargetable tool-flow presented here enables fast architecture exploration and leads to better processor designs, taking into account all parts of the system. Our framework correctly identifies the energy and performance bottlenecks and prevents designers from improving one part at the cost of other parts. Since our framework allows the use of novel low-power extensions for different components of the processor, combinations of these extensions can be explored.
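The architecture-exploration loop described above can be sketched as follows. The parameter names, the analytical cost model, and the energy-delay objective are all illustrative assumptions, not the actual interface of the framework presented here.

```python
from itertools import product

def estimate(ops, issue, simd, rf):
    """Toy analytical model (an assumption for this sketch):
    cycles shrink with the available parallelism, while energy
    grows with issue width, SIMD lanes and register-file size."""
    cycles = ops / (issue * simd)
    energy = ops * (0.5 + 0.1 * issue + 0.05 * simd + 0.001 * rf)
    return cycles, energy

def edp(cfg, ops=1_000_000):
    # Energy-delay product as a single scalar objective.
    cycles, energy = estimate(ops, *cfg)
    return cycles * energy

# Hypothetical design space: issue width (ILP), SIMD width (DLP),
# and number of register-file entries.
design_space = product([1, 2, 4], [1, 4], [16, 32, 64])
best = min(design_space, key=edp)  # e.g. (4, 4, 16) under this model
```

A real flow replaces `estimate` with compilation plus fast simulation of each design point; the structure of the loop (enumerate configurations, score each on cycles and Joules, keep the Pareto-best) stays the same.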

Secondly, after the processor has been fixed, architecture-dependent software optimizations using code transformations can dramatically improve the performance and energy efficiency. An example of such a transformation is loop merging. This technique can improve data locality, but can have the adverse effect of increasing register pressure, thereby causing register spilling. Our framework guides the designer in choosing these transformations: it directly shows the effect of software optimizations on the final platform metrics, cycles and Joules. Thirdly, compiler optimizations like improved scheduling and allocation techniques can be evaluated for a range of relevant state-of-the-art architectures. Their effect on different parts of the system (e.g. register files, memories and datapath components) can be tracked correctly.
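As a minimal illustration of loop merging, consider the two versions below of the same element-wise computation (the computation itself is a made-up example, not one from this work):

```python
# Two separate loops: the intermediate list tmp is fully produced
# by loop 1 and then re-read by loop 2, so every element of tmp
# travels through the data memory.
def separate(a):
    tmp = [x * 2 for x in a]        # loop 1: writes tmp
    return [x + 1 for x in tmp]     # loop 2: re-reads tmp

# Merged loop: each value is produced and consumed in one pass, so
# the intermediate never reaches the data memory (better locality).
# In a compiled setting the price is more live values per iteration,
# i.e. higher register pressure and possibly spilling.
def merged(a):
    return [x * 2 + 1 for x in a]

assert separate([1, 2, 3]) == merged([1, 2, 3]) == [3, 5, 7]
```

Whether the merged version actually saves energy depends on whether the avoided data-memory traffic outweighs any spill code the register pressure forces in, which is exactly the kind of question an integrated estimator can answer.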

Optimizing this hardware-software co-design problem is complicated by the large size of the design space. To be of practical use, estimation tools should be fast enough to handle realistic application sizes. Currently, energy estimation during processor design is done using time-consuming gate-level simulations. This approach
