29.11.2012 Views

Compile-time Loop Splitting for Distributed Memory ... - Stanford AI Lab



time of the loop.

Unfortunately, the data needed by a processor is often located on more than one processing element, so no loop invariants exist for optimization. However, because arrays are often both accessed and distributed in segments of contiguous array cells, intervals of a loop access data from a single processor. Thus, each such interval has its own invariants. By dividing the loop into these intervals, the code transformations can still be performed, albeit on a smaller scale.
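The interval idea can be illustrated with a minimal Python sketch. It assumes a simple block distribution in which array cell `i` lives on processor `i // BLOCK`; the names `BLOCK`, `owner`, and `invariant_intervals` are hypothetical and do not come from the thesis.

```python
BLOCK = 4  # assumed number of contiguous cells per processor (hypothetical)

def owner(i):
    """Processor holding array cell i under the assumed block distribution."""
    return i // BLOCK

def invariant_intervals(lo, hi):
    """Split the iteration range [lo, hi) into maximal intervals
    whose cells all live on the same processor."""
    intervals = []
    start = lo
    while start < hi:
        end = min(hi, (owner(start) + 1) * BLOCK)  # stop at the next block boundary
        intervals.append((start, end, owner(start)))
        start = end
    return intervals

print(invariant_intervals(2, 11))  # [(2, 4, 0), (4, 8, 1), (8, 11, 2)]
```

Within each returned interval the owning processor is constant, so it behaves as an invariant for that part of the loop even though it varies over the loop as a whole.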

A compiler can isolate these intervals by performing a loop transformation called loop splitting. Loop splitting divides a loop into subloops, which together have the same effect as the single loop. The computation in each subloop can then be reduced.
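As a rough sketch of the transformation, the Python below compares a single loop with a split version. It assumes a hypothetical block distribution, modeled as a list of per-processor lists with `BLOCK` cells each; the function names are illustrative, and this is not the compiler-generated code studied in the thesis.

```python
BLOCK = 4  # assumed cells per processor (hypothetical block distribution)

def sum_single_loop(mem, n):
    """Original loop: owner and offset are recomputed on every iteration."""
    total = 0
    for i in range(n):
        total += mem[i // BLOCK][i % BLOCK]
    return total

def sum_split_loop(mem, n):
    """Split loop: one subloop per block of contiguous cells."""
    total = 0
    for start in range(0, n, BLOCK):
        local = mem[start // BLOCK]  # invariant within the subloop, so hoisted out of it
        for offset in range(min(BLOCK, n - start)):
            total += local[offset]   # offset advances by 1; no division or modulo here
    return total

mem = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
assert sum_single_loop(mem, 10) == sum_split_loop(mem, 10) == 55
```

The subloops together visit the same cells in the same order as the single loop, but the per-iteration division and modulo are replaced by one lookup per subloop and a simple increment.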

In the context of distributed memory multiprocessors, this thesis explores the improvement of array references allowed by the loop splitting transformation. More specifically, it examines the program speedup resulting from loop splitting, the code transformations code hoisting and strength reduction, and the subsequent compiler optimizations.

1.1 Overview

Section 2 describes array management in distributed memory multiprocessors. This topic includes partitioning of task and data as well as alignment for minimal execution time. Then, the method and complexity of array reference expressions are presented to illustrate the problem this thesis attempts to ameliorate.

Section 3 provides an overview of loop splitting. First, the relevant loop transformations (general loop splitting and peeling) and compiler optimizations (code hoisting and strength reduction) are presented. Next, these elements are brought together by describing the loop splitting transformation for compiler optimizations. Then, to prepare for Section 4, this section presents the loop splitting framework for optimizing array reference expressions on a distributed memory multiprocessor.

Section 4, the crux of this thesis, describes in detail the loop splitting study. This includes the methodology and performance results of several experiments. The results are

