Compile-time Loop Splitting for Distributed Memory ... - Stanford AI Lab
time of the loop.
Unfortunately, the data needed by a processor is often located on more than one
processing element, so that no loop invariants exist for optimization. However, because
arrays are often both accessed and distributed in segments of contiguous array cells,
intervals of a loop access data from a single processor.
Thus, each such interval has its own invariants. By dividing the loop into these intervals,
the code transformations can still be performed, albeit on a smaller scale.
A compiler can isolate these intervals by performing a loop transformation called loop
splitting. Loop splitting divides a loop into subloops, which together have the same effect
as the original loop. The computation within each subloop can then be reduced.
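The idea can be illustrated with a minimal sketch, which is not the thesis's implementation. It assumes a one-dimensional array whose cells are distributed in contiguous blocks, one block per processor; the names `original_loop`, `split_loop`, `N`, and `BLOCK` are hypothetical. The single loop recomputes the owning processor and the local offset on every iteration, whereas the split version runs one subloop per block, where the owner is invariant and the offset is advanced by an increment.

```python
# Hypothetical layout: N array cells distributed in contiguous blocks of
# BLOCK cells per processor (names and sizes are illustrative assumptions).
N = 10
BLOCK = 4


def original_loop(n, block):
    """Single loop: owner and offset are recomputed on every iteration."""
    refs = []
    for i in range(n):
        owner = i // block   # processor holding cell i
        offset = i % block   # position of cell i within that processor's block
        refs.append((owner, offset))
    return refs


def split_loop(n, block):
    """Same iteration space, divided into one subloop per block."""
    refs = []
    for start in range(0, n, block):   # one subloop per interval
        owner = start // block         # invariant in the subloop, so computed once
        offset = 0                     # modulo replaced by a running increment
        for _ in range(start, min(start + block, n)):
            refs.append((owner, offset))
            offset += 1
    return refs


# The subloops together produce the same references as the single loop.
assert original_loop(N, BLOCK) == split_loop(N, BLOCK)
```

In this sketch the division and modulo disappear from the inner loop body, which is the kind of reduction in computation that the subloops make possible.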
In the context of distributed memory multiprocessors, this thesis explores the
improvement of array references allowed by the loop splitting transformation. More
specifically, this thesis examines the program speedup resulting from loop splitting, the code
transformations of code hoisting and strength reduction, and the subsequent compiler
optimizations.
1.1 Overview<br />
Section 2 describes array management in distributed memory multiprocessors. This topic<br />
includes partitioning of task and data as well as alignment for minimal execution time.
Then, the method and complexity of array reference expressions are presented to illustrate<br />
the problem this thesis attempts to ameliorate.<br />
Section 3 provides an overview of loop splitting. First, the relevant loop<br />
trans<strong>for</strong>mations (general loop splitting and peeling) and compiler optimizations (code<br />
hoisting and strength reduction) are presented. Next, these elements are brought together<br />
by describing the loop splitting transformation for compiler optimizations. Then, to
prepare for Section 4, this section presents the loop splitting framework for optimizing
array reference expressions on a distributed memory multiprocessor.<br />
Section 4, the crux of this thesis, describes in detail the loop splitting study. This<br />
includes the methodology and performance results of several experiments. The results are