Compile-time Loop Splitting for Distributed Memory ... - Stanford AI Lab
Compile-time Loop Splitting for Distributed Memory ... - Stanford AI Lab
Compile-time Loop Splitting for Distributed Memory ... - Stanford AI Lab
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
Ten 1-D <strong>Loop</strong> Nests One 2-D <strong>Loop</strong> Nest<br />
<strong>for</strong>(i,0,10,1) � body(i);� <strong>for</strong>(I step,0,100,10) �<br />
<strong>for</strong>(i,10,20,1) � body(i);� <strong>for</strong>(i,I step,I step+10,1)<br />
<strong>for</strong>(i,20,30,1) � body(i);� � body(i);� �<br />
...<br />
<strong>for</strong>(i,90,100,1) � body(i);�<br />
Table 3.2: Ten separate loops of equal iteration length can be reduced in code to two nested<br />
loops.<br />
Peeled Iterations Grouped Into <strong>Loop</strong>s<br />
body(i=0);<br />
body(i=1); <strong>for</strong>(i,0,2,1) � body(i);�<br />
<strong>for</strong>(i,2,97,1) � body(i);� <strong>for</strong>(i,2,97,1) � body(i);�<br />
body(i=97); <strong>for</strong>(i,97,100,1) � body(i);�<br />
body(i=98);<br />
body(i=99);<br />
Table 3.3: The original loop with a prologue and epilogue both peeled away and grouped into<br />
loops themselves.<br />
subloops in entirety. For example, the code on the left of Table 3.2, comprised of ten<br />
subloops of equal length (10 iterations), can be represented by the much smaller code on<br />
the right. With this definition of loop splitting, code expansion is no longer a concern, and<br />
the extra loop overhead is negligible provided the subloops have a moderate iteration size.<br />
3.2.2 Peeling<br />
This thesis defines peeling as the removal of a small number of iterations from either end<br />
of a loop’s iteration interval. These peeled iterations result in a prologue (the first few<br />
iterations) and an epilogue (the last few iterations) around the modified loop. Table 3.3<br />
shows the original loop of Table 3.1 with two iterations peeled into a prologue and three<br />
peeled into an epilogue. Of course, if the number of peeled iterations is large enough, loop<br />
overhead may be favored over code expansion so that the prologue and epilogue become<br />
loops themselves, as Table 3.3 also illustrates.<br />
For the remainder of this thesis, the term loop splitting will refer not to the general<br />
trans<strong>for</strong>mation of Section 3.2.1, but specifically to the combination of general loop splitting<br />
and peeling to allow <strong>for</strong> parallel compiler optimizations, which Section 3.4 describes.<br />
30