29.11.2012 Views

Compile-time Loop Splitting for Distributed Memory ... - Stanford AI Lab

Compile-time Loop Splitting for Distributed Memory ... - Stanford AI Lab

Compile-time Loop Splitting for Distributed Memory ... - Stanford AI Lab

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Ten 1-D <strong>Loop</strong> Nests One 2-D <strong>Loop</strong> Nest<br />

<strong>for</strong>(i,0,10,1) � body(i);� <strong>for</strong>(I step,0,100,10) �<br />

<strong>for</strong>(i,10,20,1) � body(i);� <strong>for</strong>(i,I step,I step+10,1)<br />

<strong>for</strong>(i,20,30,1) � body(i);� � body(i);� �<br />

...<br />

<strong>for</strong>(i,90,100,1) � body(i);�<br />

Table 3.2: Ten separate loops of equal iteration length can be reduced in code to two nested<br />

loops.<br />

Peeled Iterations Grouped Into <strong>Loop</strong>s<br />

body(i=0);<br />

body(i=1); <strong>for</strong>(i,0,2,1) � body(i);�<br />

<strong>for</strong>(i,2,97,1) � body(i);� <strong>for</strong>(i,2,97,1) � body(i);�<br />

body(i=97); <strong>for</strong>(i,97,100,1) � body(i);�<br />

body(i=98);<br />

body(i=99);<br />

Table 3.3: The original loop with a prologue and epilogue both peeled away and grouped into<br />

loops themselves.<br />

subloops in entirety. For example, the code on the left of Table 3.2, comprised of ten<br />

subloops of equal length (10 iterations), can be represented by the much smaller code on<br />

the right. With this definition of loop splitting, code expansion is no longer a concern, and<br />

the extra loop overhead is negligible provided the subloops have a moderate iteration size.<br />

3.2.2 Peeling<br />

This thesis defines peeling as the removal of a small number of iterations from either end<br />

of a loop’s iteration interval. These peeled iterations result in a prologue (the first few<br />

iterations) and an epilogue (the last few iterations) around the modified loop. Table 3.3<br />

shows the original loop of Table 3.1 with two iterations peeled into a prologue and three<br />

peeled into an epilogue. Of course, if the number of peeled iterations is large enough, loop<br />

overhead may be favored over code expansion so that the prologue and epilogue become<br />

loops themselves, as Table 3.3 also illustrates.<br />

For the remainder of this thesis, the term loop splitting will refer not to the general<br />

trans<strong>for</strong>mation of Section 3.2.1, but specifically to the combination of general loop splitting<br />

and peeling to allow <strong>for</strong> parallel compiler optimizations, which Section 3.4 describes.<br />

30

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!