Compile-time Loop Splitting for Distributed Memory ... - Stanford AI Lab
Original                                        Generalization

for(I_step,0,100,10) {                          OriginalLoop
  i=I_step;   A[i] = (i+1)/10 + (i-2)%10;         { body; }
  i=I_step+1; A[i] = (i+1)/10 + (i-2)%10;
  I_div_10 = I_step/10;                         IntervalLoop {
  I_rem_10 = I_step%10;                           HeadLoop
  for(i,I_step+2,I_step+9,1)                        { body; }
  { A[i] = I_div_10 + I_rem_10;                   BodyLoop
    I_rem_10++; }                                   { optimized body; }
  i=I_step+9; A[i] = (i+1)/10 + (i-2)%10; }       TailLoop
                                                    { body; } }
Table 3.8: The previous example with correct loop splitting, showing peeled iterations; the
general loop splitting transformation.

iteration (i = 0) uses the value 0 in place of the modulo function, where -2 % 10 = 8 is the
correct value. Because the modulo values in the interval run from 8 to 9 and then from 0 to 7,
the restriction of Section 3.3.2 (a modulo cannot flip to zero in a loop splitting interval)
has been violated. Clearly, the loop splitting interval cannot account for the offsets
correctly.
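
The violation can be checked numerically. The following C sketch assumes the modulo intended here is the floored (mathematical) one, which is what makes -2 % 10 equal 8; C's own `%` truncates toward zero and would give -2. The helper names `floor_mod` and `modulo_flips_in_interval` are hypothetical and are introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Floored modulo: the result always lies in [0, m).
   Assumption: this is the modulo the text has in mind (-2 mod 10 = 8). */
int floor_mod(int a, int m)
{
    assert(m > 0);
    int r = a % m;              /* C's % truncates toward zero */
    return r < 0 ? r + m : r;
}

/* Returns true if (i + offset) mod m wraps back to 0 at any iteration
   after the first one of the interval [start, start + len). */
bool modulo_flips_in_interval(int start, int len, int offset, int m)
{
    for (int i = start + 1; i < start + len; i++) {
        if (floor_mod(i + offset, m) == 0)
            return true;
    }
    return false;
}
```

Under these assumptions, `modulo_flips_in_interval(0, 10, -2, 10)` reports a flip for the ten-iteration interval with the offset of -2, while the same interval with an offset of 0 does not flip.
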

One solution is to optimize only expressions without offsets. When many such division
and modulo expressions exist, this option may be appealing, especially if optimizing the
expressions with offsets reintroduces redundant computations by reducing the interval
length. Of course, if all expressions have offsets, none are optimized.

A better, more general solution is to divide the interval into segments of contiguous
iterations with common values. In the example of Table 3.7, the ten-iteration interval is
divided into three segments: iterations 1 and 2, iterations 3 through 9, and iteration 10.
The first group of peeled iterations accounts for the modulo's negative offset, and the third
group accounts for the division's positive offset. Table 3.8 shows the same loop with the
first, second, and last iterations peeled away. All iterations now compute the correct value,
though only the middle segment has been optimized.
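
The peeled loop of Table 3.8 can be compared against the unsplit loop directly. The C sketch below is one possible rendering, not the thesis's generated code. It assumes the loop bounds are exclusive at the top (so `i` runs over 0..99) and that the modulo is floored. The names `floored_mod`, `fill_original`, `fill_split`, and `split_matches_original` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define N 100   /* iterations 0..99, assuming an exclusive upper bound */

/* Floored modulo (assumption: the modulo intended by the text). */
int floored_mod(int a, int m)
{
    assert(m > 0);
    int r = a % m;
    return r < 0 ? r + m : r;
}

/* Reference loop: every iteration evaluates the division and the modulo. */
void fill_original(int A[N])
{
    for (int i = 0; i < N; i++)
        A[i] = (i + 1) / 10 + floored_mod(i - 2, 10);
}

/* Split loop following Table 3.8: head and tail keep the full
   expression, the body uses a constant and a running counter. */
void fill_split(int A[N])
{
    for (int I_step = 0; I_step < N; I_step += 10) {
        int i;

        /* Head: first two iterations, peeled for the modulo's offset of -2. */
        for (i = I_step; i < I_step + 2; i++)
            A[i] = (i + 1) / 10 + floored_mod(i - 2, 10);

        /* Body: the quotient is constant and the remainder counts up from 0. */
        int I_div_10 = I_step / 10;
        int I_rem_10 = I_step % 10;   /* 0, since I_step is a multiple of 10 */
        for (i = I_step + 2; i < I_step + 9; i++) {
            A[i] = I_div_10 + I_rem_10;
            I_rem_10++;
        }

        /* Tail: last iteration, peeled for the division's offset of +1. */
        i = I_step + 9;
        A[i] = (i + 1) / 10 + floored_mod(i - 2, 10);
    }
}

/* Returns true if both versions produce identical arrays. */
bool split_matches_original(void)
{
    int a[N], b[N];
    fill_original(a);
    fill_split(b);
    for (int i = 0; i < N; i++)
        if (a[i] != b[i])
            return false;
    return true;
}
```

Calling `split_matches_original()` checks all 100 elements. Only seven of every ten iterations take the cheap body path, which is the cost of peeling noted above.
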

In general, many different offset values may exist, leading to the appearance of many
small segments at both ends of the original interval. This thesis groups these little
segments into two segments, one representing the maximum positive offset, and the other
representing the minimum negative offset. Because they contain different sets of values,