29.11.2012 Views

Compile-time Loop Splitting for Distributed Memory ... - Stanford AI Lab


Original                                        Generalization

for(I_step,0,100,10) {                          OriginalLoop
  i=I_step;   A[i] = (i+1)/10 + (i-2)%10;       { body; }
  i=I_step+1; A[i] = (i+1)/10 + (i-2)%10;
  I_div_10 = I_step/10;                         IntervalLoop {
  I_rem_10 = I_step%10;                           HeadLoop
  for(i,I_step+2,I_step+9,1)                      { body; }
  { A[i] = I_div_10 + I_rem_10;                   BodyLoop
    I_rem_10++; }                                 { optimized body; }
  i=I_step+9; A[i] = (i+1)/10 + (i-2)%10; }       TailLoop
                                                  { body; } }

Table 3.8: The previous example with correct loop splitting, showing peeled iterations; the general loop splitting transformation.

iteration (i = 0) uses the value 0 in place of the modulo function, where -2 % 10 = 8 is the correct value. Because the range of values for the modulo in the interval is from 8 to 9 and then from 0 to 7, the restriction of Section 3.3.2 (a modulo cannot flip to zero in a loop splitting interval) has been violated. Clearly, the loop splitting interval cannot account for the offsets correctly.
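The wrap-around described above can be checked numerically. The following is a minimal sketch, not code from the thesis: it assumes Python's floor-modulo semantics, under which -2 % 10 evaluates to 8 as in the text (C's % operator would give -2 instead). The helper name `modulo_values` and its parameters are illustrative only.

```python
def modulo_values(start, length=10, offset=-2, modulus=10):
    """Values of (i + offset) % modulus for each i in one splitting interval."""
    return [(i + offset) % modulus for i in range(start, start + length)]


# First ten-iteration interval of the example, i = 0 .. 9.
values = modulo_values(0)
print(values)

# Position inside the interval at which the modulo flips to zero.
# Any position other than 0 breaks the restriction of Section 3.3.2.
flip = values.index(0)
print(flip)
```

The modulo takes the values 8 and 9 before flipping to zero, so a running counter initialized once at the start of the interval cannot reproduce it.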

One solution is to optimize only expressions without offsets. When many such division and modulo expressions exist, this option may be appealing, especially if optimizing the expressions with offsets reintroduces redundant computations by reducing the interval length. Of course, if all expressions have offsets, none are optimized.
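As an illustration of this first option, the sketch below uses a hypothetical loop body, `A[i] = i/10 + (i-2)%10`, which is not the example from the tables: its division carries no offset while its modulo does. Only the offset-free division is hoisted out of the interval; the offset modulo is still computed in every iteration. The function names are assumptions made for this sketch, and Python's floor semantics for `//` and `%` are assumed throughout.

```python
def reference(n=100):
    """Direct evaluation of the hypothetical body for every iteration."""
    return [i // 10 + (i - 2) % 10 for i in range(n)]


def split_offset_free_only(n=100, interval=10):
    """Loop splitting that optimizes only the expression without an offset."""
    a = [0] * n
    for i_step in range(0, n, interval):
        # i // 10 has no offset, so it is constant over the whole interval.
        i_div_10 = i_step // 10
        for i in range(i_step, i_step + interval):
            # (i - 2) % 10 has an offset and is left unoptimized.
            a[i] = i_div_10 + (i - 2) % 10
    return a


print(split_offset_free_only() == reference())
```

The full interval length of ten is kept, at the cost of one modulo operation per iteration.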

A better, more general solution is to divide the interval into segments of continuous iterations with common values. In the example of Table 3.7, the ten-iteration interval is divided into three segments: iterations 1 and 2, iterations 3 through 9, and iteration 10. The first group of peeled iterations accounts for the modulo's negative offset, and the third group accounts for the division's positive offset. Table 3.8 shows the same loop with the first, second, and last iterations peeled away. All iterations now have the correct value, though only the middle segment has been optimized.
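The peeled loop of Table 3.8 can be transcribed and checked against the unsplit loop. This is a sketch under stated assumptions rather than the thesis's own code: the `for(var, lower, upper, step)` notation is taken to exclude its upper bound (which is why iteration `I_step+9` appears separately as the tail), Python's floor semantics for `//` and `%` are assumed, and the function names are illustrative.

```python
def reference(n=100):
    """Unsplit loop: A[i] = (i+1)/10 + (i-2)%10 for every i."""
    return [(i + 1) // 10 + (i - 2) % 10 for i in range(n)]


def body(a, i):
    """Unoptimized loop body, used for the peeled head and tail iterations."""
    a[i] = (i + 1) // 10 + (i - 2) % 10


def split_with_peeling(n=100, interval=10):
    a = [0] * n
    for i_step in range(0, n, interval):
        # Head segment: iterations 1 and 2, peeled for the modulo's offset of -2.
        body(a, i_step)
        body(a, i_step + 1)
        # Middle segment: iterations 3 through 9, optimized.
        i_div_10 = i_step // 10
        i_rem_10 = i_step % 10
        for i in range(i_step + 2, i_step + 9):
            a[i] = i_div_10 + i_rem_10
            i_rem_10 += 1
        # Tail segment: iteration 10, peeled for the division's offset of +1.
        body(a, i_step + 9)
    return a


print(split_with_peeling() == reference())
```

Within the middle segment the division is constant and the modulo rises from 0 to 6, so one division and one modulo per interval replace seven of each.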

In general, many different offset values may exist, leading to the appearance of many small segments at both ends of the original interval. This thesis groups these little segments into two segments, one representing the maximum positive offset, and the other representing the minimum negative offset. Because they contain different sets of values,

