Compile-time Loop Splitting for Distributed Memory ... - Stanford AI Lab
aref (aref (A, div-a-i), rem-a-i);
aref (aref (B, div-b-i + (i-n * div-b-j)),
      rem-b-i + (i-spread * rem-b-j));
aref (aref (C, div-c-i + i-proc * (div-c-j + (j-proc * div-c-k))),
      rem-c-i + i-spread * (rem-c-j + j-spread * rem-c-k));
Figure 2-8: The same references with all the divisions replaced by interval invariants and
all modulos replaced by an incrementing counter.
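
The replacement shown in Figure 2-8 can be sketched for the 1-D case as follows. This is only an illustrative sketch, not the compiler's actual output: it assumes a block distribution in which element i lives on processor i / spread at offset i mod spread, and the names naive_refs, split_refs, and spread are hypothetical. The variables div_a_i and rem_a_i echo div-a-i and rem-a-i from the figure.

```python
def naive_refs(lo, hi, spread):
    """(processor, offset) pairs, paying one division and one modulo per reference."""
    return [(i // spread, i % spread) for i in range(lo, hi)]


def split_refs(lo, hi, spread):
    """Same pairs, but division and modulo run once per interval, not per iteration.

    Assumes a block distribution with `spread` elements per processor
    (an assumption made for this sketch).
    """
    refs = []
    i = lo
    while i < hi:
        div_a_i = i // spread                  # interval invariant: owning processor
        rem_a_i = i % spread                   # starting offset inside that processor
        end = min(hi, (div_a_i + 1) * spread)  # interval stops at the block boundary
        while i < end:
            refs.append((div_a_i, rem_a_i))
            rem_a_i += 1                       # incrementing counter replaces the modulo
            i += 1
    return refs
```

Both functions return the same list of pairs; the second simply moves the expensive operations out of the inner loop, which is the effect the figure describes.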
aref (aref (B, pid-B-ij), rem-b-i + (i-spread * rem-b-j));
aref (aref (C, pid-C-ijk), rem-c-i + i-spread * (rem-c-j + j-spread * rem-c-k));
Figure 2-9: The 2-D and 3-D reference expressions after further compiler optimization.
each reference.
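
As a rough illustration of the step from Figure 2-8 to Figure 2-9, the sketch below computes the processor ID of the 2-D reference once (pid_b_ij, mirroring pid-B-ij) and derives the offset from counters. It is a sketch under stated assumptions: i_spread and j_spread are taken to be the block extents in each dimension, i_n is taken to be the number of processors along i, and both function names are hypothetical. The hoisted version is valid only when both index ranges stay inside one processor's block.

```python
def naive_refs_2d(i_range, j_range, i_spread, j_spread, i_n):
    """(pid, offset) pairs with two divisions and two modulos per reference."""
    refs = []
    for j in j_range:
        for i in i_range:
            pid = i // i_spread + i_n * (j // j_spread)
            off = i % i_spread + i_spread * (j % j_spread)
            refs.append((pid, off))
    return refs


def hoisted_refs_2d(i_range, j_range, i_spread, j_spread, i_n):
    """Same pairs, assuming both ranges lie within a single processor's block."""
    i0, j0 = i_range.start, j_range.start
    pid_b_ij = i0 // i_spread + i_n * (j0 // j_spread)  # computed once for the nest
    refs = []
    rem_b_j = j0 % j_spread
    for _ in j_range:
        rem_b_i = i0 % i_spread
        for _ in i_range:
            refs.append((pid_b_ij, rem_b_i + i_spread * rem_b_j))
            rem_b_i += 1                                # counter replaces i mod i_spread
        rem_b_j += 1                                    # counter replaces j mod j_spread
    return refs
```

For example, with i_spread = 4, j_spread = 3, and i_n = 2, the ranges range(4, 8) and range(6, 9) fall inside one block, so the two functions agree.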
An optimizing compiler could even further reduce the calculation by replacing the
multiplication by i-spread in the second reference (j-spread in the third reference) with an
addition of i-spread (j-spread) on each iteration. This type of optimization, called strength
reduction, is described in Section 3.3.2.
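
A minimal sketch of strength reduction applied to the 2-D offset expression rem-b-i + (i-spread * rem-b-j) from Figure 2-9 is given below. The function names, the loop bounds rows and cols, and the running variable row_base are hypothetical and introduced only for illustration; the thesis's own treatment is the one in Section 3.3.2.

```python
def offsets_with_multiply(i_spread, rows, cols):
    """Offsets computed with one multiplication per reference."""
    out = []
    for rem_b_j in range(rows):
        for rem_b_i in range(cols):
            out.append(rem_b_i + i_spread * rem_b_j)
    return out


def offsets_strength_reduced(i_spread, rows, cols):
    """Same offsets, with the multiplication replaced by a running addition."""
    out = []
    row_base = 0                    # invariant: row_base == i_spread * rem_b_j
    for _ in range(rows):
        for rem_b_i in range(cols):
            out.append(rem_b_i + row_base)
        row_base += i_spread        # one addition per outer iteration
    return out
```

The two functions produce identical offsets; the second performs no multiplications at all, trading each one for an addition carried across iterations.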
In the context of rectangular partitioning, “sufficiently small” interval values are
those that keep every reference in the loop nest within a single processor’s memory. If this
condition is met, the processor ID is constant, and the offset into the processor’s memory
can be determined with monotonically increasing or decreasing counters. Thus, appropriate intervals can
             Number of Operations Per Array Reference
Reference    Before Optimization      After Optimization
             mod   ÷    ×    +        mod   ÷    ×    +
1-D           1    1    0    0         0    0    0    0
2-D           2    2    2    2         0    0    1    1
3-D           3    3    4    4         0    0    2    2
Table 2.3: Reduction in operations for array references by an optimizing compiler.
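
The "After Optimization" counts in Table 2.3 can be cross-checked against the offset expressions of Figures 2-8 and 2-9, since the processor ID is computed once outside the loop and contributes no per-reference operations. The sketch below is a hypothetical helper, not part of the thesis: it transcribes the expressions with underscores in place of hyphens so that Python's ast module can parse them, then counts the operators.

```python
import ast

# Offset expressions transcribed from Figures 2-8 and 2-9
# (hyphens replaced by underscores so they parse as Python).
AFTER = {
    "1-D": "rem_a_i",
    "2-D": "rem_b_i + i_spread * rem_b_j",
    "3-D": "rem_c_i + i_spread * (rem_c_j + j_spread * rem_c_k)",
}

KINDS = {ast.Mod: "mod", ast.FloorDiv: "div", ast.Div: "div",
         ast.Mult: "mul", ast.Add: "add"}


def count_ops(expr):
    """Count modulo, division, multiplication, and addition operators in expr."""
    counts = {"mod": 0, "div": 0, "mul": 0, "add": 0}
    for node in ast.walk(ast.parse(expr, mode="eval")):
        if isinstance(node, ast.BinOp) and type(node.op) in KINDS:
            counts[KINDS[type(node.op)]] += 1
    return counts
```

Calling count_ops on each entry of AFTER yields no operations for the 1-D reference, one multiplication and one addition for the 2-D reference, and two of each for the 3-D reference, in agreement with the table.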