Compile-time Loop Splitting for Distributed Memory ... - Stanford AI Lab
Chapter 3
Overview of Loop Splitting
Now that array referencing on multiprocessors has been presented, this section continues
the background presentation by defining the term loop splitting. This includes describing
the loop and code transformations comprising loop splitting as well as its application to
simplifying array reference expressions.
3.1 Introduction
As Section 2.5 showed, array references can require involved calculations. For
example, in a simple matrix transpose (2-D loop with body A[i][j] = B[j][i]), the assignment
requires four each of modulo, division, multiplication, and addition operations. In matrix
addition (A[i][j] = B[i][j] + C[i][j]), of the 6 modulos, 6 divisions, 6 multiplications, and
7 additions, only one addition was specified by the programmer. All of this array reference
overhead significantly degrades a loop nest’s execution performance.
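To make the source of this overhead concrete, the following is a minimal sketch in C, not the addressing scheme of Section 2.5. It assumes a hypothetical layout in which an N x N array is cut into BI x BJ blocks, one per processor on a PI x PJ grid; the names `Ref`, `locate`, and `transpose` are illustrative only. Under that assumption each 2-D reference costs two divisions, two modulos, two multiplications, and two additions, so the two references in the transpose body account for four of each, consistent with the counts quoted above.

```c
#include <assert.h>

/* Hypothetical layout (an assumption, not the thesis's scheme): an N x N
   array split into BI x BJ blocks, one block per processor on a PI x PJ grid. */
enum { N = 8, PI = 2, PJ = 2, BI = N / PI, BJ = N / PJ };

typedef struct {
    int owner;   /* index of the processor holding the element */
    int offset;  /* position inside that processor's local block */
} Ref;

/* Translate a global (i, j) into (owner, offset).
   Cost per call: 2 divisions, 2 modulos, 2 multiplications, 2 additions. */
static Ref locate(int i, int j)
{
    assert(i >= 0 && i < N && j >= 0 && j < N);
    Ref r;
    r.owner  = (i / BI) * PJ + (j / BJ);
    r.offset = (i % BI) * BJ + (j % BJ);
    return r;
}

/* Transpose body A[i][j] = B[j][i] over the distributed storage.
   Both references are translated on every iteration. */
static void transpose(int A[PI * PJ][BI * BJ], int B[PI * PJ][BI * BJ])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            Ref dst = locate(i, j);
            Ref src = locate(j, i);
            A[dst.owner][dst.offset] = B[src.owner][src.offset];
        }
    }
}
```

None of the arithmetic inside `locate` appears in the programmer's one-line loop body; it is all introduced by the mapping from global indices to distributed storage.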
Loop splitting attempts to minimize this overhead by altering the loop
structure. By modifying the intervals of the loop nest and introducing another nesting
level, the array reference operations not only become fewer in number but are also no
longer performed on every iteration. This reduction in calculation leads to faster program
execution.
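The effect described above can be illustrated with a small sketch in C. It is a generic illustration under stated assumptions, not the transformations the thesis defines in Section 3.2: it assumes a hypothetical 1-D block distribution of N elements over P processors with B elements each, and the function names are invented for the example. The unsplit loop pays a division and a modulo on every iteration; the split loop cuts the interval [0, N) at block boundaries and adds one nesting level, after which the owner is fixed per outer iteration and the offset is the inner loop index.

```c
#include <assert.h>

/* Hypothetical 1-D block distribution (an assumption for illustration):
   N elements over P processors, B elements per block. */
enum { N = 12, P = 3, B = N / P };

/* Unsplit loop: every iteration pays one division and one modulo
   to translate the global index i. */
static long sum_unsplit(int data[P][B])
{
    long total = 0;
    for (int i = 0; i < N; i++)
        total += data[i / B][i % B];
    return total;
}

/* Split loop: the interval [0, N) is cut at block boundaries and an extra
   nesting level is introduced. No division or modulo remains in the body. */
static long sum_split(int data[P][B])
{
    long total = 0;
    for (int p = 0; p < P; p++)            /* one sub-interval per block */
        for (int off = 0; off < B; off++)  /* offset within the block */
            total += data[p][off];
    return total;
}

/* Both forms visit the same elements in the same order. */
static void check_equivalence(int data[P][B])
{
    assert(sum_unsplit(data) == sum_split(data));
}
```

In this simple case the index arithmetic disappears entirely. In general some of it may only move to an outer level, which matches the statement that the operations are fewer and no longer performed on every iteration.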
This section presents sufficient descriptions of the components of loop splitting.
Section 3.2 describes the two loop transformations composing loop splitting, and Section