Compile-time Loop Splitting for Distributed Memory ... - Stanford AI Lab
on a uniprocessor. In removing the uniprocessor, scalable multiprocessors also remove the single memory address space. Array referencing must not only consider where in memory an array is located, but also in which processor's memory, thus requiring more calculation. Virtual memory can eliminate the additional addressing calculations by providing the abstraction of a uniform, linear address space. However, this abstraction results in an efficiency loss because an unnecessarily large number of memory references may need to be satisfied by a remote node.
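The additional addressing arithmetic can be sketched as follows. This is an illustrative example only: the function name and the block distribution are assumptions for the sketch, not the scheme developed in this thesis.

```python
# Sketch of the extra addressing work a distributed memory machine
# imposes: a global array index must be translated into an owning
# processor plus an offset in that processor's local memory.
# Assumes a simple block distribution (illustrative, not the thesis's scheme).

def owner_and_offset(i, n, num_pes):
    """Map global index i of an n-element block-distributed array
    to (owning PE, offset within that PE's local memory)."""
    block = (n + num_pes - 1) // num_pes  # elements per PE (last block may be short)
    return i // block, i % block

# On a uniprocessor, A[i] is a single address computation; here every
# reference also determines whether the element is local or remote.
pe, off = owner_and_offset(10, 16, 4)  # 16 elements over 4 PEs, block size 4
```

A reference is then local only when the computed owner equals the executing PE; otherwise it must be satisfied by a remote node, which is the efficiency loss described above.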
The remainder of this section describes in more detail what has been outlined above. First, Section 2.2 presents task and data partitioning, while introducing terms relevant to loop splitting. Then, Section 2.3 describes the procedure for determining the optimal task and data partitions. Section 2.4 describes the problem with array referencing on a distributed memory multiprocessor. Finally, Section 2.5 examines a method of multiprocessor array referencing, the complex expressions involved, and the possibility of reducing these expressions.
2.2 Partitioning
While a uniprocessor executes an entire program, a processing element (PE) in a distributed memory multiprocessor executes only a portion of the program. Dividing a program into portions for the PEs is called partitioning, of which there are two types: task partitioning and data partitioning.
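As a small illustration of a task partition, a loop's iterations can be split statically among the PEs. The function below is a hypothetical sketch (its name and the block-wise split are assumptions for illustration):

```python
# Illustrative static task partition: split n loop iterations block-wise
# among num_pes processing elements, spreading any remainder over the
# first PEs so no processor holds more than one extra iteration.

def block_range(pe, num_pes, n):
    """Return the half-open iteration range [lo, hi) assigned to PE `pe`."""
    base, extra = divmod(n, num_pes)
    lo = pe * base + min(pe, extra)
    hi = lo + base + (1 if pe < extra else 0)
    return lo, hi
```

Taken together, the per-PE ranges are disjoint and cover all n iterations, so the sum of the processor tasks is equivalent to the original single task.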
2.2.1 Task Partitioning
Performed for both distributed and shared memory multiprocessors, task partitioning assigns a portion of work to each processor such that the sum of all processor tasks is equivalent to the original single task. Two types of task partitioning exist: dynamic and static. Dynamic task partitioning is determined at the time of execution, where processes spawn other processes until all processors are busy. This is achieved by fork operations, which are later joined to merge the results of two processors onto one processor. The fork and join operations are well-suited to recursive algorithms.
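The fork/join pattern for a recursive algorithm can be sketched as follows; the use of threads and the grain-size cutoff are assumptions made for this illustration, standing in for the process spawning described above:

```python
import threading

# Illustrative dynamic task partition: recursively fork work until the
# pieces are small, then join to merge each pair of results. Threads
# stand in for the spawned processes described in the text.

def fork_join_sum(data, grain=4):
    """Sum a list by recursively forking on the left half and joining."""
    if len(data) <= grain:        # small enough: compute directly
        return sum(data)
    mid = len(data) // 2
    result = {}

    def child():
        result["left"] = fork_join_sum(data[:mid], grain)

    t = threading.Thread(target=child)
    t.start()                     # fork: spawn work for another processor
    right = fork_join_sum(data[mid:], grain)
    t.join()                      # join: merge the two partial results
    return result["left"] + right
```

Each fork makes another unit of work available until the machine is busy, and each join merges two partial results onto one processor, mirroring the dynamic scheme described above.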