Compile-time Loop Splitting for Distributed Memory ... - Stanford AI Lab

on a uniprocessor. In removing the uniprocessor, scalable multiprocessors also remove the single memory address space. Array referencing must not only consider where in memory an array is located, but also in which processor's memory, thus requiring more calculation. Virtual memory can eliminate the additional addressing calculations by providing the abstraction of a uniform, linear address space. However, this abstraction results in an efficiency loss because an unnecessarily large number of memory references may need to be satisfied by a remote node.
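The extra calculation can be made concrete with a small sketch. For a block-distributed array, every global index must be translated into a (processor, local offset) pair before the element can be fetched. The function below is an illustrative example, not taken from the text; it assumes a simple block distribution in which the array length is a multiple of the processor count.

```python
def global_to_local(i, n, p):
    """Map global index i of an n-element array, block-distributed
    over p processors, to a (processor, local offset) pair.

    Assumes n is a multiple of p, so every processor owns an
    equal-sized contiguous block of n // p elements.
    """
    block = n // p          # elements per processor
    return i // block, i % block  # owning processor, offset within its block
```

On a uniprocessor, referencing element `i` needs only a base address plus offset; here the compiler (or runtime) must additionally compute the division and remainder above, and a reference is remote whenever the owning processor is not the one executing the access.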

The remainder of this section describes in more detail what has been outlined above. First, Section 2.2 presents task and data partitioning, while introducing terms relevant to loop splitting. Then, Section 2.3 describes the procedure for determining the optimal task and data partitions. Section 2.4 describes the problem with array referencing on a distributed memory multiprocessor. Finally, Section 2.5 examines a method of multiprocessor array referencing, the complex expressions involved, and the possibility of reducing these expressions.

2.2 Partitioning

While a uniprocessor executes an entire program, a processing element (PE) in a distributed memory multiprocessor executes only a portion of the program. Dividing a program into portions for the PEs is called partitioning, of which there are two types: task partitioning and data partitioning.

2.2.1 Task Partitioning

Performed for both distributed and shared memory multiprocessors, task partitioning assigns a portion of work to each processor such that the sum of all processor tasks is equivalent to the original single task. Two types of task partitioning exist: dynamic and static. Dynamic task partitioning is determined at the time of execution, where processes spawn other processes until all processors are busy. This is achieved by fork operations, which are later joined to merge the results of two processors onto one processor. The fork and join operations are well-suited to recursive algorithms.
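The fork/join pattern described above can be sketched with a recursive divide-and-conquer sum. This is an illustrative example, not from the original text; it uses threads to stand in for processors, and the `procs` parameter limits forking so that spawning stops once the available processors are notionally busy.

```python
import threading

def fork_join_sum(data, procs):
    """Sum a sequence in the fork/join style of dynamic task partitioning.

    While idle processors remain (procs > 1), fork a child task for the
    left half of the data, compute the right half locally, then join to
    merge the two partial results onto one processor.
    """
    if procs <= 1 or len(data) <= 1:
        return sum(data)            # no idle processors left: run serially
    mid = len(data) // 2
    result = {}
    # fork: spawn a child to handle the left half on another processor
    child = threading.Thread(
        target=lambda: result.update(left=fork_join_sum(data[:mid], procs // 2))
    )
    child.start()
    right = fork_join_sum(data[mid:], procs - procs // 2)
    child.join()                    # join: merge the child's partial result
    return result['left'] + right
```

The recursion mirrors why fork and join suit recursive algorithms: each recursive call is a natural unit of work to hand to a newly forked process, and each return point is a natural place to join.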
