U. Glaeser

Message passing systems, such as the Transputer [May87], have no shared memory but communicate by passing messages. This can cause high latency while waiting for requested data; however, each processor can hold multiple threads, and may be able to occupy itself while waiting for remote data. Deadlocks are a problem.
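The latency-hiding idea can be illustrated with a small sketch. This is not Transputer code; it is a Python analogy in which a `queue.Queue` stands in for a hardware message link, one thread plays a remote node, and two worker threads on the local "processor" each block on a reply, so the processor stays busy with one thread while the other waits. All names here are invented for illustration.

```python
# Sketch of message passing with multithreaded latency hiding (illustrative,
# not Transputer code): while one worker thread blocks on a reply, the other
# can keep the processor busy.
import queue
import threading

channel = queue.Queue()  # stands in for a hardware message link

def remote_node(n_requests):
    # Simulate a remote processor answering data requests.
    for _ in range(n_requests):
        req = channel.get()
        req["reply"].put(req["addr"] * 2)  # arbitrary payload

def worker(name, results):
    reply = queue.Queue()
    channel.put({"addr": 21, "reply": reply})
    results[name] = reply.get()  # blocks here: the latency of remote access

results = {}
threads = [threading.Thread(target=remote_node, args=(2,))]
threads += [threading.Thread(target=worker, args=(f"t{i}", results))
            for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results.values()))  # [42, 42]
```

The deadlock risk mentioned above is visible here too: if `remote_node` were never started, both workers would block on `reply.get()` forever.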

Another variation of memory management is cache-only memory architecture (COMA). Memory is distributed, but held only in cache [Saulsbury95]. The Kendall Square machine [KSR91] has this organization. On the KSR machine, distributed memory is held in the cache of each processor, and the processors are connected by a ring. The caches of remote processors are accessed using this ring.
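The ring lookup can be sketched abstractly. The following is a toy Python model, not KSR behavior: each node's cache is a dictionary, a miss sends a probe hop by hop around the ring, and the found value migrates into the requesting node's local cache, as COMA permits data to live wherever it is used.

```python
# Toy model of a COMA-style lookup over a ring (illustrative, not
# KSR-accurate): data lives only in per-node caches; a miss probes
# successive neighbors until an owner is found.
caches = [{"a": 1}, {}, {"b": 7}, {}]  # one cache per node on the ring

def coma_read(node, key):
    n = len(caches)
    for hop in range(n):                  # probe visits node, node+1, ...
        owner = (node + hop) % n
        if key in caches[owner]:
            value = caches[owner][key]
            caches[node][key] = value     # migrate a copy into local cache
            return value, hop
    raise KeyError(key)

value, hops = coma_read(1, "b")
print(value, hops)  # 7 1  (found one hop away; now also cached at node 1)
```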

Vector Machines

A vector machine creates a series of functional units and pumps a stream of data through the series. Each stage of the pipe stores its resulting data in a vector register, which is read by the next stage. In this way the parallelism is equal to the number of stages in the pipeline. This is very efficient if the same functions are to be performed on a long stream of data. The Cray series of computers [Cray92] is famous for this technique. It is becoming popular to make each individual processor of a MIMD system a vector processor.
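The stage-by-stage flow can be mimicked with chained generators. This is only a software analogy under my own naming (`load`, `multiply`, `add` are invented stages): each generator plays one functional unit reading its predecessor's "vector register." Python runs the chain serially, whereas real vector hardware keeps every stage busy at once as elements stream through.

```python
# Software analogy of a vector pipeline: three chained stages, each
# consuming the previous stage's output element by element.
def load(values):
    for v in values:
        yield v

def multiply(stream, k):
    for v in stream:
        yield v * k

def add(stream, c):
    for v in stream:
        yield v + c

# Pipeline computing (v * 2) + 1 for each element of the input stream.
stream = add(multiply(load(range(5)), 2), 1)
print(list(stream))  # [1, 3, 5, 7, 9]
```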

Dataflow Machine

The von Neumann approach to computing has one control state in existence at a time: a program counter points to the single next instruction. This approach is used in traditional machines, and also in most of the single processors of the multiple-processor systems described earlier. A completely different approach was developed at the Massachusetts Institute of Technology [Dennis91, Arvind90, Polychronopoulos89]. The researchers realized that the maximum amount of parallelism could be achieved if, at any one point, all instructions that are ready to execute were executed. An instruction is ready to execute if the data required for its complete execution is available. Execution of an instruction is therefore governed not by sequential order but by its readiness to execute, that is, by when both operands are available. A table is kept of the instructions that are about ready to execute, that is, those for which one of the two operands needed by the assembly-language-level instruction is available. When the second operand is found, the instruction is executed. The result of the execution is passed to a control unit, which selects a set of new instructions to be about ready to execute, or marks an instruction as ready (because the second operand needed has arrived).
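The firing rule above can be sketched as a toy scheduler. This is a minimal illustration, not any real dataflow machine's algorithm: a pending table holds two-operand instructions, and an instruction fires as soon as both of its operand tokens have arrived, regardless of program order.

```python
# Toy dataflow scheduler (illustrative): an instruction fires when both
# operands are present, not when the program counter reaches it.
import operator

# Each entry means dest = op(src1, src2).
program = [
    ("c", operator.add, "a", "b"),
    ("e", operator.mul, "c", "d"),
    ("f", operator.sub, "d", "a"),
]

def run(initial):
    values = dict(initial)      # operand tokens that have arrived so far
    pending = list(program)     # the table of not-yet-fired instructions
    while pending:
        for instr in pending:
            dest, op, s1, s2 = instr
            if s1 in values and s2 in values:   # both operands available
                values[dest] = op(values[s1], values[s2])
                pending.remove(instr)
                break           # rescan: the new token may enable others
        else:
            raise RuntimeError("no instruction ready: dataflow deadlock")
    return values

print(run({"a": 2, "b": 3, "d": 4}))
# {'a': 2, 'b': 3, 'd': 4, 'c': 5, 'e': 20, 'f': 2}
```

Note that `f` depends only on the initial tokens `d` and `a`, so it could fire before `e`; readiness, not textual order, drives execution.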

This approach yields the maximum amount of parallelism. However, it runs into problems with “runaway execution”: too many instructions may become about ready and clog the system. It is a fascinating approach, and machines have been built. It has the advantage that no, or very few, changes need to be made to old “dusty deck” programs to extract parallelism, and steps can be taken to avoid runaway execution.

Out of Order Execution Concept

An approach similar to the dataflow concept is called out-of-order execution [Cintra00]. Here again, program elements that are ready to execute may be executed. It has a big advantage when multiple functional units are available on the same CPU but the functional units have different latency values. The technique is not completely new; it is similar to issuing a load instruction, which has high latency, well before the result of the load is required, so that by the time the load completes the code has reached the location where the result is used. Similarly, a floating-point instruction, again a class of instruction with high latency, is frequently started before integer instructions coded to execute first are executed; by the time the floating-point instruction completes, its results are ready to be used. The compiler can make this decision statically. In out-of-order execution the hardware has more of a role in the decision of what to execute. This may include both the then and the else parts of an if statement: both can be executed, but neither committed, until the correct path is determined. This technique is also called speculative execution. Any changes made by a wrong path must be capable of being rolled back. Although this may seem to be extra
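The execute-both-then-roll-back idea can be sketched in software. This is a conceptual illustration with invented names (`speculate` and its arguments are not hardware terms): each arm of the `if` runs against its own shadow copy of the state, and once the condition resolves, one copy is committed and the other simply discarded, which is the rollback.

```python
# Sketch of speculative execution of both arms of an if: each arm updates
# a private shadow copy of state; only the correct arm is committed.
def speculate(state, cond_fn, then_fn, else_fn):
    then_state = dict(state)   # shadow copies that can be thrown away
    else_state = dict(state)
    then_fn(then_state)        # execute both paths before the branch resolves
    else_fn(else_state)
    # The condition has now resolved: commit one path, discard the other.
    return then_state if cond_fn(state) else else_state

state = {"x": 5}
result = speculate(
    state,
    cond_fn=lambda s: s["x"] > 0,
    then_fn=lambda s: s.update(y=s["x"] * 2),
    else_fn=lambda s: s.update(y=-1),
)
print(result)  # {'x': 5, 'y': 10}
```

Discarding `else_state` here plays the role of rolling back the wrong path; real hardware achieves the same effect with mechanisms such as reorder buffers.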

© 2002 by CRC Press LLC
