

5.5 Survey of Parallel Systems

Donna Quammen<br />

Introduction<br />

Computers have long been considered "a solution looking for a problem," but because of limits established by complexity theory and limits on computing power, some of the problems presented could not be solved. Multimedia problems, image processing and recognition, AI applications, and weather prediction may not be accomplished unless processing power is increased. There are many varieties of parallel machines, each with the same goal: to complete a task quickly and inexpensively. Modern physics has continually increased the speed and capacity of the media on which computer chips are housed, usually VLSI, and at the same time decreased the price. The challenge for computer engineers is to use this media effectively. Different components may be addressed to accomplish this, such as, but not limited to:

• Functionality of the processors (floating point, integer, or high-level functions, etc.)
• Topology of the network that interconnects the processors
• Instruction scheduling
• Position and capability of any master control units that direct the processors
• Memory address space
• Input/output features
• Compiler and operating system support to make the parallel system accessible
• The application's suitability to a particular parallel system
• Algorithms to implement the applications

As can be imagined, there is an assortment of choices for each of these components, which makes possible a large variety of parallel systems. Moreover, new choices and variations are continually being developed to utilize the increased capacity of the underlying media.

Mike Flynn, in 1972 [Flynn72], developed a classification for parallel systems that has remained authoritative. It is based on the number of instruction streams and the number of data streams active in one cycle. A sequential machine is considered to have a single instruction stream executing on a single data stream; this is called SISD. An SIMD machine has a single instruction stream executing on multiple data streams in the same cycle. An MIMD machine has multiple instruction streams executing on multiple data streams simultaneously. All are shown in Fig. 5.18. An MISD machine is not shown but is considered to be a systolic array.
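To make the SISD/SIMD distinction concrete, the following C sketch (an illustration added here, not part of the original survey; the SSE intrinsics and four-wide vectors are assumptions about one particular SIMD instruction set) contrasts an SISD-style scalar loop with an SIMD-style loop in which a single instruction operates on four data elements at once.

    #include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 target */

    /* SISD style: one instruction operates on one data element per step. */
    void add_sisd(const float *a, const float *b, float *c, int n)
    {
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }

    /* SIMD style: one add instruction operates on four data elements.
       Assumes n is a multiple of 4 and the arrays are 16-byte aligned. */
    void add_simd(const float *a, const float *b, float *c, int n)
    {
        for (int i = 0; i < n; i += 4) {
            __m128 va = _mm_load_ps(&a[i]);     /* four floats from a */
            __m128 vb = _mm_load_ps(&b[i]);     /* four floats from b */
            _mm_store_ps(&c[i], _mm_add_ps(va, vb));
        }
    }

An MIMD machine, by contrast, would run independently fetched instruction streams, for example one such loop per processor over different portions of the arrays.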

Four categories of MIMD systems, dataflow, multithreaded, out-of-order execution, and very long instruction word (VLIW), are of particular interest and appear to be the trend for the future. These categories can be applied to a single CPU, providing parallelism by having multiple functional units. All four attempt to use fine-grain parallelism to maximize the number of instructions that may be executing in the same cycle. They also use fine-grain parallelism to reclaim cycles that could otherwise be lost to the large latency of some instructions. Latency increases when the execution of one instruction is temporarily stalled while waiting for some resource that is not currently available, such as the result of a cache miss, or even a cache fetch, the result of a floating-point instruction (which takes longer than a simpler instruction), or the availability of a needed functional unit. Such a stall can delay the execution of other instructions. With very fine-grain parallelism, other instructions can use the available resources while the stalled instruction is waiting. This is one area where much computing power has been reclaimed.
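As a rough sketch of this idea (again an added illustration, not an example from the text), the two C functions below both consume a value loaded from memory. In the first, every instruction depends on the load, so a cache miss stalls the entire chain; in the second, the independent multiplications can proceed during the stall, which is the opportunity an out-of-order core exploits in hardware and a VLIW compiler exploits at compile time.

    /* Dependent chain: if *p misses in the cache, nothing else can run. */
    int latency_bound(const int *p, int x, int y)
    {
        int v = *p;         /* load: may stall for many cycles on a miss */
        int a = v + x;      /* depends on the load, must wait            */
        int b = a * y;      /* depends on a, so the chain serializes     */
        return b;
    }

    /* The products x*x and y*y do not depend on the load, so they can
       execute in the cycles the load would otherwise waste. */
    int latency_hidden(const int *p, int x, int y)
    {
        int v  = *p;        /* load issues first                          */
        int xx = x * x;     /* independent work overlaps the load latency */
        int yy = y * y;
        return v + xx + yy; /* consume the load result last               */
    }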

