15.01.2013 Views

U. Glaeser


one entrance but may have several exits. This creates a long stream of code, which would normally be executed sequentially, and allows the compiler to choose instructions over this larger range. Roll-back code must be added to the hyperblock’s exits to undo the effects of superfluous code that was executed but would not have executed sequentially. Loops are frequently unrolled, with several iterations treated as straight-line code, to form hyperblocks. Branch prediction can also help create beneficial hyperblocks.

A newer approach is to pack the VLIW dynamically, using a preprocessor that accesses multiple queues, one for each functional unit. Note that using queues in this way is similar to out-of-order execution.
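The queue-based packing described above can be sketched as follows. This is a minimal illustration, not the scheme of any particular machine: the instruction format `(unit, dest, sources)`, the function names `conflicts` and `pack_vliw`, and the single-cycle latency are all assumptions made for the example. Each functional unit has its own FIFO queue, and in every cycle the head of each queue is issued only if all earlier instructions it conflicts with were issued in a previous cycle.

```python
from collections import deque

def conflicts(earlier, later):
    """True if `later` must wait for `earlier` (RAW, WAW, or WAR hazard)."""
    _, dest_e, srcs_e = earlier
    _, dest_l, srcs_l = later
    return dest_e in srcs_l or dest_e == dest_l or dest_l in srcs_e

def pack_vliw(program, units):
    """Pack a sequential program into VLIW words, one slot per functional unit.

    `program` is a list of hypothetical (unit, dest, sources) tuples.
    Returns a list of words; each word maps a unit name to an instruction index.
    """
    # One FIFO queue per functional unit, filled in program order.
    queues = {u: deque() for u in units}
    for idx, (unit, _, _) in enumerate(program):
        queues[unit].append(idx)

    done = set()   # instructions issued in earlier cycles
    words = []
    while any(queues.values()):
        word = {}
        for u in units:
            if not queues[u]:
                continue
            i = queues[u][0]
            # Conservative check: every conflicting earlier instruction
            # must already have been issued in a previous cycle.
            if all(j in done for j in range(i) if conflicts(program[j], program[i])):
                word[u] = i
        for u in word:
            queues[u].popleft()
        done.update(word.values())
        words.append(word)
    return words

program = [
    ("alu", "r1", ()),        # 0: r1 = constant
    ("alu", "r2", ("r1",)),   # 1: r2 = f(r1), depends on 0
    ("mem", "r3", ()),        # 2: r3 = load, independent
]
words = pack_vliw(program, ["alu", "mem"])
```

In this example instruction 2 is issued in the first word, ahead of instruction 1, which is the sense in which the queues behave like out-of-order execution.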

Interconnection Network

An interconnection network is a necessary component of all parallel processing systems. Several features govern the choice of a network. A scalable interconnection network for a parallel processor would be ideal if it met the following requirements over a large range of system sizes. For instance, it might be scalable by reasonably small increments from 2^4 to perhaps 2^20 processors.

• Have a low average distance and a low diameter (the distance between the two furthest nodes) to limit communication latency.

• Minimize routing constraints (have many routes from A to B).

• Have a constant number of I/O ports (channels) per node to allow for expansion without retrofit.

• Have a simple wire layout to allow for expansion and to avoid wasting VLSI space.

• Be inherently fault tolerant.

• Be subdividable for disjoint multiuser applications.

• Be able to handle a large range of algorithms without undue overhead.

The most popular parallel networks (hypercube, quad tree, fat-tree, binary tree, mesh, and torus) fail on one or more of these items.

Meshes have a major disadvantage: they lack support for long-distance connections. Hypercubes have excellent connectivity, guaranteeing a maximum distance between any two nodes of log N, where N is the number of nodes. Also, many paths exist between any two nodes, making the hypercube fault tolerant and amenable to low contention. However, the number of I/O ports per node is also log N. As a system scales, each node would need to be retrofitted with additional ports. In addition, the wire layout is complex, making this network expensive in space. Tree structures are popular and have a maximum distance between nodes of O(2 log N). Communication on a tree, however, can be complicated by the fact that although many neighbors are close, many can communicate only through the root. This causes contention at the root. Fat-trees reduce this contention by increasing the bandwidth as the network approaches the root [Leiserson85].
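The distance and port figures quoted above can be checked with a small sketch. The function names are hypothetical, logarithms are taken base 2, hypercube nodes are assumed to be labeled with binary addresses, the mesh is assumed to be a square two-dimensional grid without wraparound, and tree nodes are assumed to use heap-style numbering (root 1, children 2i and 2i + 1).

```python
def hypercube_distance(a, b):
    """Hops between two hypercube nodes: the number of differing address bits."""
    return bin(a ^ b).count("1")

def hypercube_ports(n_nodes):
    """I/O ports per node in a hypercube of n_nodes (a power of two): log2 N."""
    return (n_nodes - 1).bit_length()

def mesh_diameter(side):
    """Corner-to-corner hops in a side x side mesh without wraparound."""
    return 2 * (side - 1)

def tree_distance(a, b):
    """Hops between two nodes of a binary tree with heap-style numbering."""
    hops = 0
    while a != b:
        # The larger index is at the same depth or deeper, so move it up.
        if a > b:
            a //= 2
        else:
            b //= 2
        hops += 1
    return hops

# 16-node hypercube: diameter 4 and 4 ports per node (both log2 16).
cube_diameter = hypercube_distance(0b0000, 0b1111)
cube_ports = hypercube_ports(16)

# 16-node (4 x 4) mesh: diameter 6, already larger than the hypercube's.
grid_diameter = mesh_diameter(4)

# Binary tree with 8 leaves (indices 8..15): the outermost leaves are
# 2 * log2(8) = 6 hops apart, and the path passes through the root.
leaf_to_leaf = tree_distance(8, 15)
```

Growing the hypercube from 16 to 32 nodes raises `hypercube_ports` from 4 to 5, which is the retrofit problem noted in the text, whereas a mesh node keeps the same number of ports at any size.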

The ideal network, in the extreme, would allow every node to connect directly to every other node. This is not practical for large systems. However, one class of networks, the multistage network, uses an internal switching system and allows constant access time between any two nodes. Several arrangements are available for multistage networks, and all are similar. One such network, the baseline network, is shown in Fig. 5.19. It can be proved that an N-by-N network can be fully connected with log N switching steps [Seigel89]. Each step is through a row of switches, with N/2 switches in each row. This allows a connection in log N steps but requires quite a bit of hardware, (N/2) log N switches in total, so it is not very scalable.
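To make the log N switching steps concrete, here is a minimal sketch of destination-tag routing. It uses the omega (shuffle-exchange) arrangement rather than the baseline network of Fig. 5.19, because its wiring is easy to state in code; the function names and the choice of omega wiring are assumptions for illustration only. At each stage the line number is perfect-shuffled (rotated left by one bit), and the 2-by-2 switch then selects its upper or lower output according to the next bit of the destination address.

```python
def omega_route(src, dst, n_bits):
    """Return the line numbers visited from src to dst in a 2**n_bits omega network."""
    mask = (1 << n_bits) - 1
    pos = src
    path = [pos]
    for stage in range(n_bits):
        # Perfect shuffle between stages: rotate the line number left by one bit.
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask
        # The switch sets the low bit from the destination tag, most significant bit first.
        bit = (dst >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path

def switch_count(n_bits):
    """Total 2-by-2 switches: log2(N) rows of N/2 switches each."""
    n = 1 << n_bits
    return n_bits * (n // 2)

# 8-by-8 network: 3 stages, 4 switches per stage.
path = omega_route(5, 2, 3)
total_switches = switch_count(3)
```

Every source reaches every destination in exactly `n_bits` steps, while `switch_count` grows as (N/2) log N: 12 switches for N = 8, but 5120 for N = 1024.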

Optical technology [Yuan97] has been shown to be promising for the implementation of all networks. Instead of a wire, an optical “beam” is used to make the connection. This is fast and has several advantages: first, a broadcast can be made from one node to many nodes at once (although one node cannot receive many inputs at once), and second, the transmission can be sent through clear space.

Conclusion

Parallel systems are going to become almost universal in computer systems. Desktop computers are now frequently delivered with more than one CPU, and definitely with more than one functional unit. The new models of SUN, MIPS, Intel, and Macintosh desktop computers currently are providing parallel

© 2002 by CRC Press LLC
