Superconducting Technology Assessment - nitrd

Execution Models

Organizing hardware for an RSFQ-based supercomputer is one thing, but how do we organize processing and software to maximize computational throughput? The widely used approach of manually structuring software to use message-passing communications is far from optimal; many applications use synchronizing “barrier” calls that force processors to sit idle until all have entered the barrier. Similarly, the overlapping of computation with communications is often inefficient when done manually.
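The barrier problem above can be made concrete with a toy sketch (my illustration, not from this assessment), using Python threads in place of real message-passing ranks. Workers with uneven workloads all wait at a barrier, so the fastest ranks accumulate idle time gated by the slowest:

```python
import threading
import time

# Hypothetical illustration: four "ranks" with uneven computation times
# all synchronize at a barrier; faster ranks sit idle until the slowest
# one arrives.
NUM_WORKERS = 4
work_times = [0.01, 0.02, 0.03, 0.08]  # seconds of "computation" per rank
barrier = threading.Barrier(NUM_WORKERS)
idle_times = [0.0] * NUM_WORKERS

def worker(rank: int) -> None:
    time.sleep(work_times[rank])      # uneven computation phase
    arrived = time.monotonic()
    barrier.wait()                    # every rank idles here until all arrive
    idle_times[rank] = time.monotonic() - arrived

threads = [threading.Thread(target=worker, args=(r,)) for r in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The fastest ranks waste the most time waiting at the barrier.
for rank, idle in enumerate(idle_times):
    print(f"rank {rank}: idle {idle * 1000:.1f} ms")
```

In a real MPI program the same effect appears at `MPI_Barrier` or at collective operations: aggregate throughput is bounded by the slowest participant in each synchronized phase.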

Execution models are the logical basis for maximizing throughput, while system hardware architecture provides the physical resources to support the execution model. Hardware architectures not tied to an execution model are unlikely to support optimal throughput; conversely, execution models that ignore hardware constraints are unlikely to be efficiently implemented. Mainstream parallel computers tend toward a message-passing, SPMD (single program, multiple data) execution model and a hardware architecture with a homogeneous set of processing nodes connected by a regularly structured communications fabric. (The physical structure of the fabric is not visible to the programmer.) This execution model was developed to handle systems with tens to hundreds of processing nodes, but computational efficiencies decrease as systems grow to thousands of nodes. Much of the inefficiency appears to be due to communications fabrics with insufficient bandwidth.

A number of machines have been developed to explore execution-model concepts. Some of these, most notably the Thinking Machines CM-1 and the Tera MTA, were developed for commercial use; unfortunately, both machines achieved their technical goals but not their commercial ones.

The HTMT study is noteworthy in that the conceptual execution model was developed to match the hardware constraints described above, with the added notion that multi-threading was a critical element of an execution model. The conventional SPMD approach could not be mapped to the hardware constraints, nor could multi-threading alone define the execution model. The solution was to reverse the conventional memory access paradigm: instead of issuing a memory request and waiting for a response during a calculation, the HTMT execution model “pushes” all data and code needed for a calculation into the very limited memory of the high-speed processors in the form of “parcels.” Analytical studies of several applications have demonstrated that the execution model should give good computational throughput on a projected petaflops hardware configuration.
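A minimal sketch of the “push” idea (my illustration, not HTMT's actual software; all names here are hypothetical): a parcel bundles the code and every operand a computation needs, so the high-speed processor executes from its small local memory and never stalls on a remote fetch mid-calculation:

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of an HTMT-style "parcel": the code and all of its
# operands travel together, instead of the processor issuing remote memory
# requests during the calculation.
@dataclass
class Parcel:
    code: Callable[..., float]
    operands: tuple

class FastProcessor:
    """High-speed processor with a tiny local queue of ready-to-run parcels."""

    def __init__(self, capacity: int = 4):
        self.queue: deque = deque()
        self.capacity = capacity

    def push(self, parcel: Parcel) -> bool:
        if len(self.queue) >= self.capacity:
            return False          # local memory full; the scheduler must retry later
        self.queue.append(parcel)
        return True

    def run_all(self) -> list:
        # Every queued parcel is self-contained: no waiting on memory here.
        return [p.code(*p.operands) for p in self.queue]

cpu = FastProcessor()
cpu.push(Parcel(lambda a, b: a + b, (2, 3)))
cpu.push(Parcel(lambda a, b: a * b, (4, 5)))
print(cpu.run_all())  # [5, 20]
```

The design point this mirrors is that latency tolerance is moved out of the fast processor and into the scheduler that assembles and dispatches parcels.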

While little data is yet publicly available, both the IBM and Cray HPCS efforts appear to be co-developing execution models and hardware architectures. The IBM PERCS project is “aggressively pursuing hardware/software co-design” in order to contain ultimate system cost. The Cray Cascade project incorporates an innovative hardware architecture in which “lightweight” processors feed data to “heavyweight” processors; the execution model being developed has to effectively divide workload between the lightweight and heavyweight processors. Extensive use of simulation and modeling is being carried out to minimize the risk associated with adopting an innovative architecture.
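The lightweight/heavyweight split can be caricatured as a producer-consumer pipeline (a sketch under my own assumptions, not Cascade's actual design): a lightweight stage gathers and stages operands while a heavyweight stage performs the arithmetic-intensive work on whatever has been staged:

```python
import queue
import threading

# Hypothetical sketch of a Cascade-style division of labor: a "lightweight"
# thread fetches and stages data; a "heavyweight" thread consumes staged
# operands and does the compute-intensive work.
staged = queue.Queue(maxsize=8)
results = []

def lightweight(data):
    # Low-intensity work: reorganize operands and stage them for compute.
    for chunk in data:
        staged.put(chunk)
    staged.put(None)  # sentinel: no more work

def heavyweight():
    # High-intensity work: run the arithmetic on each staged chunk.
    while (chunk := staged.get()) is not None:
        results.append(sum(x * x for x in chunk))

data = [[1, 2], [3, 4], [5, 6]]
producer = threading.Thread(target=lightweight, args=(data,))
consumer = threading.Thread(target=heavyweight)
producer.start(); consumer.start()
producer.join(); consumer.join()
print(results)  # [5, 25, 61]
```

Dividing workload well then becomes a question of balancing the two stages so that neither the staging queue nor the heavyweight processors sit empty.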

While there is some research into execution models for distributed execution (Mobile Agent models, for example), there is little research on execution models for large-scale supercomputers. The Japanese supercomputing efforts (NEC and Fujitsu) have focused on engineering excellence rather than architectural innovation. The GRAPE sequence of special-purpose supercomputers uses an execution model designed to optimize a single equation on which N-body simulations depend.
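The single equation in question is the Newtonian pairwise gravitational acceleration, a_i = G Σ_{j≠i} m_j (r_j − r_i) / |r_j − r_i|³, evaluated by direct summation over all pairs. An illustrative software version of that kernel (my sketch, with units chosen so G = 1):

```python
# Illustrative direct N-body summation: the pairwise-force kernel that
# GRAPE-class hardware evaluates. Test values below are made up.
def accelerations(masses, positions, G=1.0):
    n = len(masses)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Displacement from body i to body j, in 3 dimensions.
            dx = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx)
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += G * masses[j] * dx[k] * inv_r3
    return acc

# Two equal unit masses one unit apart attract each other with |a| = 1.
a = accelerations([1.0, 1.0], [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(a)  # [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
```

Because this O(N²) loop dominates the runtime of direct N-body simulation, hard-wiring exactly this one computation is what lets a special-purpose machine outperform general-purpose hardware.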

