Superconducting Technology Assessment - nitrd
Execution Models
Organizing hardware for an RSFQ-based supercomputer is one thing, but how do we organize processing and software to maximize computational throughput? The widely used approach of manually structuring software to use message-passing communications is far from optimal; many applications use synchronizing "barrier" calls that force processors to sit idle until all have entered the barrier. Similarly, the overlapping of computation with communication is often inefficient when done manually.
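The idle time a barrier imposes can be made concrete with a small sketch. The following uses plain Python threading (not a real message-passing library) to mimic workers with unequal computation phases: those that finish early sit blocked until the slowest worker arrives. All names here are illustrative.

```python
import threading
import time

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)
idle = {}  # worker rank -> seconds spent blocked in the barrier

def worker(rank, work_seconds):
    time.sleep(work_seconds)    # unequal "computation" phases
    t0 = time.monotonic()
    barrier.wait()              # every worker stalls here until all arrive
    idle[rank] = time.monotonic() - t0

threads = [threading.Thread(target=worker, args=(r, 0.05 * r))
           for r in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Worker 0 finished its computation first, so it wasted the most time
# waiting; worker 3, the straggler, waited essentially nothing.
print(f"idle[0]={idle[0]:.3f}s  idle[3]={idle[3]:.3f}s")
```

In a real application the wasted time scales with load imbalance across thousands of processors, which is why manual barrier-based structuring leaves so much throughput on the table.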
Execution models are the logical basis for maximizing throughput, while system hardware architecture provides the physical resources to support the execution model. Hardware architectures not tied to an execution model are unlikely to support optimal throughput; conversely, execution models that ignore hardware constraints are unlikely to be efficiently implemented. Mainstream parallel computers tend toward a message-passing, SPMD (single program, multiple data) execution model and a hardware architecture with a homogeneous set of processing nodes connected by a regularly structured communications fabric. (The physical structure of the fabric is not visible to the programmer.) This execution model was developed to handle systems with tens to hundreds of processing nodes, but computational efficiency decreases as systems grow to thousands of nodes. Much of the inefficiency appears to be due to communications fabrics with insufficient bandwidth.
A number of machines have been developed to explore execution-model concepts. Some of these, most notably the Thinking Machines CM-1 and the Tera MTA, were developed for commercial use; unfortunately, both machines achieved their technical goals but not their commercial ones.
The HTMT study is noteworthy in that the conceptual execution model was developed to match the hardware constraints described above, with the added notion that multi-threading was a critical element of an execution model. The conventional SPMD approach could not be mapped to the hardware constraints, nor could multi-threading alone define the execution model. The solution was to reverse the conventional memory access paradigm: instead of issuing a memory request and waiting for a response during a calculation, the HTMT execution model "pushes" all data and code needed for a calculation into the very limited memory of the high-speed processors in the form of "parcels." Analytical studies of several applications have demonstrated that the execution model should give good computational throughput on a projected petaflops hardware configuration.
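The parcel idea can be sketched as follows: rather than the processor issuing memory requests and stalling, a runtime bundles a computation with all of its operands and pushes the bundle into the processor's small local memory ahead of execution. The class and method names below are illustrative, not HTMT's actual interface.

```python
from dataclasses import dataclass
from typing import Callable
from collections import deque

@dataclass
class Parcel:
    code: Callable  # the computation to run
    data: tuple     # all operands, gathered ahead of time

class FastProcessor:
    """Models a high-speed processor with very limited local memory."""

    def __init__(self, capacity=4):
        self.queue = deque()
        self.capacity = capacity

    def push(self, parcel):
        # Data and code arrive *before* execution starts; the runtime
        # must throttle rather than let the processor stall on fetches.
        if len(self.queue) >= self.capacity:
            raise MemoryError("local memory full")
        self.queue.append(parcel)

    def run(self):
        results = []
        while self.queue:
            p = self.queue.popleft()
            # Everything needed is already local: no remote memory
            # requests are issued mid-computation.
            results.append(p.code(*p.data))
        return results

proc = FastProcessor()
for a, b in [(1, 2), (3, 4)]:
    proc.push(Parcel(code=lambda x, y: x + y, data=(a, b)))
print(proc.run())  # -> [3, 7]
```

The key inversion is visible in `run`: the latency of gathering operands is paid before the fast processor starts, instead of being interleaved with (and stalling) the computation.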
While little data is yet publicly available, both the IBM and Cray HPCS efforts appear to be co-developing execution models and hardware architectures. The IBM PERCS project is "aggressively pursuing hardware/software co-design" in order to contain ultimate system cost. The Cray Cascade project incorporates an innovative hardware architecture in which "lightweight" processors feed data to "heavyweight" processors; the execution model being developed must divide the workload effectively between the lightweight and heavyweight processors. Extensive simulation and modeling are being carried out to minimize the risk of adopting an innovative architecture.
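One plausible division of labor, sketched below under assumptions of our own (the actual Cascade policy is not publicly detailed), is to give memory-bound, irregular work to the lightweight processors and arithmetic-dense kernels to the heavyweight processor:

```python
def lightweight_gather(memory, indices):
    # Irregular, memory-bound access pattern: little arithmetic per byte,
    # well suited to a processor sitting near memory.
    return [memory[i] for i in indices]

def heavyweight_compute(block):
    # Dense, arithmetic-heavy kernel on contiguous data: well suited to
    # the fast processor, which should never wait on scattered loads.
    return sum(x * x for x in block)

memory = list(range(100))
indices = [3, 14, 15, 92]                     # scattered accesses
block = lightweight_gather(memory, indices)   # staged near memory
result = heavyweight_compute(block)           # crunched on fast silicon
print(result)  # -> 9 + 196 + 225 + 8464 = 8894
```

The execution-model problem is deciding, per task, which side of this split it belongs on; a wrong split leaves the heavyweight processors starved for data.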
While there is some research into execution models for distributed execution (Mobile Agent models, for example), there is little research on execution models for large-scale supercomputers. The Japanese supercomputing efforts (NEC and Fujitsu) have focused on engineering excellence rather than architectural innovation. The GRAPE series of special-purpose supercomputers uses an execution model designed to optimize evaluation of the single equation on which N-body simulations depend.
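That single equation is the pairwise gravitational interaction, whose O(N^2) evaluation GRAPE implements in custom pipelines. A plain-Python sketch of the kernel, for a particle i with neighbors j, a_i = sum over j != i of G * m_j * (r_j - r_i) / |r_j - r_i|^3:

```python
G = 1.0  # gravitational constant in code units

def accelerations(masses, positions):
    """Direct-summation gravitational accelerations, O(N^2) pairs."""
    n = len(masses)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx)
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += G * masses[j] * dx[k] * inv_r3
    return acc

# Two equal unit masses one unit apart attract each other with |a| = 1.
acc = accelerations([1.0, 1.0], [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(acc)  # -> [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
```

Because the entire workload is this one doubly-nested loop, the execution model reduces to streaming particle data through a hardwired pipeline, which is what makes the special-purpose approach so effective.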