Superconducting Technology Assessment - nitrd
[Diagram fragment: CNET crossbar interconnect; DRAM-PIM; 16 TB]
Since the demise of the HTMT project, processor-in-memory technology has advanced as a result of the HPCS program; the other technologies—RSFQ processors and memory, optical Data Vortex, and holographic storage—have languished due to limited funding.
Latency, Bandwidth, Parallelism
Attaining high performance in large parallel systems is a challenge. Communications must be balanced with computation; with insufficient bandwidth, nodes are often stalled while waiting for input. Excess bandwidth would be wasted, but this is rarely a practical problem: as the number of “hops” between nodes increases, so does the bandwidth consumed by each message. The result is that aggregate system bandwidth should increase not linearly with the number of nodes but as N log N (Clos, hypercube) or N² (toroidal mesh) to sustain the same level of random node-to-node messages per node. From the typical application perspective, large systems tend to suffer from insufficient bandwidth.
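This scaling can be made concrete with a small sketch. The numbers and topologies below are illustrative only: a hypercube's mean inter-node distance is log₂(N)/2 hops, while a simple ring (a 1-D torus) averages about N/4 hops, so sustaining the same per-node message rate requires aggregate bandwidth growing roughly as N log N and N², respectively.

```python
import math

def avg_hops_hypercube(n):
    # Mean distance between two random nodes in an n-node hypercube:
    # log2(n)/2 hops.
    return math.log2(n) / 2

def avg_hops_ring(n):
    # Mean distance in an n-node ring (1-D torus): about n/4 hops.
    return n / 4

def aggregate_bandwidth(n, msgs_per_node, avg_hops):
    # Each of the n nodes injects msgs_per_node messages per unit time;
    # every message consumes link bandwidth on each hop it traverses.
    return n * msgs_per_node * avg_hops

for n in (64, 1024, 16384):
    hc = aggregate_bandwidth(n, 1.0, avg_hops_hypercube(n))
    ring = aggregate_bandwidth(n, 1.0, avg_hops_ring(n))
    print(f"N={n:6d}  ~N log N (hypercube): {hc:12.0f}  ~N^2 (ring): {ring:14.0f}")
```

The gap between the two columns widens rapidly with N, which is why topology choice dominates the bandwidth budget of very large machines.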
Physical size is another limitation. Burton Smith likes to cite Little’s Law:

latency × bandwidth = concurrency

in communications systems which transport messages from input to output without either creating or destroying them. High bandwidth contributes to high throughput: thus, high latencies are tolerated only if large numbers of messages can be generated and processed concurrently. In practice, there are limits to the degree of concurrency supportable by a given application at any one time. Low latency is desirable, but latency is limited by speed-of-light considerations, so the larger the system, the higher the latency between randomly selected nodes. As a result, applications must be capable of high degrees of parallelism to take advantage of physically large systems.

[Figure: HTMT block diagram, compute-intensive (flow-control-driven) side: HSPs backed by CRAM and SRAM.]
[Figure: HTMT block diagram, data-intensive (data-flow-driven) side: processor-memory (P-M) nodes connected through the Data Vortex network to smart main memory.]
Latencies:
- Inter-HSP: 400–1,000 cycles
- Intra-execution pipeline: 10–100 cycles
- To CRAM: 40–400 cycles
- To SRAM: 400–1,000 cycles
- To DRAM: 10,000–40,000 cycles
- DRAM to HRAM: 1×10⁶–4×10⁶ cycles
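Little’s Law makes these latencies concrete: to sustain a given issue rate across a given latency, that many requests must be outstanding at once. A minimal sketch, using the latency figures above and an assumed issue rate of one reference per cycle (the rate is illustrative, not from the source):

```python
def required_concurrency(latency_cycles, refs_per_cycle=1.0):
    # Little's Law: concurrency = latency x bandwidth. Sustaining
    # refs_per_cycle across latency_cycles of latency requires this many
    # references in flight simultaneously.
    return latency_cycles * refs_per_cycle

# Latency ranges (cycles) from the table above.
latencies = {
    "inter-HSP":    (400, 1_000),
    "pipeline":     (10, 100),
    "CRAM":         (40, 400),
    "SRAM":         (400, 1_000),
    "DRAM":         (10_000, 40_000),
    "DRAM to HRAM": (1_000_000, 4_000_000),
}

for level, (lo, hi) in latencies.items():
    print(f"{level:12s}: {required_concurrency(lo):>11,.0f} - "
          f"{required_concurrency(hi):>11,.0f} outstanding references")
```

At DRAM latencies, tens of thousands of references must be in flight per full-rate pipeline, which is why the text stresses that applications must expose high degrees of parallelism.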