Superconducting Technology Assessment - nitrd
Issues
The HTMT RSFQ-related design work focused on the following issues:
■ A multithreaded processor architecture that could tolerate huge disparities between
  the projected 50-60 GHz speed of RSFQ processors (called SPELL) and the much slower
  non-superconductor memories located outside the cryostat; and
■ The projected characteristics of the RSFQ superconductor petaflops subsystem consisting
  of ~4,000 SPELL processors with a small amount of superconductor memory (called CRAM)
  and the superconductor network for inter-processor communication.
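To give a feel for the speed disparity described above, the arithmetic can be sketched in a few lines of Python. The clock rates (50-60 GHz), the petaflops target, and the ~4,000 processor count come from the text; the memory latency and the work-per-thread figure used in the usage note are illustrative assumptions, and the thread-count formula is a textbook simplification of multithreaded latency hiding, not the HTMT design model.

```python
def cycle_time_ps(clock_ghz: float) -> float:
    """Clock period in picoseconds for a clock rate given in GHz."""
    return 1000.0 / clock_ghz


def stall_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Processor cycles that elapse while one memory access is outstanding."""
    return latency_ns * clock_ghz  # ns * (cycles per ns)


def threads_to_hide(latency_cycles: int, run_cycles: int) -> int:
    """Threads needed so that some thread is always ready to run.

    Simplified model (an assumption): every thread computes for
    `run_cycles`, then waits `latency_cycles` for memory.
    """
    total = run_cycles + latency_cycles
    return -(-total // run_cycles)  # integer ceiling division


def per_processor_gflops(total_flops: float, processors: int) -> float:
    """Average rate each processor must sustain, in GFLOPS."""
    return total_flops / processors / 1e9


if __name__ == "__main__":
    assumed_latency_ns = 100.0  # hypothetical external-memory latency
    for clock in (50.0, 60.0):
        print(clock, "GHz:", cycle_time_ps(clock), "ps per cycle,",
              stall_cycles(assumed_latency_ns, clock), "cycles per access")
    print(per_processor_gflops(1e15, 4000), "GFLOPS per processor")
```

Under the assumed 100 ns latency, a 50 GHz processor would wait thousands of cycles per access, which is why the architecture leans on many concurrent threads rather than on caching alone. Dividing one petaflops across ~4,000 processors implies an average of about 250 GFLOPS each.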
Chip Design<br />
The architecture of SPELL processors was designed to support dual-level multithreading with 8-16 multistream units<br />
(MSUs), each of which was capable of simultaneous execution of up to four instructions from multiple threads<br />
running within each MSU and sharing its set of functional units. However, no processor chip design was done for<br />
SPELL processors; their technical characteristics are only estimates based on the best projection of RSFQ circuits<br />
available at that time (1997-1999).<br />
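The projected issue capacity implied by these numbers can be worked out directly. The sketch below multiplies the MSU count (8-16) by the four-instruction limit per MSU and by the projected 50-60 GHz clock. The function names are hypothetical, and the results are upper bounds on paper only, since no SPELL chip was ever designed.

```python
def peak_instructions_per_cycle(msus: int, issue_per_msu: int = 4) -> int:
    """Upper bound on instructions executed per cycle across all MSUs."""
    return msus * issue_per_msu


def peak_instruction_rate_tera(msus: int, clock_ghz: float,
                               issue_per_msu: int = 4) -> float:
    """Upper bound on instructions per second, in units of 10**12."""
    return peak_instructions_per_cycle(msus, issue_per_msu) * clock_ghz / 1000.0


if __name__ == "__main__":
    for msus, clock in ((8, 50.0), (16, 60.0)):
        print(msus, "MSUs at", clock, "GHz:",
              peak_instructions_per_cycle(msus), "instructions/cycle,",
              peak_instruction_rate_tera(msus, clock), "tera-instructions/s")
```

The two cases bracket the projected range: 32 to 64 instructions per cycle per processor.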
2.2.2 20-GHZ, 8-BIT FLUX-1 MICROPROCESSOR (2000-2002)<br />
The 8-bit FLUX-1 microprocessor was the first RSFQ microprocessor designed and fabricated to address architectural<br />
and design challenges for 20+ GHz RSFQ processors. The FLUX-1 design was done in the framework of the FLUX
project as a collaboration among SUNY Stony Brook, the former TRW (now Northrop Grumman Space
Technology), and the Jet Propulsion Laboratory (NASA).
New Microarchitecture Development<br />
A new communication-aware partitioned microarchitecture was developed for FLUX-1 with the following<br />
distinctive features:<br />
■ Ultrapipelining to achieve 20 GHz clock rate with only 2-3 Boolean operations per stage.
■ Two operations per cycle (40 GOPS peak performance for 8-bit data).
■ Short-distance interaction and reduced connectivity between Arithmetic Logic Units (ALUs)
  and registers.
■ Bit-streaming, which allows any operation that depends on the result of an
  operation in progress to start working with the data as soon as its first bit is ready.
■ Wave pipelining in the instruction memory.
■ Modular design.
■ ~25 control, integer arithmetic, and logical operations (no load/store operations).
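The bit-streaming idea in the list above can be illustrated with a software analogy. The sketch below chains two bit-serial adders as Python generators, least significant bit first, so the second addition consumes each result bit of the first as soon as it is produced. This is only an analogy under stated assumptions: the LSB-first ordering, the adder structure, and all names here are hypothetical and are not taken from the FLUX-1 design.

```python
from typing import Iterable, Iterator, List, Tuple


def to_bits(value: int, width: int) -> List[int]:
    """Split an integer into `width` bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: Iterable[int]) -> int:
    """Reassemble an integer from LSB-first bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def serial_add(xs: Iterable[int], ys: Iterable[int], name: str,
               log: List[Tuple[str, int]]) -> Iterator[int]:
    """Bit-serial adder: emits one sum bit per input bit pair."""
    carry = 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        total = x + y + carry
        carry = total >> 1
        log.append((name, i))  # record when this adder handled bit i
        yield total & 1


def chained_sum(a: int, b: int, c: int, width: int = 8):
    """Compute (a + b) + c modulo 2**width with two streamed adders."""
    log: List[Tuple[str, int]] = []
    first = serial_add(to_bits(a, width), to_bits(b, width), "add1", log)
    # The second adder reads the first adder's output bit by bit.
    second = serial_add(first, to_bits(c, width), "add2", log)
    return from_bits(second), log


if __name__ == "__main__":
    result, log = chained_sum(5, 9, 3)
    print(result)   # 17
    print(log[:4])  # the two adders alternate bit by bit
```

The log shows the dependent addition handling bit 0 before the first addition has reached bit 1, which is the overlap that bit-streaming is meant to exploit.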
Chips<br />
The final FLUX-1 chip, called the FLUX-1R, was fabricated in 2002. It had 63,107 Josephson junctions (JJs) on a
10.35 x 10.65 mm² die with power consumption of ~9.5 mW at 4.5 K.
Operation of a one-bit ALU-register block (the most complex FLUX-1R component) was confirmed by testing.<br />
No operational FLUX-1R chips were demonstrated by the time the project ended in 2002.<br />
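A few derived figures help put the FLUX-1R numbers in proportion. The sketch below computes die area, average junction density, and average power per junction from the values quoted above (63,107 JJs, a 10.35 x 10.65 mm die, ~9.5 mW). These are simple averages over the whole die; the report does not give a breakdown by circuit block, and the function names are hypothetical.

```python
JUNCTIONS = 63_107      # Josephson junctions on FLUX-1R
DIE_WIDTH_MM = 10.35
DIE_HEIGHT_MM = 10.65
POWER_MW = 9.5          # approximate power consumption at 4.5 K


def die_area_mm2(width_mm: float, height_mm: float) -> float:
    """Die area in square millimetres."""
    return width_mm * height_mm


def junction_density(junctions: int, area_mm2: float) -> float:
    """Average junctions per square millimetre."""
    return junctions / area_mm2


def power_per_junction_nw(power_mw: float, junctions: int) -> float:
    """Average power per junction in nanowatts (1 mW = 1e6 nW)."""
    return power_mw * 1e6 / junctions


if __name__ == "__main__":
    area = die_area_mm2(DIE_WIDTH_MM, DIE_HEIGHT_MM)
    print(round(area, 2), "mm^2")
    print(round(junction_density(JUNCTIONS, area)), "JJs per mm^2")
    print(round(power_per_junction_nw(POWER_MW, JUNCTIONS)), "nW per JJ")
```

On these averages the die is about 110 mm², with roughly 570 junctions per mm² and on the order of 150 nW per junction.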