
Superconducting Technology Assessment - nitrd


The microarchitecture of superconductor processors must support a truly localized computation model, in which processor functional units are fed with data from registers located in very close proximity to them. This organization makes distributed, multi-bank register structures much better suited to superconductor processors than the monolithic multi-ported register files used in CMOS microprocessors. An example of such a partitioned architecture for integer processing, which could be called processing-in-registers, was developed for the 20 GHz FLUX-1 microprocessor.
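To make the contrast with a monolithic multi-ported register file more concrete, here is a minimal Python sketch of a partitioned register organization in which each functional unit reads operands only from its own local bank. The class names, the bank count and size, and the one-cycle penalty for copying a value between banks are hypothetical choices for illustration; they are not taken from the FLUX-1 design.

```python
from dataclasses import dataclass, field

# Hypothetical cost model (not a FLUX-1 figure): reading a register in the
# unit's own bank is free; copying a value between banks costs extra cycles.
TRANSFER_CYCLES = 1


@dataclass
class RegisterBank:
    """A small register bank placed next to one functional unit."""
    size: int
    regs: list[int] = field(init=False)

    def __post_init__(self) -> None:
        self.regs = [0] * self.size


class PartitionedRegisterFile:
    """Distributed, multi-bank registers: bank i feeds functional unit i only."""

    def __init__(self, num_banks: int, bank_size: int) -> None:
        self.banks = [RegisterBank(bank_size) for _ in range(num_banks)]
        self.extra_cycles = 0  # accumulated cost of cross-bank copies

    def move(self, src_bank: int, src_reg: int, dst_bank: int, dst_reg: int) -> None:
        """Explicit copy, needed only when an operand lives in another bank."""
        self.banks[dst_bank].regs[dst_reg] = self.banks[src_bank].regs[src_reg]
        if src_bank != dst_bank:
            self.extra_cycles += TRANSFER_CYCLES

    def add(self, bank: int, dst: int, a: int, b: int) -> None:
        """The adder attached to `bank` reads and writes only local registers."""
        regs = self.banks[bank].regs
        regs[dst] = regs[a] + regs[b]


rf = PartitionedRegisterFile(num_banks=4, bank_size=8)
rf.banks[0].regs[1] = 5
rf.banks[0].regs[2] = 7
rf.add(bank=0, dst=3, a=1, b=2)                        # operands local: no penalty
rf.move(src_bank=0, src_reg=3, dst_bank=1, dst_reg=0)  # result crosses banks
print(rf.banks[1].regs[0], rf.extra_cycles)
```

In this sketch every arithmetic operation touches only its local bank, which is what keeps wiring short; the explicit `move` makes the cost of non-local data visible, so that placing values in the right bank becomes a scheduling problem rather than a hardware one.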

2.4.4 MEMORY HIERARCHY FOR SUPERCONDUCTOR PROCESSORS AND SYSTEMS

Designing an efficient memory hierarchy, including the capacity, latency, and bandwidth of each of its levels, is one of the most important architectural issues. Some memory issues have been addressed in simulations and low-speed experiments, but only analytical results exist for a multi-technology memory hierarchy for a petaflops system with superconductor processors.

The principal design issues are:


■ Technology choices for each level of the memory hierarchy.

■ Interfacing between different memory levels.

■ Latency avoidance/tolerance mechanisms in the processor microarchitecture.
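One common way to reason about how per-level latency and hit rate combine is the average memory access time (AMAT) recurrence, AMAT_i = t_i + m_i × AMAT_(i+1), where t_i is the access time and m_i the local miss rate of level i. The Python sketch below evaluates it; the level names, latencies (in processor cycles), and miss rates are placeholders, not measured values for any superconductor or semiconductor technology.

```python
from typing import NamedTuple


class Level(NamedTuple):
    name: str
    hit_time_cycles: float  # access latency of this level, in processor cycles
    local_miss_rate: float  # fraction of this level's accesses sent to the next level


def amat(levels: list[Level]) -> float:
    """Average memory access time, evaluated from the last level backwards.

    AMAT_i = hit_time_i + miss_rate_i * AMAT_(i+1); the last level is
    assumed to always hit, so its miss rate is ignored.
    """
    total = levels[-1].hit_time_cycles
    for level in reversed(levels[:-1]):
        total = level.hit_time_cycles + level.local_miss_rate * total
    return total


# Hypothetical three-level hierarchy; all numbers are illustrative only.
hierarchy = [
    Level("near-processor memory", 2, 0.10),
    Level("on-MCM memory", 35, 0.05),
    Level("room-temperature DRAM", 2000, 0.0),
]
print(round(amat(hierarchy), 2))
```

Even with placeholder numbers, the recurrence shows why the interfaces between levels matter: a slow outer level only stays affordable if the miss rates of the levels in front of it are low, or if the processor can tolerate the latency instead of stalling on it.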

Currently, there are several technologies that should be studied as potential candidates for inclusion in such a memory hierarchy:

■ RSFQ.

■ JJ-CMOS.

■ SFQ-MRAM.

■ Semiconductor SRAM and DRAM (at higher temperatures, outside the cryostat).

RSFQ Memory<br />

■ Traditionally designed RSFQ RAMs (e.g., the FLUX-1 instruction memory) rely on point-to-point implementation of the bit and word lines, using tree structures with RSFQ asynchronous elements such as splitters and mergers at each node. This design has a strongly negative effect on both density and latency.

■ Currently, the fastest type of purely RSFQ memory is a first-in, first-out (FIFO) shift-register memory, which has demonstrated high speed (more than 50 GHz), good density, and low latency at the expense of random-access capability. This type of memory can be used efficiently to implement vector registers, which makes vector/streaming architectures natural candidates for superconductor processors.
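The cost of tree-based addressing can be estimated by simple counting: a binary splitter tree that fans one signal out to N word lines needs N - 1 splitters and has a depth of ceil(log2 N), so every access passes through that many stages before reaching a cell; mergers play the symmetric role where many cells drive one output line. The Python sketch below computes these counts and contrasts them with a FIFO shift-register store, modeled here with a `collections.deque`, that gives up random access in exchange for in-order streaming. The ideal 1-to-2 splitters and the `ShiftRegisterVector` class are simplifying assumptions for illustration, not the actual FLUX-1 circuits.

```python
from collections import deque


def splitter_tree_cost(n_lines: int) -> tuple[int, int]:
    """Splitters and tree depth needed to fan one signal out to n_lines outputs.

    Assumes ideal binary (1-to-2) splitters: a full binary tree with n_lines
    leaves has n_lines - 1 internal nodes and minimum depth ceil(log2(n_lines)).
    """
    if n_lines < 1:
        raise ValueError("n_lines must be positive")
    depth = (n_lines - 1).bit_length()  # integer ceil(log2(n_lines)) for n_lines >= 1
    return n_lines - 1, depth


class ShiftRegisterVector:
    """FIFO shift-register storage: values enter at one end and leave at the other.

    There is no random access; a vector operand is consumed element by element
    in the order it was written, which matches streaming/vector execution.
    """

    def __init__(self, length: int) -> None:
        self.length = length
        self.cells: deque[int] = deque()

    def push(self, value: int) -> None:
        if len(self.cells) == self.length:
            raise OverflowError("shift register is full")
        self.cells.append(value)

    def pop(self) -> int:
        return self.cells.popleft()  # oldest element first (FIFO order)


# Random-access addressing cost grows with the number of word lines...
for n in (16, 64, 1024):
    splitters, depth = splitter_tree_cost(n)
    print(f"{n:5d} word lines: {splitters:4d} splitters, depth {depth}")

# ...while a shift register streams a vector in order with no address decoding.
vreg = ShiftRegisterVector(length=4)
for x in (1, 2, 3, 4):
    vreg.push(x)
print([vreg.pop() for _ in range(4)])
```

The splitter count grows linearly and the depth logarithmically with memory size, which is one way to see why tree-addressed RSFQ RAM loses on both density and latency, while the shift register's sequential access pattern maps directly onto vector registers.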

JJ-CMOS Memory<br />

■ The closest to RSFQ memory in terms of speed, but much higher in density, is hybrid JJ-CMOS memory, for which simulation results show latencies of a few hundred picoseconds for 64 kbit memory chips. A 50 GHz processor with a reasonably large off-chip hybrid JJ-CMOS memory on the same MCM will need architectural mechanisms capable of tolerating memory access latencies of 30-40 cycles.
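The 30-40 cycle figure follows from the clock period: at 50 GHz one cycle lasts 1/(50 × 10^9 Hz) = 20 ps, so 30-40 cycles correspond to 600-800 ps, roughly a few hundred picoseconds of chip access time plus the off-chip path on the MCM. The Python sketch below does this conversion and then applies Little's law (requests in flight = request rate × latency) to estimate how many memory requests a latency-tolerance mechanism would have to keep outstanding. The function names and the one-request-per-cycle issue rate are assumptions for illustration, not figures from the text.

```python
import math


def cycles_for_latency(latency_ps: float, clock_ghz: float) -> int:
    """Round a memory latency up to whole processor cycles."""
    cycle_ps = 1000.0 / clock_ghz  # a 1 GHz clock has a 1000 ps period
    return math.ceil(latency_ps / cycle_ps)


def requests_in_flight(latency_cycles: int, requests_per_cycle: float) -> float:
    """Little's law: concurrency = throughput * latency."""
    return requests_per_cycle * latency_cycles


clock_ghz = 50.0
print("cycle time:", 1000.0 / clock_ghz, "ps")
for latency_ps in (600.0, 800.0):
    cycles = cycles_for_latency(latency_ps, clock_ghz)
    # Hypothetical workload that issues one memory request every cycle.
    outstanding = requests_in_flight(cycles, 1.0)
    print(f"{latency_ps:.0f} ps -> {cycles} cycles, {outstanding:.0f} requests in flight")
```

Under these assumptions the processor must sustain 30-40 independent outstanding accesses to avoid stalling, which is the kind of concurrency that vector/streaming execution, multithreading, or deep prefetching can supply.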
