Prediction and Comparison of High-Performance On ... - IEEE Xplore


1154 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 19, NO. 7, JULY 2011

Prediction and Comparison of High-Performance On-Chip Global Interconnection

Yulei Zhang, Student Member, IEEE, Xiang Hu, Student Member, IEEE, Alina Deutsch, Fellow, IEEE, A. Ege Engin, Member, IEEE, James F. Buckwalter, Member, IEEE, and Chung-Kuan Cheng, Fellow, IEEE

Abstract: As process technology scales, numerous interconnect schemes have been proposed to mitigate the performance degradation caused by the scaling of on-chip global wires. In this paper, we review current on-chip global interconnect structures and develop simple models to analyze their architecture-level performance. We propose a general framework to design and optimize a new category of global interconnect based on on-chip transmission line (T-line) technology. We perform a group of experiments using six different global interconnection structures to discover their differences in terms of latency, energy per bit, throughput, area, and signal integrity over several technology nodes. Our results show that T-line structures have the potential to outperform conventional repeated RC wires at future technology nodes, achieving higher performance while using less power and improving the reliability of wire communication. Our results also show that on-chip equalization helps to improve throughput, signal integrity, and power efficiency.

Index Terms: On-chip global interconnect, passive equalization, performance prediction, transmission line.

I. INTRODUCTION

As semiconductor technology advances in the ultra deep sub-micrometer (UDSM) era, on-chip global interconnect has become an ever-greater barrier to achieving the performance requirements of increasingly large system-on-chip (SoC) designs. Shrinking of wire geometries results in greater per-unit-length resistance. Even with a shrinking dielectric constant, an increasing RC delay per unit wire length is observed as technology scales. Meanwhile, the average length of global wires, determined by chip size, remains fixed as technology scales due to increasingly complex SoC designs. According to the ITRS roadmap [1], the RC delay of a 1-mm-long, minimum pitch global wire will be 542 ps at the 45 nm node, while the 10 level fan-out of 4 (FO4) delay of a minimum sized inverter will be 145 ps at the same node. A substantial performance gap is growing between global interconnect and logic gates.

Global wires also consume a significant portion of the total power in digital systems. In [2], Magen et al. found that interconnect power accounted for half the total dynamic power in a 0.13-μm microprocessor designed for power efficiency. Further, nearly one-third of the total power was dissipated in global wires, comprising global clocks and signals. The widely-used repeated RC wire structure for global interconnect requires significant power overhead because it uses strong repeaters to drive relatively short wire segments [3]. As shown in [4], to minimize total latency, the optimal repeated structure has equal amounts of wire and gate capacitance, which means that half the total dynamic power is dissipated in repeaters.

To break the "interconnect wall" caused by the scaling of global wires, many approaches have been proposed to hasten on-chip global communication. The repeater insertion method has been widely adopted [5]. By breaking the long wire into segments and adding buffers, the repeater insertion method reduces total wire delay at the cost of additional power overhead. To further reduce latency and energy per bit, transmission-line (T-line) effects of on-chip wires have been utilized by adopting fat top-layer wires driven by low impedance transmitters [6]. However, the inter-symbol interference (ISI) due to the resistive loss severely limits the bandwidth of such T-line schemes.¹ To counter ISI and increase throughput density, equalization techniques have been employed [7]. Different approaches have been proposed using passive [8], [9] or active components [7], [10] to build equalized T-line structures for high-throughput on-chip global communication.

In this work, six global interconnection structures are explored and their performance compared across multiple technology nodes. Extending a previously published conference paper [11], we add the following features: 1) as a means to improve the throughput of repeated RC wires, pipelined RC wire is analyzed and compared with other global interconnect structures; 2) chip areas consumed by different global interconnect structures are modeled and discussed; and 3) wire length is added as a new variable in performance models to capture and study the critical length of different interconnect schemes in terms of specific performance metrics.

The rest of this paper is organized as follows. In Section II, the various global interconnect structures are introduced in

¹ In this work, bandwidth of interconnect is defined as the highest signal frequency that the whole interconnection system can support in order to meet a specific voltage swing requirement (e.g., full-swing for repeated RC wire and minimum detectable voltage for T-line schemes) at the receiver side.

Manuscript received October 09, 2009; revised February 04, 2010; accepted March 26, 2010. First published May 10, 2010; current version published June 24, 2011. This work was supported by NSF CCF-0811794 and California Discovery Program.
Y. Zhang, X. Hu, and J. F. Buckwalter are with the Department of Electrical and Computer Engineering, University of California-San Diego, La Jolla, CA 92037 USA (e-mail: y1zhang@ucsd.edu).
A. Deutsch is with IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 USA.
A. E. Engin is with the Department of Electrical and Computer Engineering, San Diego State University, San Diego, CA 92182 USA.
C.-K. Cheng is with the Department of Computer Science and Engineering, University of California-San Diego, La Jolla, CA 92037 USA.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVLSI.2010.2047415
1063-8210/$26.00 © 2010 IEEE
