3 years ago

Prediction and Comparison of High-Performance On ... - IEEE Xplore

Prediction and Comparison of High-Performance On ... - IEEE Xplore


ZHANG et al.: PREDICTION AND COMPARISON OF HIGH-PERFORMANCE ON-CHIP GLOBAL INTERCONNECTION 1161Fig. 8. Normalized delay of different global interconnection structures undermin-d objective.Fig. 9. Normalized energy per bit of different global interconnection structuresunder min-d objective.For system simulation and optimization, HSPICE is usedto simulate the transient response of wires, evaluate the performanceof circuit and the entire interconnection structure.Linear and nonlinear regression methods and SQP routineimplemented in MATLAB are adopted to build circuit modelsand perform optimization.In our study, we set the maximum pipeline depth to 20, andchoose 5 mm as the wire length (which represents typical criticallength for on-chip global interconnect) to evaluate and comparethe performance metrics of different structures. Power dissipationis estimated using a PRBS pattern with the activity factoraround 0.23. We also extend each experiment to different wirelengths (0.5, 1, 3, 7, 9 mm) to study the wire length crossingpoints for some representative structure pairs in terms of differentperformance metrics, as shown in Section IV-D.B. LatencyA comparison of the normalized delays of various globalinterconnect structures under the min-d objective is shownin Fig. 8. The trends of normalized latency with technologyscaling verify our previous analysis in Section II-D. Normalizeddelay of R-RC structure increases due to the dominant effect ofon the total latency, whereas latency of P-RC decreasesas the flip-flops dominate the total delay. Therefore, the latencypenalty of pipelining RC wires is alleviated as technologyscales.On the other hand, due to the opposite scaling trend, all T-linestructures outperform R-RC in terms of latency beyond the 90nm node. The single-ended T-lines achieve lowest delay acrossall the five technology nodes. At 22 nm node, all the T-line structuresshow a similar delay around 8 ps/mm, whereas this numberis 60 ps/mm for R-RC and around 90 ps/mm for P-RC. Therefore,a delay reduction of at least 87% could be obtained by replacingglobal RC wires with T-line structures in this scenario.C. Other MetricsUnder the min-d objective, every interconnection schemeshows a decreasing trend in energy dissipation as technologyscales (see Fig. 9), verifying our previous analysis inSection II-D. RC wires consume the largest energy among allsix interconnection structures. Pipelining, under the optimizationcriterion of min-d, increases the energy of R-RC furtherdue to the additional energy consumed by flip-flops, but thisoverhead decreases as technology scales because of the scalingof flip-flop capacitance. On the other hand, T-line structuresconsume less energy at each technology node. Beyond the 65nm node, differential T-lines (UE-TL/PE-TL) consume theleast energy due to power efficient SA-based receivers andthe higher bit rate achieved by reducing signal swing at thewire-end. Further, the energy per bit could be reduced by nearly40% using a passive equalizer. At the 22 nm node, differentialT-line schemes will reduce the energy per bit by two orders ofmagnitude compared with RC wires.The throughput density of different schemes under the min-dobjective is shown in Fig. 10. As discussed in Section II-D, thismetric is improved for all the schemes as technology scales (exceptfor UT-TL, which throughput density is constant, limitedby the wire itself). P-RC achieves the highest throughput densityacross all the technologies by increasing the R-RC bandwidthusing the smallest wire pitch. For T-line schemes, differentialT-lines have larger throughput density compared withsingle-ended ones because of the higher achievable bit rate byutilizing SA-based receiver. Furthermore, the introduction ofpassive equalization makes the utilization of narrower wirespossible, increasing the density even further. Beyond 45 nmnode, differential T-lines will finally outperform R-RC in termsof throughput density.The chip areas consumed by the various interconnect structuresare compared in Fig. 11. According to the analysis performedin Section II-D, assuming wire area is dominant in totalarea consumption, RC wire area will decrease exponentially astechnology scales, whereas area of other T-line schemes will

1162 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 19, NO. 7, JULY 2011Fig. 10. Throughput density of different global interconnection structuresunder min-d objective.Fig. 12. Critical length of several chosen interconnect structure pairs in termsof different performance metrics under min-d objective.node is about 2.5 mm, which means that when the wire length islarger than 2.5 mm at this node, PE-TL will outperform R-RCin terms of normalized delay. Based on this illustration to understandFig. 12, we make the following general observations.1) As technology scales (beyond the 45 nm node), T-lineschemes will outperform RC wires in terms of normalizeddelay and energy within the entire length range of on-chipglobal wires.2) In terms of throughput, at the 22 nm node, PE-TL will outperformR-RC while wire length is larger than 1 mm, andUT-TL will be replaced by T-TL within the entire lengthrange.3) Single-ended T-lines will consume less chip area comparedwith differential counterparts for longer wire lengths.At the 22 nm node, T-TL occupies less area than PE-TLand UE-TL when the wire length is longer than 5.4 and 4.5mm, respectively.Fig. 11. Chip areas consumed by different global interconnection structuresunder min-d objective.remain the same (single-ended T-lines) or even increase (differentialT-lines), as shown in Fig. 11.D. Critical LengthA critical length study is also performed by running the optimizationflow in several different wire lengths, from 0.5 to 9mm. The results are summarized in Fig. 12. In this figure, adashed line and dotted line located on the upper and lower sidesindicate the upper-bound and lower-bound of wire length foron-chip global interconnects, corresponding to 10 and 0.5 mm,respectively. We chose eight representative interconnect structurepairs, and show the scaling trend of their critical lengths interms of four different performance metrics. As an illustration,for “Delay:PE-TL versus R-RC” case, which corresponds to thesolid line with upper triangle marker, the critical length at 90 nmV. SIGNAL INTEGRITYIn this section, we discuss the signal integrity issues of differentinterconnection structures, with the focus on the T-lineschemes. Basically, we will study signal integrity by simulatingthe maximum crosstalk noise at the wire-end of quiet lines andthe eye-height with and without crosstalk effects. For the maximumnoise simulation, based on the previous work [26] andSPICE simulations, the worst case switching patterns of singleendedand differential T-lines are given in Fig. 13. In terms ofeye-height simulation, HSPICE transient simulations for 500cycle times are performed using one or several different PRBSinput patterns. All the experimental results are summarized asfollows.A. Single-Ended T-LinesSingle-Ended structures tend to be more sensitive to noise.For the unterminated scheme (UT-TL), simulation shows thatthe maximum peak noise will be 380 mV at 45 nm node (1

Novel Packet-Level Resource Allocation with Effective ... - IEEE Xplore
System-level Repeater Requirements and Prediction - SLIP
The Spatial Prediction Comparison Test (SPCT) - RAL
Autodesk® Algor® Simulation Predict product performance.
IEEE 1394-1995 High Performance Serial Bus
tHe center of eXcellence for HiGH-PerforMance ... - KVT-Koenig AG
Building High-Performance Networks with 802.11ac - Xirrus
Performance Prediction for Multimodal Biometrics - IEEE Xplore
Predicting the Performance of Synchronous Discrete ... - IEEE Xplore
Performance comparison of three routing protocols for ... - IEEE Xplore
Performance comparison of two-dimensional discrete ... - IEEE Xplore
Computer prediction of RCS for military targets ... - IEEE Xplore
High Performance Current-Mode Differential Logic - IEEE Xplore
Performance Analysis of Multiuser MIMO Systems with ... - IEEE Xplore
Efficient MIMD Architectures for High-Performance ... - IEEE Xplore
A Multilayer-Based High-Performance Multisubpart ... - IEEE Xplore
A High-Performance VLSI Architecture for the ... - IEEE Xplore
Performance Evaluation of Opportunistic Beamforming ... - IEEE Xplore
On-chip global signaling by wave pipelining ... - IEEE Xplore
optical stability investigation of high performance ... - IEEE Xplore
A Scalable High-Performance Communication Library ... - IEEE Xplore
High Performance Computing for Hyperspectral ... - IEEE Xplore
Predictability of Uplink Channels From Downlink ... - IEEE Xplore
on-road Prediction of Driver's intent with multimodal ... - IEEE Xplore
Model Predictive Control of Systems with Actuator ... - IEEE Xplore
Performance Evaluation of Scheduling in IEEE 802.16 ... - IEEE Xplore
Noise-Predictive Turbo Equalization for Partial ... - IEEE Xplore
A Note on the Prediction Error for Small - IEEE Xplore
A Sensitivity Model for Predicting Photonic Crystal ... - IEEE Xplore