ieee transactions on very large scale integration vlsi - Computer ...


CHAUDHURI, ET AL: A SOLUTION METHODOLOGY FOR EXACT DESIGN SPACE EXPLORATION IN A 3D DESIGN SPACE 11

TABLE VIII
EWF: RCS-3D Results Without Chaining

Clock        ASAP       1*,2+       2*,2+       3*,3+
         Cs    ns    Cs    ns    Cs    ns    Cs    ns
  163    14  2282    16  2608    16  2608    14  2282
   82    17  1394    21  1722    18  1476    17  1394
   55    20  1100    29  1595    22  1210    21  1155
   48    23  1104    LB  1632    26  1248    25  1200
   41    34  1394    LB  1599    LB  1230    LB  1189
   33    37  1221    LB  1617    LB  1221    LB  1190
   28    40  1120    LB  1596    44  1232    42  1176
   24    43  1032    66  1584    48  1152    46  1104
   21    57  1197    LB  1596    LB  1155    LB  1113
   19    60  1140    LB  1596    LB  1159    LB  1121

TABLE IX
EWF: RCS-3D Results for Type I Chaining

Clock        ASAP       1*,2+       2*,2+       3*,3+
         Cs    ns    Cs    ns    Cs    ns    Cs    ns
  211     7  1477    14  2954    14  2954    10  2110
  163     9  1467    15  2445    15  2445    10  1630
   96    12  1152    20  1920    16  1536    13  1248

overall faster schedule than the fastest unchained schedule.

C. Discrete Cosine Transform (DCT)

The TCS-3D results for the DCT are presented in Table XII. The first set of results is for a time constraint of 500ns, corresponding to a design that will run at 2MHz. Eight clock lengths produced feasible schedules, but only one (24ns) led to the minimum number of functional units. To find the fastest possible design, the critical path length was used to derive the tightest possible time constraint of 434ns, and only one clock length (24ns) led to a feasible schedule and thus to the optimal 3D schedule.

The RCS-3D results for the DCT are presented in Table XIII. In the absence of chaining, the 56ns clock length (d_sub) corresponds to the fastest schedule. This time, not only could type I chaining not find an overall faster schedule than the fastest unchained schedule, but it could not even improve the schedule for a given clock length over the unchained schedule, probably due to the severe resource constraints.

D. Methodology Run Times

Voyager's design space exploration methodologies consist of three main tasks: computing the minimal set of candidate clock lengths, computing tight bounds on the number of functional units or on the schedule length, and solving the TRCS problem. The minimal set of candidate clock lengths can be computed quickly, and the bounds can be computed by solving at most two linear programs in polynomial time, as discussed in Sections V and VI. Finally, the TRCS formulation used in Voyager is well-structured, meaning that it converges on the optimal solution faster than an arbitrary formulation.

TABLE X
EWF: RCS-3D Results for Type II Chaining

Clock        ASAP       1*,2+       2*,2+       3*,3+
         Cs    ns    Cs    ns    Cs    ns    Cs    ns
  106    11  1166    19  2014    15  1590    12  1272
   71    17  1207    LB  1775    19  1349    LB  1278
   53    20  1060    LB  1643    LB  1219    LB  1166

TABLE XI
EWF: RCS-3D Results for Type III Chaining

Clock        ASAP       1*,2+       2*,2+       3*,3+
         Cs    ns    Cs    ns    Cs    ns    Cs    ns
  163     9  1467    15  2445    15  2445    10  1630
  106    11  1166    19  2014    15  1590    13  1378
   96    12  1152    20  1920    16  1536    13  1248
   82    17  1394    21  1722    18  1476    LB  1394
   71    17  1207    LB  1775    19  1349    LB  1420
   55    20  1100    29  1595    22  1210    21  1155
   53    20  1060    LB  1643    LB  1219    LB  1166

To motivate the need for solving the TCS or RCS problem by first computing bounds and then solving the resulting TRCS problem, consider the result of solving the TCS problem directly for a time constraint of 1394ns and a 24ns clock on the EWF benchmark. Even with a well-structured formulation such as Voyager's, solving this problem directly took over an hour of CPU time (using LINDO on a Sun SPARCstation 2).
In contrast, we spent only 1.51 sec to compute the lower bounds on the number of functional units, and only 7.75 sec to solve the TRCS problem, solving the same problem in two orders of magnitude less time!

On a larger benchmark, the DCT, for a time constraint of 500ns and a 24ns clock, we spent 8.28 sec to compute the lower bounds on the number of functional units, and 2.62 sec to solve the TRCS problem. Again, directly solving the TCS problem for this case took over an hour.

In general, the best designs for each example were generated within seconds. However, for very small clock lengths (e.g., 19ns), the ILP for the TRCS problem becomes quite large, and in some cases would have taken hours to find the exact solution. Fortunately, even in those cases the bounds were produced fairly quickly, and could often obviate the need to solve the TRCS problem for those clock lengths, as described in Sections II-A and II-B.

VIII. Summary and Future Work

This paper has defined a new problem, the 3D scheduling problem, and has presented an exact solution methodology to solve that problem without resorting to a time-consuming exhaustive search. This solution methodology is exact: it is guaranteed to find the optimal clock length and schedule. Furthermore, it is efficient: it prunes inferior points in the design space through a careful selection of candidate clock lengths (an important design parameter too often determined by guesswork or estimates), and through tight bounds on the number of functional units or the length of the schedule. It can optimally solve medium-sized problems in seconds, as opposed to more conventional techniques that might require hours. Thus it eliminates the
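
The run-time comparison reported in Section VII-D can be checked with a little arithmetic. The Python sketch below is illustrative only and is not part of Voyager: the function and variable names are hypothetical, and the one-hour figure is treated as a lower bound because the text says only that each direct solve took "over an hour", so the computed speedups are lower bounds as well.

```python
import math

# Assumption: "over an hour" is modeled as exactly 3600 sec, a lower bound
# on the time needed to solve the TCS problem directly.
DIRECT_TCS_LOWER_BOUND_SEC = 3600.0

# (bounds time, TRCS solve time) in seconds, as reported in Section VII-D.
RUNS = {
    "EWF, 1394ns constraint, 24ns clock": (1.51, 7.75),
    "DCT, 500ns constraint, 24ns clock": (8.28, 2.62),
}


def two_phase_total(bounds_sec, trcs_sec):
    """Total time of the bounds-then-TRCS approach."""
    return bounds_sec + trcs_sec


def min_speedup(bounds_sec, trcs_sec, direct_sec=DIRECT_TCS_LOWER_BOUND_SEC):
    """Lower bound on the speedup over solving the TCS problem directly."""
    return direct_sec / two_phase_total(bounds_sec, trcs_sec)


def orders_of_magnitude(ratio):
    """Whole powers of ten contained in a ratio."""
    return math.floor(math.log10(ratio))


for name, (bounds_sec, trcs_sec) in RUNS.items():
    total = two_phase_total(bounds_sec, trcs_sec)
    speedup = min_speedup(bounds_sec, trcs_sec)
    print(f"{name}: {total:.2f} sec total, "
          f"at least {speedup:.0f}x faster "
          f"({orders_of_magnitude(speedup)} orders of magnitude)")
```

Under that assumption both benchmarks come out more than 300 times faster than the direct solve, which is consistent with the "two orders of magnitude" stated in the text.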
