ieee transactions on very large scale integration vlsi - Computer ...

2 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 5, NO. 1, MARCH 1997

Clock    3 Mult, 3 Add     2 Mult, 2 Add     1 Mult, 2 Add
(ns)     Csteps     ns     Csteps     ns     Csteps     ns
163        14     2282       16     2608       16     2608
82         17     1394       18     1476       21     1722
55         21     1155       22     1210       29     1595
48         25     1200       26     1248       37     1776
24         46     1104       48     1152       66     1584

TABLE I
Resource-Constrained Scheduling Results for the EWF

choice eliminates an entire dimension of the search space, so even an optimal scheduler will explore only the corresponding 2D slice of the design space, and will produce a schedule that is optimal only for that one clock length. A better schedule may exist for a different clock length, but will not be found.

To motivate the need to explore this 3D design space, consider the problem of scheduling the well-known Elliptic Wave Filter [6, p.206] (EWF) benchmark, under a variety of resource constraints, to find the fastest possible schedule. Assume that the VDP100 module library [7], [8] is used, which has a multiplication delay of 163 ns, and an addition delay of 48 ns.

Forced to select a clock length for the scheduling algorithm, the designer would probably choose a clock length of either 48 ns or 163 ns, the execution delay of either addition or multiplication.
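
A rough way to see what each ad hoc choice costs is to count how many whole clock cycles each operation occupies and how much of its last cycle sits idle. The sketch below assumes an operation with delay d takes ceil(d / clock) cycles and ignores controller and wiring delays. That rounding model and the helper names `cycles_needed` and `slack_ns` are illustrative assumptions, not taken from the paper.

```python
MULT_DELAY_NS = 163  # VDP100 multiplication delay, as stated in the text
ADD_DELAY_NS = 48    # VDP100 addition delay, as stated in the text


def cycles_needed(delay_ns, clock_ns):
    """Whole clock cycles one operation occupies (assumed ceiling model)."""
    return (delay_ns + clock_ns - 1) // clock_ns  # integer ceil(delay / clock)


def slack_ns(delay_ns, clock_ns):
    """Idle time left over in the operation's final cycle."""
    return cycles_needed(delay_ns, clock_ns) * clock_ns - delay_ns


if __name__ == "__main__":
    # The two ad hoc clock lengths discussed in the text.
    for clock_ns in (163, 48):
        for name, delay in (("mult", MULT_DELAY_NS), ("add", ADD_DELAY_NS)):
            print(f"clock={clock_ns} ns  {name}: "
                  f"{cycles_needed(delay, clock_ns)} cycle(s), "
                  f"{slack_ns(delay, clock_ns)} ns idle")
```

Under this model a 48 ns clock stretches each multiplication to 4 cycles (192 ns, of which 29 ns are idle), while a 163 ns clock leaves 115 ns idle in every addition.
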
Given those clock lengths, an optimal scheduler that supports multi-cycle operations (such as the ILP-based scheduler [9] in our Voyager design space exploration system) would produce the results shown on the rows labeled "48" and "163" in Table I.

Now consider the other rows of Table I, which represent other, perhaps less obvious, choices for the clock length. For each resource constraint, the fastest design corresponds to a clock length of 24 ns, a design that would not be found by a scheduling methodology limited by ad hoc guesses.² Thus it is important to explore a number of candidate clock lengths to find the globally optimal solution.

tion [3], or reclocking [4] to determine the final clock length. However, these techniques generally do not change the relative scheduling of the operations, and do not perform tradeoffs involving resource sharing, so they do not explore the high-level design space as fully as scheduling techniques. Nevertheless, the later use of these techniques, possibly in conjunction with other transformations [5], can serve as a valuable complement to our methodologies.

² This small clock length also results in a larger number of control steps, and thus a larger and more complex control unit. However, note that a clock length of 55 ns, which is more comparable to the ad hoc guesses, results in a schedule almost as fast as the one corresponding to a 24 ns clock, and faster than those corresponding to the ad hoc guesses.

B. Exploring the 3D Design Space

A variety of methodologies can be used for design space exploration. The methodologies may use exact algorithms to find optimal solutions, may use heuristic algorithms to find lower and upper bounds on the optimal solution, or may use heuristic algorithms to estimate the optimal solution. In general, the tradeoff between these three types of methodologies is one of solution quality versus computation time. This paper is concerned with finding optimal (exact) solutions to the scheduling problem in the 3D design space.

One exact methodology for optimally solving this 3D-scheduling problem is shown in Figure 3. This methodology exhaustively explores all potential clock lengths and all feasible schedules, and guarantees a globally optimal solution. Unfortunately, the computation time for this methodology is too high to be practical for all but the simplest examples.

Exhaustive Search:
    read in DFG, module library, and any constraints
    for each clock length
        optimally schedule the DFG
    present the best result(s) to the user for evaluation

Fig. 3. Exhaustive Search of the 3D Design Space (Impractical)

In contrast, this paper presents a more efficient exact methodology, implemented in the Voyager design space exploration system, for optimally solving this 3D-scheduling problem.
This methodology makes the problem tractable through: (1) careful pruning of provably inferior points from the design space, and (2) provably efficient exact algorithms for solving the individual problems.

However, even this solution methodology is only the first step toward the larger design space exploration problem that eventually needs to be solved. As described here, our methodology does not consider the module selection or type mapping problems, and does not support loops or conditionals³. It also does not incorporate register, wiring, or controller area, and only partially incorporates the delays associated with the controller and wiring. Nevertheless, the work described here can serve as a foundation for an exact solution methodology that incorporates each of these factors, either by adding extra dimensions to the search space or by adding other stages to the methodology.

II. Methodology Overview

This paper presents two methodologies to solve the clock determination and scheduling problem, that are guaranteed to find the globally optimal design, and that are far more efficient than an exhaustive search of the design space. One methodology solves the Time-Constrained 3D Scheduling (TCS-3D) problem (Figure 4), while the other solves the Resource-Constrained 3D Scheduling (RCS-3D) problem (Figure 5). Both methodologies are implemented in Rensselaer's Voyager design space exploration system.

The core of each methodology is based roughly on the exhaustive search of Figure 3.
Each methodology computes a set of candidate clock lengths, and then, for each candidate clock length, optimally solves the scheduling problem. However, a straightforward implementation of this core methodology takes much too long to solve, even for small benchmarks. Thus it is important to (1) solve the scheduling problem for only a small, provably minimal set of candidate clock lengths, and (2) solve the scheduling

³ However, module selection has since been incorporated [10], and conditionals and register area are currently being investigated.
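
The core loop described above (take a set of candidate clock lengths, schedule optimally for each, keep the fastest result) can be illustrated with the numbers from Table I. This is only a sketch under stated assumptions: `optimal_csteps` is a hypothetical stand-in that looks up the published control-step counts instead of running an actual optimal scheduler, and `best_clock` is an illustrative name, not part of Voyager.

```python
# Control-step counts from Table I (EWF benchmark), keyed by
# (multipliers, adders) and then by clock length in ns.
TABLE_I_CSTEPS = {
    (3, 3): {163: 14, 82: 17, 55: 21, 48: 25, 24: 46},
    (2, 2): {163: 16, 82: 18, 55: 22, 48: 26, 24: 48},
    (1, 2): {163: 16, 82: 21, 55: 29, 48: 37, 24: 66},
}


def optimal_csteps(resources, clock_ns):
    """Hypothetical stand-in for an optimal scheduler: a Table I lookup."""
    return TABLE_I_CSTEPS[resources][clock_ns]


def best_clock(resources, candidate_clocks):
    """Schedule once per candidate clock and keep the shortest schedule.

    Returns a tuple (clock_ns, csteps, total_ns).
    """
    results = []
    for clock_ns in candidate_clocks:
        csteps = optimal_csteps(resources, clock_ns)
        results.append((clock_ns, csteps, csteps * clock_ns))
    # Schedule length in ns is control steps times clock length.
    return min(results, key=lambda r: r[2])


if __name__ == "__main__":
    for resources in TABLE_I_CSTEPS:
        all_clocks = sorted(TABLE_I_CSTEPS[resources])
        print(resources,
              "ad hoc guesses:", best_clock(resources, [48, 163]),
              "all candidates:", best_clock(resources, all_clocks))
```

Restricting the candidates to the two ad hoc guesses selects the 48 ns clock for every resource constraint, whereas searching all five clock lengths in Table I selects 24 ns, which matches the observation in the text.
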
