VLSI I: von Architektur zu hochintegrierter Schaltung und FPGA

Institut für Integrierte Systeme
Integrated Systems Laboratory
Department of Information Technology and Electrical Engineering

Course number: 227-0116-00L

Exercise 6
Architecture Design and Block Diagrams

Prof. Dr. H. Kaeslin
Dr. N. Felber

SVN Rev.: 1122
Last Changed: 02-05-2014

Reminder: By carrying out this exercise you declare that you understand and accept the regulations about using CAE/CAD software installations at ETH Zurich. These regulations can be read anytime at http://dz.ee.ethz.ch/regulations/index.en.html.

Introduction

In this exercise, architectural transformations are applied to a basic finite impulse response (FIR) filter in order to obtain several architecture variants that differ in cost and performance. We then carry out simple pen-and-paper cost analyses to derive first-order estimates of the complexity and expected performance of the different architectures. At the end of this exercise, a block diagram and a schedule of the preferred solution are prepared as starting points for register transfer level (RTL) synthesis in the next computer exercise.

1. Taking advantage of architectural transforms to tailor a circuit architecture

An FIR filter of order N is defined by the equation

$$y[k] = \sum_{n=0}^{N} b_n \cdot x[k-n],$$

where k denotes consecutive, discrete time steps. We assume that the input data x[k] and the coefficients $b_n$ are in 16 bit two's complement format, and that the coefficients $b_n$ are time-independent. Note that we will not worry about the detailed I/O timing in this exercise (i.e. no input or output registers have to be considered during the analysis).

In the first part of the exercise, we start with the isomorphic architecture, which then serves as a basis for various architectural transforms.

To estimate hardware cost figures such as circuit size and maximum path length, refer to appendix A, where the key parameters of some common circuit blocks are given. Note that this yields a first-order estimate which may not be completely accurate; the main goal here is to compare the different architectures.

To calculate the various figures of merit, assume that the input data is registered outside of the FIR block, and that carry-lookahead adders are being used. We neglect the area requirements for storing the coefficients. Assume that the result of the multiplication is truncated to 16 bit.

Student Task 1:

1. Isomorphic architecture
   a) Draw a data dependency graph (DDG) of the most straightforward implementation of a 3rd order FIR filter (N = 3). This solution will be referred to as the isomorphic architecture throughout this exercise.
   b) Calculate the circuit size A, maximum path length $t_{lp}$, latency L, cycles needed per data item Γ, and the throughput Θ of this implementation [a]. Use the figures given in appendix A. (Hint: Do not forget to include the time needed by the registers.)
   c) Calculate the same characteristics for an FIR filter of order N.
2. Iterative decomposition
   a) By means of iterative decomposition you can achieve a minimum hardware solution for a 3rd order FIR filter. Draw the DDG for this architecture. You may summarize the control section in one block.
   b) Calculate the characteristics (A, $t_{lp}$, L, Γ, Θ) for this architecture and for a general FIR filter of order N.
3. Pipelining I
   a) We can cut down the maximum path length by introducing pipeline registers. Taking the isomorphic architecture as the starting point, develop a pipelined architecture where each stage contains at most one arithmetic element, and draw its DDG.
   b) Calculate the characteristics (A, $t_{lp}$, L, Γ, Θ) for this architecture and for a general FIR filter of order N.
4. Retiming
   a) Again starting from the isomorphic architecture, we can relocate functional registers by means of retiming and chain reversal. Keep in mind that this transform is easy only for constant filter coefficients. Draw a DDG for this architecture.
   b) Calculate the characteristics (A, $t_{lp}$, L, Γ, Θ) for this architecture and for a general filter of order N.

5. Pipelining II
   a) We can now introduce pipeline registers into the retimed architecture, such that each stage contains no more than one arithmetic unit. Draw the DDG for this architecture.
   b) Calculate the characteristics (A, $t_{lp}$, L, Γ, Θ) for this architecture and for a general filter of order N.
   c) Compare the results with those obtained in 3b. The large difference between the two pipelined architectures gives you an impression of the importance of architectural considerations in VLSI design.
6. Optional: Gate-level retiming
   What is the problem with retiming at the RTL level? Can you explain how the timing of a circuit might be improved even further? Suggest an effective way to boost the throughput of architecture Pipelining II. What are the limiting factors of this technique?

[a] These figures are introduced in section 3.3.6 "Relative merits of architectural alternatives" of the VLSI I lecture notes.

2. Further steps towards writing a synthesis model

In the forthcoming exercise, you are going to write an RTL model for an FIR filter. You are looking for a fairly economical design with only one hardware multiplier, one adder and one LUT. Execution time is allowed to grow linearly with filter order N. Latency is of secondary importance.

The key specifications are as follows:
• filter order N = 128,
• input data width $w_{inp}$ = 18 bit,
• coefficient width $w_b$ = 12 bit,
• summation width $w_{sum}$ = 30 bit, and
• output data width $w_{oup}$ = 18 bit.

Note that all numbers are in signed two's complement format.

Student Task 2:

1. Consider the basic architectural solutions obtained so far. Which option do you go for and why?
2. How and at what cost would it be possible to improve throughput further?
3. Block diagram
   Draw an RTL-level diagram of your architecture that includes not only the datapath, but also the necessary auxiliary functions. Indicate the bit widths of signals that are wider than 1 bit.
4. Schedule
   In preparation for HDL modeling, devise a schedule that lists, clock cycle by clock cycle,
   • the operations being carried out,
   • the data being held by the various registers, and
   • the values of major data and control signals.
   An asynchronous reset mechanism is always a good idea (e.g. for initializing a circuit after power-up), and you are advised to include such a mechanism in your synthesis code at a later stage.
5. Optional: Parametrization
   So far, you have been instructed to design a filter circuit for a fixed order N. What would it take to make the hardware reusable for a wider range of filter orders N = 2...128?

Discuss your results with an assistant.
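When preparing the schedule, it can help to have a software reference against which the later RTL model can be checked. The following Python sketch evaluates the FIR equation with the word widths listed above. It is only an illustration and rests on assumptions that the exercise text does not fix: samples before k = 0 are taken as zero, the accumulator wraps around on overflow, and the output is formed from the `w_oup` most significant bits of the sum. The function names `wrap_signed` and `fir_reference` are hypothetical.

```python
def wrap_signed(value, width):
    """Wrap an integer to a signed two's complement value of the given width."""
    mask = (1 << width) - 1
    value &= mask
    return value - (1 << width) if value >> (width - 1) else value


def fir_reference(samples, coeffs, w_sum=30, w_oup=18):
    """Evaluate y[k] = sum_{n=0}^{N} b_n * x[k-n] one product at a time.

    coeffs holds the N+1 coefficients b_0 ... b_N.
    Assumption: x[k] = 0 for k < 0.
    """
    outputs = []
    for k in range(len(samples)):
        acc = 0
        for n, b in enumerate(coeffs):
            if k - n >= 0:
                # Assumption: the w_sum-bit accumulator wraps on overflow.
                acc = wrap_signed(acc + b * samples[k - n], w_sum)
        # Assumption: keep the w_oup most significant bits of the sum.
        outputs.append(acc >> (w_sum - w_oup))
    return outputs


if __name__ == "__main__":
    # Impulse of height 2**12 so that the coefficients survive the final shift.
    print(fir_reference([4096, 0, 0, 0], [1, 2, 3]))
```

The inner loop performs one multiplication and one addition per step, which mirrors the single-multiplier, single-adder constraint, but it says nothing about register placement or control; those remain part of the block diagram and schedule tasks.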

A. Area and delay figures of selected functions in VLSI

Select functions

Function                A [µm²]   t_pd [ns]   C_in [fF]
2-to-1 MUX, 1x drive    11.5      0.10        1.5
4-to-1 MUX, 1x drive    32.0      0.14        2

Table 1: Parameters of select-logic functions (UMC L130).

Arithmetic functions

The figures of table 2 include the approximate area occupied by inter-cell wiring as estimated by Synopsys Design Compiler. Synthesis results have been obtained by instantiating the appropriate DesignWare component followed by optimization with no timing constraints.

Structure                             Word width   A [µm²]   t_pd [ns]
Add (ripple-carry)                    16           450       2.54
Add (carry-lookahead)                 16           3600      0.38
Add/Sub (carry-lookahead)             16           3126      0.59
Multiply (unpipelined, carry-save)    16x16        20'493    2.36

Table 2: Two's complement addition/subtraction and multiplication functions (UMC L130).

Bistable storage functions

Function                               A [µm²]   t_pd [ns]   t_su [ns]   C_in [fF]
D flip-flop with no reset, 1x drive    25.6      0.16        0.07        1.4
D flip-flop with reset, 1x drive       28.2      0.17        0.06        1.4

Table 3: Selected storage functions: Flip-flops (UMC L130).
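As an illustration of how the figures in tables 2 and 3 can be combined into first-order estimates, here is a minimal Python sketch. The dictionary and function names are hypothetical. The sketch assumes that the flip-flop t_pd is the clock-to-output delay, that a register is simply one flip-flop per bit, and that interconnect delay is neglected.

```python
# Figures copied from tables 2 and 3 (UMC L130); the names are illustrative only.
CLA_ADD_16 = {"area_um2": 3600.0, "t_pd_ns": 0.38}
MULT_16X16 = {"area_um2": 20493.0, "t_pd_ns": 2.36}
DFF_NO_RESET = {"area_um2": 25.6, "t_pd_ns": 0.16, "t_su_ns": 0.07}


def register_area(width_bits, ff=DFF_NO_RESET):
    """Area of a register built from one flip-flop per bit, in square micrometers."""
    return width_bits * ff["area_um2"]


def reg_to_reg_delay(blocks, ff=DFF_NO_RESET):
    """Register-to-register path delay in ns.

    Sum of the launching flip-flop's propagation delay, the propagation delays
    of the combinational blocks in series, and the capturing flip-flop's setup time.
    """
    return ff["t_pd_ns"] + sum(b["t_pd_ns"] for b in blocks) + ff["t_su_ns"]


if __name__ == "__main__":
    print("16 bit register area [um^2]:", register_area(16))
    print("multiply-add path [ns]:", reg_to_reg_delay([MULT_16X16, CLA_ADD_16]))
```

Under these assumptions, one multiplier followed by one carry-lookahead adder between two registers comes out at roughly 2.97 ns. The same pattern extends to longer chains by passing more blocks in the list.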
