"Minimum Energy Point Operation of ASIC Circuits"

A PROJECT REPORT SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF
Master of Engineering
In The Faculty Of Engineering

By
Balram Sahu

Guided By
Prof. Bharadwaj Amrutur

Centre For Electronics Design And Technology
Indian Institute Of Science, Bangalore
June 2012

Copyright © 2012 IISc
All Rights Reserved

Contents

Table of Contents
List of Figures
Abstract

1 Introduction
  1.1 Energy Constrained Applications
    1.1.1 Low-power Digital Signal Processor and Micro-controller Units
    1.1.2 Wireless Micro-sensor Networks
    1.1.3 Biomedical Applications
    1.1.4 Radio Frequency Identification (RFID)
  1.2 System Requirements
    1.2.1 Battery Lifetime
    1.2.2 Energy Harvesting
  1.3 Thesis Overview
  1.4 Thesis Organization
2 Background
  2.1 Modeling for Sub-threshold operation
  2.2 Challenges in Sub-threshold operation
3 Modifications in Standard Cell Library
  3.1 Transmission gates
  3.2 Standard transmission gate CMOS Flip-flop
  3.3 Clocked Inverter CMOS Flip-flop
  3.4 Modification in Standard Cell Library
4 Flip-Flop Characterization
  4.1 Requirements on the library
  4.2 Layout Technique
    4.2.1 Layout Rules
    4.2.2 Clocked Inverter Flip-flop Layout
  4.3 Generation of LEF
    4.3.1 Abstract view of Clocked Inverter Flip-flop
  4.4 Behavioral Model
    4.4.1 Timing model
    4.4.2 Power model
      4.4.2.1 Leakage Power
      4.4.2.2 Dynamic Power
5 Simulations and Results
  5.1 Assumptions
  5.2 Process Flow
  5.3 Results
  5.4 Comparison of results with Normal library
  5.5 The openMSP430 Micro-controller
6 Conclusions And Scope for Future Work
  6.1 Conclusions
  6.2 Scope for Future Work
A Place And Route using SoC Encounter
  A.1 Input Requirements of SoC Encounter
  A.2 SoC Encounter Flow
B LEF File Generation Using Abstract Generator

List of Figures

2.1 Early measurement of the I_D(V_GS) characteristics of a P-channel metal-gate MOS transistor (cleaned-up plot from [6])
2.2 NMOS transistor current contribution in sub-threshold. (a) Sub-threshold current. (b) Gate current. (c) Junction leakage current.
2.3 Normalized FO4 delay vs. V_DD [7]
2.4 I_ON to I_OFF ratio of an Inverter
2.5 Stacking Factor for I_ON and I_OFF for 2 and 3 stacked NMOS transistors
2.6 Delay slowdown of Stacked devices
3.1 Pass Transistor Strong and Degraded outputs
3.2 Transmission Gate
3.3 Leakage in transmission Gate
3.4 Standard transmission gate CMOS flip-flop
3.5 Data Write-Back in Standard Transmission Gate Flip-flop
3.6 Schematic Design of Clocked Inverter Flip-flop
4.1 General Shape of Standard cell core cell
4.2 Definition of routing Pitch
4.3 Layout of Clocked Inverter CMOS D Flip-Flop
4.4 Layout of Clocked Inverter CMOS D Flip-Flop
4.5 Timing definitions in standard cell library
4.6 Timing Sense of arcs
4.7 Calculation of Setup Time
5.1 Process Flow
5.2 Energy characteristics against supply voltage in SS Corner (for modified library)
5.3 Normalized Delay vs Supply Voltage (for modified library)
5.4 Energy characteristics against supply voltage in different process corners (for modified library). (a) Leakage Energy. (b) Total energy.
5.5 Energy vs Supply Voltage
5.6 Supply voltage requirement vs Performance
5.7 Power Consumption vs Performance
A.1 SoC Encounter Flow
A.2 SoC Encounter's GUI
A.3 Importing Design in SoC Encounter
A.4 Specifying Floorplan
A.5 Global Net Specification
A.6 Add Core Ring Pane
A.7 Add Stripe Ring Pane
A.8 SROUTE Pane
A.9 A view of Core after SROUTE
A.10 Placement Mode Setting Pane
A.11 Signal Routing Pane
A.12 Final View of Core after Placement and Routing
B.1 Defining routing layers
B.2 Defining VIA
B.3 Defining Pin Labels

Abstract

Although energy dissipation has improved with each technology node, the energy expended per operation has become a critical consideration in digital circuits. In this thesis, the focus is on the implementation of ASIC designs that can operate at sub-threshold voltages so that the minimum energy operating point can be achieved.

Some variation-aware modifications are made to the standard cell library based on the requirements of sub-threshold operation, such as delay slow-down and leakage current, and unwanted cells are removed from the library. The flip-flop is a critical and widely used element of digital designs. The standard flip-flop design using transmission gates is replaced by a clocked-inverter flip-flop so that it can operate in the sub-threshold region. Using this modified library, the CORDIC algorithm in rotation mode is synthesized and compared, over a wide range of supply voltages from 270 mV to 1.2 V, with the design synthesized using the normal library.


Chapter 1
Introduction

A recent explosion in applications that benefit from low-energy operation has carved out a significant niche for sub-threshold circuits. Digital circuits operating in sub-threshold use a supply voltage that is less than the threshold voltage of the transistors. In this region of operation, circuits consume less energy for active operation and dissipate less leakage power than higher-voltage alternatives, but they operate more slowly. Until recently, the emphasis on maximizing operating frequency in digital circuits dominated to the point that sub-threshold operation received very little attention. But as the demand for energy-constrained applications increases, sub-threshold circuits are gaining more attention.

1.1 Energy Constrained Applications

Energy consumption is a key metric for a large and emerging set of applications. These energy-constrained applications generally have low activity rates and low speed requirements, but the system is required to have a long battery lifetime, typically more than 5 years. Ideally, the power consumption of these systems will decrease to the point that they can harvest energy from their environment and have a theoretically unlimited lifetime.

harvesting, which is possible only when the average power consumption is sufficiently low. Sub-threshold circuit design offers a way to keep the power consumption low enough, by letting the circuit operate at the voltage level that minimizes energy consumption.

The CORDIC algorithm is a common building block of DSP units. It computes trigonometric functions without any multiplication, using only shift and add operations. The CORDIC algorithm is also used in communication systems to generate the quadrature components of a signal. Since this work focuses on low-power DSP and MCU wireless communication for biomedical applications, a CORDIC design is implemented in rotation mode and its minimum energy point is demonstrated in 0.13 µm UMC technology.

1.4 Thesis Organization

This thesis is organized as follows:
• Chapter 2 gives a brief description of sub-threshold circuits and the challenges in designing them.
• Chapter 3 describes the problems that standard cells face when operating in the sub-threshold regime. It discusses the design of the clocked-inverter CMOS flip-flop and the modifications made to the standard cell library to make it reliable for sub-threshold circuits.
• Chapter 4 covers the standard cell library and the characterization process of standard cells.
• Chapter 5 presents the process flow, i.e. the basic steps taken in this work and the assumptions made. It also includes the simulation results and a comparison of the CORDIC designs (using the pruned and normal libraries).
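As an illustration of the shift-and-add principle behind the rotation-mode CORDIC mentioned above, the following is a minimal floating-point sketch. It is not the synthesized design from this work: the function name, the iteration count and the use of floating-point arithmetic are assumptions made for readability, whereas a hardware implementation would use fixed-point registers and arithmetic right shifts.

```python
import math

def cordic_rotate(angle, iterations=16):
    """Approximate (cos(angle), sin(angle)) for |angle| <= pi/2.

    Hypothetical helper for illustration only; it mirrors the
    rotation-mode CORDIC recurrence in floating point.
    """
    # Elementary rotation angles atan(2^-i), stored in a lookup table.
    atan_table = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Each micro-rotation stretches the vector; pre-scale by the total gain.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0
        # In hardware, multiplying by 2^-i is a right shift by i bits.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * atan_table[i]
    return x, y

c, s = cordic_rotate(math.pi / 6)
print(c, s)
```

Each iteration needs only two shifts, three additions or subtractions, and one table lookup, which is why the algorithm suits low-power datapaths.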

Chapter 2
Background

The weak inversion state at the surface of a MIS (Metal-Insulator-Silicon) structure was already implicitly mentioned as the "parabolic region" by Garrett and Brattain in their early paper on the MIS diode [1]. This situation is characterized by the fact that the majority carriers have been repelled away from the surface, leaving behind a depletion charge of fixed atoms. The minority carrier density is increased with respect to the distant bulk, but it is still negligible in the overall charge balance and therefore does not affect the CV (capacitance-voltage) curve of the MIS structure. However, these minority carriers are the only mobile charge available at the surface. Hence, applying a voltage between the source and the drain of a MOS transistor structure causes the minority carriers to move, and a current flows between drain and source.

Figure 2.1: Early measurement of the I_D(V_GS) characteristics of a P-channel metal-gate MOS transistor (cleaned-up plot from [6])

Since this current was very small (at the sub-microampere level), it was ignored for years, even for rather wide transistors. When measured at very low current levels, it showed the unusual exponential dependence of the drain current on the gate voltage depicted in Figure 2.1. Weak inversion then came to the attention of the digital design community under the name "sub-threshold current".

2.1 Modeling for Sub-threshold operation

Part of this section is taken from [4].

An NMOS transistor operating in sub-threshold (i.e. V_GS < V_TH, where V_TH is the transistor threshold voltage) experiences the three current contributions shown in Figure 2.2(a)-(c): the sub-threshold current I_ST (due to diffusion of minority carriers between drain and source [2]), the gate current I_G (due to tunneling through the dielectric) and the junction leakage I_J (due to band-to-band tunneling (BTBT) current across the depletion regions) [3].

Figure 2.2: NMOS transistor current contribution in sub-threshold. (a) Sub-threshold current. (b) Gate current. (c) Junction leakage current.

Due to its much stronger dependence on the gate voltage, I_G tends to be much lower than I_ST at low voltages, and the same holds for I_J. Hence the NMOS current at ultra-low voltage (ULV) is dominated by the sub-threshold contribution I_ST in Figure 2.2(a), which is usually written in the following form [2]:

$$ I \approx I_{ST} = I_0 \frac{W}{L}\, e^{(V_{GS} - V_{TH})/(n v_t)} \left(1 - e^{-V_{DS}/v_t}\right) \tag{2.1} $$

Taking the DIBL effect into account, it can be written as follows [7]:

$$ I = \beta\, e^{V_{GS}/(n v_t)} \left[ e^{\lambda_{DS} V_{DS}/(n v_t)} \left(1 - e^{-V_{DS}/v_t}\right) \right] \tag{2.2} $$

$$ \beta = I_0 \frac{W}{L}\, e^{-V_{TH0}/(n v_t)} $$

Here I_0 is the technology-dependent sub-threshold current extrapolated for V_GS = V_TH, v_t = kT/q is the thermal voltage, W/L is the aspect ratio and n is the sub-threshold factor (1 + C_d/C_OX) [2].

The model we develop uses fitting parameters that are normalized to the characteristic inverter in the technology of interest. Equation 2.3 shows the propagation delay of a characteristic inverter with output capacitance C_g in sub-threshold (here V_th denotes the thermal voltage, written v_t above).

$$ t_d = \frac{K\, C_g\, V_{DD}}{I_{0,g}\, e^{(V_{GS} - V_{T,g})/(n V_{th})}} \tag{2.3} $$

where K is a delay fitting parameter. The expression for current in the denominator of Equation 2.3 models the ON current of the characteristic inverter, so it accounts for transitions through both NMOS and PMOS devices. Unless the PMOS and NMOS devices are perfectly symmetrical, the terms I_0,g and V_T,g are fitted parameters that do not correspond exactly with the parameters of the same name [4]. The operating frequency can be stated simply as:

$$ f = \frac{1}{t_d\, L_{DP}} \tag{2.4} $$

where L_DP is the depth of the critical path in characteristic inverter delays. Dynamic energy (E_DYN), leakage energy (E_LEAK) and total energy (E_T) per cycle are expressed in Equations 2.5-2.8 [4], assuming rail-to-rail swing, so that V_GS = V_DD for the ON current in Equation 2.3.

$$ E_{DYN} = C_{eff}\, V_{DD}^2 \tag{2.5} $$

$$ E_{LEAK} = W_{eff}\, I_{0,g}\, e^{-V_{T,g}/(n V_{th})}\, t_d\, L_{DP}\, V_{DD} \tag{2.6} $$

$$ \phantom{E_{LEAK}} = W_{eff}\, K\, C_g\, L_{DP}\, V_{DD}^2\, e^{-V_{DD}/(n V_{th})} \tag{2.7} $$

$$ E_T = E_{DYN} + E_{LEAK} = V_{DD}^2 \left( C_{eff} + W_{eff}\, K\, C_g\, L_{DP}\, e^{-V_{DD}/(n V_{th})} \right) \tag{2.8} $$

Equations 2.5-2.8 extend the expressions for the current and delay of an inverter to arbitrary larger circuits. This extension sacrifices accuracy for simplicity. Thus C_eff is the average total switched capacitance of the entire circuit, including the average activity

2.2 Challenges in Sub-threshold operation

Although sub-threshold circuit design opens the door to many opportunities, it also faces challenges. These challenges have to be addressed in order to design a good ultra-low-power circuit that fulfills the user requirements.
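As an aside on the energy model of Section 2.1, Equation 2.8 can be evaluated numerically to see how a minimum energy point arises from the trade-off between dynamic and leakage energy. The sketch below is only illustrative: all parameter values are assumptions rather than fitted values from this work, the single coefficient leak_coeff stands in for the product W_eff·K·C_g·L_DP normalized to C_eff, and n = 1.5 and a thermal voltage of 26 mV are assumed.

```python
import math

def total_energy(vdd, c_eff=1.0, leak_coeff=1.0e4, n=1.5, v_th=0.026):
    """Total energy per cycle following the form of Equation 2.8.

    leak_coeff is a hypothetical stand-in for W_eff * K * C_g * L_DP;
    all values are assumed, normalized numbers for illustration.
    """
    return vdd ** 2 * (c_eff + leak_coeff * math.exp(-vdd / (n * v_th)))

def minimum_energy_point(v_lo=0.10, v_hi=1.20, steps=1100):
    """Grid-search the supply voltage that minimizes total_energy."""
    best_v, best_e = v_lo, total_energy(v_lo)
    for k in range(1, steps + 1):
        v = v_lo + (v_hi - v_lo) * k / steps
        e = total_energy(v)
        if e < best_e:
            best_v, best_e = v, e
    return best_v, best_e

v_min, e_min = minimum_energy_point()
print(f"minimum energy point near {v_min:.3f} V")
```

With these assumed numbers the minimum falls strictly inside the swept range: below it the exponentially growing leakage term dominates, above it the quadratic dynamic term does.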

From Equation 2.2, the MOS sub-threshold ON current can be given as:

$$ I_{ON} \approx \beta\, e^{V_{DD}/(n v_t)} \tag{2.9} $$

where V_DD ≫ v_t is assumed. This means that in the sub-threshold regime, a reduction in V_DD causes an exponential degradation of the delay τ_D, as shown by its classical CV/I expression in Equation 2.10.

$$ \tau_D = \frac{C}{2} \cdot \frac{V_{DD}}{I_{ON}} = \frac{C}{2\beta} \cdot \frac{V_{DD}}{e^{V_{DD}/(n v_t)}} \tag{2.10} $$

Figure 2.3 depicts this for the delay of an FO4 inverter. The FO4 trend in sub-threshold is approximately exponential, as expected from Equation 2.10.

Figure 2.3: Normalized FO4 delay vs. V_DD [7]

Another problem in sub-threshold operation is the leakage current. From Equation 2.2, the OFF current of a MOS device can be written as follows.

$$ I_{OFF} = \beta\, e^{\lambda_{DS} V_{DD}/(n v_t)} \left(1 - e^{-V_{DD}/v_t}\right) \approx \beta \tag{2.11} $$

Hence, from Equations 2.9 and 2.11, we can write:

$$ \frac{I_{ON}}{I_{OFF}} = e^{V_{DD}/(n v_t)} \tag{2.12} $$

This ratio depends exponentially on the supply voltage and falls as the supply voltage is reduced. In sub-threshold, the OFF current of a transistor therefore becomes comparable to the ON current, and it has a stronger impact on power than in super-threshold circuits.
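To give a feel for the magnitudes implied by Equations 2.10 and 2.12, the short sketch below evaluates both expressions. The values n = 1.5 and v_t = 26 mV, as well as the 0.40 V reference voltage, are assumptions chosen for illustration and are not extracted from the 0.13 µm technology used in this work.

```python
import math

def on_off_ratio(vdd, n=1.5, v_t=0.026):
    """I_ON / I_OFF from Equation 2.12 (n and v_t are assumed values)."""
    return math.exp(vdd / (n * v_t))

def normalized_delay(vdd, v_ref=0.40, n=1.5, v_t=0.026):
    """tau_D(vdd) / tau_D(v_ref) from Equation 2.10.

    The constants C and beta cancel in the ratio, so only the
    V_DD / exp(V_DD / (n * v_t)) dependence remains.
    """
    def tau(v):
        return v / math.exp(v / (n * v_t))
    return tau(vdd) / tau(v_ref)

for vdd in (0.20, 0.27, 0.40):
    print(vdd, on_off_ratio(vdd), normalized_delay(vdd))
```

Under these assumptions, lowering the supply shrinks the ON/OFF ratio and lengthens the delay by orders of magnitude rather than by a constant factor, which is the behaviour the two equations describe.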

Figure 2.4: I_ON to I_OFF ratio of an Inverter

In Figure 2.4, the I_ON to I_OFF ratio of an FO4 inverter is plotted against the supply voltage. It rolls off exponentially in the sub-threshold regime, as expected from Equation 2.12.

Another factor affecting sub-threshold operation is stacked devices. Stacking helps in super-threshold operation by reducing power consumption. When transistors are in series, their overall strength is lower than that of a single transistor by a well-known stacking factor. The stacking factors in 0.13 µm technology for 2 and 3 stacked devices are plotted in Figure 2.5.

Figure 2.5: Stacking Factor for I_ON and I_OFF for 2 and 3 stacked NMOS transistors

Because of the stacking factor, the current flowing through stacked transistors is reduced. In sub-threshold operation, where the delay already grows exponentially as V_DD is lowered (Equation 2.10), this current reduction increases the delay further. This worsens performance and slows down the circuit. Figure 2.6 plots the delay of stacked devices.

Figure 2.6: Delay slowdown of Stacked devices

From Figure 2.6, we see that at 270 mV, 4 stacked NMOS devices increase the delay by 114% compared to 2 stacked NMOS devices in 0.13 µm technology. This is a dramatic degradation in performance. Similarly, 4 stacked devices increase the delay by 50% compared to 3 stacked devices in 0.13 µm technology. In smaller technology nodes these results get worse, because the delay depends exponentially on technology (through the slope factor n), as depicted in Equation 2.10.

Chapter 3
Modifications in Standard Cell Library

This chapter covers the design of the clocked-inverter CMOS flip-flop and the modifications made to the standard cell library.

3.1 Transmission gates

The strength of a signal is measured in terms of how closely it approximates an ideal voltage source. In a design, the supply rails (VDD and ground, corresponding to 1 and 0 respectively) are taken as the reference, and signal strength is defined with respect to them. The closer a signal is to its rail (VDD for 1, ground for 0), the better its strength.

Figure 3.1: Pass Transistor Strong and Degraded outputs

As shown in Figure 3.1, an NMOS transistor is almost perfect at passing a 0 but degrades the output by its threshold voltage when passing a 1. Similarly, a PMOS transistor degrades the output by its threshold voltage when passing a 0 and transmits a 1 unchanged. We therefore construct a transmission gate by placing an NMOS device in parallel with a PMOS transistor and controlling their gates with opposite clock levels, as shown in Figure 3.2.

Figure 3.2: Transmission Gate

As discussed in Section 2.2, the I_ON to I_OFF ratio of a transistor degrades in the sub-threshold regime and the OFF current becomes comparable to the ON current. Transmission gates may therefore fail to block the input and can pass a wrong value, as shown in Figure 3.3.

Figure 3.3: Leakage in transmission Gate

3.2 Standard transmission gate CMOS Flip-flop

A standard transmission gate CMOS flip-flop consists of two level-sensitive latches built from transmission gates, as shown in Figure 3.4. When the clock is low, the first (master) latch output follows the D input while the second (slave) latch holds the previous value. When the clock rises from 0 to 1, the master latch becomes opaque and holds the D value present at the time of the clock transition.

Figure 3.4: Standard transmission gate CMOS flip-flop

Since the standard flip-flop is built from transmission gates, and transmission gates do not operate reliably in the sub-threshold regime, these flip-flops cannot be used in that operating voltage range. There is a risk of data write-back in these flip-flops when operating at sub-threshold voltages, as shown in Figure 3.5.

Figure 3.5: Data Write-Back in Standard Transmission Gate Flip-flop.

In Figure 3.5, the flip-flop is in state S1, i.e. node A is at logic level 1 and this state is held by the feedback loop in the slave latch. Similarly, node B is at logic level 0 in state S1. However, node B can be corrupted by leakage through the transmission gate and may cross the V_IH (minimum input high voltage) of the feedback inverter in the master latch. This inverts the master latch state, and wrong data may be transmitted at the next positive clock edge. This is called "data write-back".

3.3 Clocked Inverter CMOS Flip-flop

A clocked-CMOS-style flip-flop implementation replaces the master and slave transmission gates of the conventional circuit topology with clocked inverters, thereby eliminating the risk of data write-back [8]. The designed clocked-inverter CMOS flip-flop is shown in Figure 3.6.

Figure 3.6: Schematic Design of Clocked Inverter Flip-flop

As shown in Figure 3.6, no inverter is used in the signal path, which reduces the delay. Interruptible keepers are used to avoid write contention [8]. These keepers are up-sized to improve state retention. Clocked inverter CI2 has to be large to reduce the clock-to-Q delay.

3.4 Modification in Standard Cell Library

The standard cell library contains all the cells in different drive strengths. As discussed in Section 2.2, stacking worsens performance in sub-threshold operation, so cells with deep transistor stacks should not be used. We therefore pruned the library and removed the unwanted cells. The criterion for deciding which cells were unwanted was based on the number of stacked devices and the use of transmission gates. We removed cells with 4 or more stacked devices from the library. We also removed the MUX cells because they use transmission gates.

Instead of the standard transmission gate flip-flop, we characterized the clocked-inverter flip-flop and used it in place of the standard flip-flop. This left only 19 cells in the cell library, listed below.

1. AN2 : 2 input AND gate
2. AN3 : 3 input AND gate
3. BUF : Buffer
4. BUFCK : Clock Tree buffer
5. INV : Inverter
6. INVCK : Clock Tree Inverter
7. DFF_new : Clocked-inverter flip-flop
8. DFFRF : Flip-flop with Read enable signal
9. DLAH : D latch
10. OR2 : 2 input OR gate
11. OR3 : 3 input OR gate
12. ND2 : 2 input NAND gate
13. ND3 : 3 input NAND gate
14. NR2 : 2 input NOR gate
15. NR3 : 3 input NOR gate
16. Tie0 : Tie to 0
17. Tie1 : Tie to 1
18. XOR2 : 2 input XOR gate
19. XNR2 : 2 input XNOR gate
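The pruning criterion described in Section 3.4 can be expressed as a simple filter. The sketch below is hypothetical: the Cell record, the stack depths assigned to each cell and the ND4 and MUX2 entries are assumptions for illustration and are not taken from the actual library data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    name: str
    max_stack: int                 # deepest series transistor stack (assumed value)
    uses_transmission_gate: bool   # True if the cell contains a transmission gate

def prune(cells, stack_limit=4):
    """Keep cells with fewer than stack_limit stacked devices and no transmission gates."""
    return [
        c for c in cells
        if c.max_stack < stack_limit and not c.uses_transmission_gate
    ]

# Illustrative entries only; ND4 and MUX2 are hypothetical examples of removed cells.
cells = [
    Cell("ND2", 2, False),
    Cell("ND3", 3, False),
    Cell("ND4", 4, False),
    Cell("MUX2", 2, True),
]
kept = [c.name for c in prune(cells)]
print(kept)  # ['ND2', 'ND3']
```

In a real flow the same decision would be applied to the synthesis library, for example by marking the rejected cells as unusable, rather than to a hand-written list.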


Chapter 4
Flip-Flop Characterization

In theory, any logic system can be built using universal cells (NAND or NOR) or using AND, OR and NOT gates, but as the complexity of circuit design grows, it becomes impractical to design circuits by hand. Therefore, the use of automatic synthesis tools has become mandatory. The use of synthesis and place-and-route (PNR) tools requires a cell library. So the first step of the design is to develop such a library, or to acquire one. There are two common properties that a layout library must possess.

The first requirement is functional completeness. Synopsys' Design Analyzer synthesis tool requires the library to contain, at minimum, six different types of cells, namely:
• One type of tristate cell.
• Either NOR and NAND gates or AND and OR gates.
• Inverter.
• D flip-flop with asynchronous set and reset.
• D latch with asynchronous set and reset.

The second requirement pertains to the shape and size of the cells. The shapes and sizes of standard cells must be very regular. This also applies to the geometries inside the cells, particularly those on metal layers. These requirements are intended to ensure that PNR tools can lay down the routing tracks without being obstructed by metal geometries.

4.1 Requirements on the library

The development of a standard cell library for use with the Silicon Ensemble (SE) routing tools involves the following steps:
1. Layout of cells.
2. Creation of Synopsys synthesis and simulation libraries.
3. Generation of LEF (Library Exchange Format) description of cells.

The LEF file is a simplified ASCII file which contains only metal layers and other layers that can obstruct routing. Since the shapes of the n-well/p-well or diffusion do not electrically influence the metal tracks to a significant degree, these layers are omitted from LEF files. This file is used by the SE tools during the placement and routing process.

4.2 Layout Technique

4.2.1 Layout Rules

For a cell layout to be properly usable in standard-cell-based design, several requirements have to be satisfied. The standard cell should be designed so that it has the following properties.
• The general shape of the cell is as shown in Figure 4.1. Here, the term "pins" refers to any shape in the particular layer being used for routing.
• The sizes, shapes and locations of all geometries in layers pertinent to routing are regularized. For example, if a metal1 signal track inside the cell is 1 µm wide, all other metal1 tracks inside the cell must also be of the same width.
• All power/ground pins should have the same width and should run in the same direction, i.e. all horizontal or all vertical. They should take the form of rails at the top and bottom ends of the cell, as shown in Figure 4.1.
• The routing pitch should be at least the line-to-via pitch, as shown in Figure 4.2, where the closest separation satisfies the design rule for metal-to-metal separation. Ideally it should be at least the via-to-via pitch. This allows the routing tool to place vias wherever necessary.
• All the routing layers should be defined during LEF generation so that the routing tool can decide which metal layer is actually to be used.
• The number of metal layers used for internal connections within the cell should be limited. Always try to use metal1 only, so that all higher metal layer tracks are freely available to the routing tool.

Figure 4.1: General Shape of Standard cell core cell

Figure 4.2: Definition of routing Pitch

The rules discussed above are necessitated by the way the SE tool performs routing. The tool performs routing by laying down horizontal, vertical and Manhattan-style tracks. For each metal layer, the direction can be horizontal or vertical, but one direction is always taken as preferred and the other as non-preferred. For example, if the horizontal direction is preferred for metal1, the tool automatically tries to create horizontal tracks first before resorting to vertical tracks, although vertical tracks will eventually be used if deemed necessary.

4.2.2 Clocked Inverter Flip-flop Layout

The layout of the designed clocked-inverter flip-flop is made using the 0.13 µm UMC library and is shown in Figure 4.3. Since the other cells in the library are 21 pitches high (21 times the pitch of metal 1), the layout is made with the same height.

Figure 4.3: Layout of Clocked Inverter CMOS D Flip-Flop

During place and route, other cells will sit next to the designed flip-flop, so the layout takes into account the DRC (Design Rule Check) rules that apply after place and route. For example, the n-well covering the PMOS transistors should not violate any DRC rule after place and route. The final area of the flip-flop is 9.95 µm x 3.2 µm, compared with 8 µm x 3.2 µm for the standard transmission gate flip-flop.

4.3 Generation of LEF

LEF is an abstract of a cell that contains only metal layers and other layers that can obstruct routing. The steps to extract a LEF file from a layout using the Abstract Generator tool

21 4.3. Generation of LEFprovided by Cadence are given in Appendix B. Since the metal nomenclature providedin 0.13µm Faraday standard cell library is different as provided in 0.13µm UMClibrary, we made some changes in extracted LEF file to make it compatible with 0.13µmFaraday standard cell library.4.3.1 Abstract view of Clocked Inverter Flip-flopThe extracted abstract view of the clocked inverter flip-flop is shown in figure 4.4.Figure 4.4: Layout of Clocked Inverter CMOS D Flip-FlopFinal LEF file of clocked inverter flip-flop is given below.1 NAMESCASESENSITIVE ON ;MACRO DFF_NEW3 CLASS CORE ;FOREIGN DFF_NEW −0.12 −0.28 ;5 ORIGIN 0 . 1 2 0 . 2 8 ;SIZE 1 0 . 1 8 BY 3 . 7 6 ;7 SYMMETRY X Y ;SITE c o r e ;9 PIN DDIRECTION INPUT ;11 USE ANALOG ;PORT13 LAYER ME1 ;RECT 1 . 7 3 1 . 3 5 2 . 0 0 1 . 6 4 ;15 RECT 1 . 7 3 1 . 3 0 1 . 9 3 1 . 8 2 ;END17 END DPIN ck19 DIRECTION INPUT ;USE CLOCK ;

4.3. Generation of LEF 2221 PORTLAYER ME1 ;23 RECT 0 . 1 0 1 . 4 6 0 . 3 8 1 . 7 4 ;RECT 0 . 1 0 1 . 3 2 0 . 3 0 1 . 8 4 ;25 ENDEND ck27 PIN QDIRECTION OUTPUT ;29 PORTLAYER ME1 ;31 RECT 9 . 5 6 1 . 8 4 9 . 8 5 2 . 6 5 ;RECT 9 . 6 4 0 . 6 7 9 . 8 5 2 . 6 5 ;33 RECT 9 . 5 6 0 . 6 7 9 . 8 5 1 . 0 2 ;END35 END QPIN GND!37 DIRECTION INPUT ;USE GROUND ;39 SHAPE ABUTMENT ;PORT41 LAYER ME1 ;RECT 0 . 0 0 −0.28 9 . 9 5 0 . 2 8 ;43 RECT 9 . 0 4 0 . 6 7 9 . 3 2 1 . 0 2 ;RECT 9 . 1 2 −0.28 9 . 2 8 1 . 0 2 ;45 RECT 7 . 9 1 0 . 5 2 8 . 1 9 0 . 8 0 ;RECT 7 . 9 5 −0.28 8 . 1 4 0 . 8 0 ;47 RECT 5 . 5 9 −0.28 5 . 8 8 0 . 4 0 ;RECT 4 . 5 1 −0.28 4 . 7 4 0 . 8 0 ;49 RECT 2 . 2 2 −0.28 2 . 5 0 0 . 4 0 ;RECT 0 . 6 2 0 . 6 5 0 . 9 0 0 . 8 1 ;51 RECT 0 . 6 9 −0.28 0 . 8 4 0 . 8 1 ;END53 END GND!PIN VCC!55 DIRECTION INPUT ;USE POWER ;57 SHAPE ABUTMENT ;PORT59 LAYER ME1 ;RECT 0 . 0 0 2 . 9 2 9 . 9 5 3 . 4 8 ;61 RECT 9 . 0 4 1 . 8 4 9 . 3 2 2 . 6 5 ;RECT 9 . 0 1 2 . 8 0 9 . 2 9 3 . 4 8 ;63 RECT 9 . 0 4 1 . 8 4 9 . 2 9 3 . 4 8 ;RECT 7 . 9 5 2 . 8 0 8 . 2 2 3 . 4 8 ;65 RECT 5 . 5 9 2 . 8 0 5 . 8 8 3 . 4 8 ;RECT 4 . 4 7 2 . 8 0 4 . 7 5 3 . 4 8 ;67 RECT 2 . 2 1 2 . 8 0 2 . 4 9 3 . 4 8 ;RECT 0 . 5 8 2 . 8 0 0 . 8 6 3 . 4 8 ;69 ENDEND VCC!71 OBSLAYER ME1 ;73 RECT 0 . 1 6 0 . 6 5 0 . 3 2 1 . 1 2 ;

      RECT 0.16 0.96 0.98 1.12 ;
      RECT 0.82 1.40 1.02 1.68 ;
      RECT 0.82 0.96 0.98 2.29 ;
      RECT 0.10 2.13 0.98 2.29 ;
      RECT 0.10 2.04 0.38 2.44 ;
      RECT 1.62 0.88 1.90 1.14 ;
      RECT 1.62 0.98 2.60 1.14 ;
      RECT 2.44 1.32 2.66 1.60 ;
      RECT 1.65 2.02 1.93 2.31 ;
      RECT 2.44 0.98 2.60 2.31 ;
      RECT 1.65 2.15 2.60 2.31 ;
      RECT 1.52 0.44 2.06 0.60 ;
      RECT 1.90 0.56 2.98 0.72 ;
      RECT 2.83 0.56 2.98 1.26 ;
      RECT 2.83 1.10 3.42 1.26 ;
      RECT 3.10 1.10 3.42 1.42 ;
      RECT 3.34 0.56 4.33 0.72 ;
      RECT 3.34 0.56 3.54 0.84 ;
      RECT 4.17 0.56 4.33 1.55 ;
      RECT 4.17 1.39 4.89 1.55 ;
      RECT 4.67 1.39 4.89 1.67 ;
      RECT 4.67 1.39 4.83 2.24 ;
      RECT 3.35 2.08 4.83 2.24 ;
      RECT 3.35 2.02 3.63 2.30 ;
      RECT 5.08 0.47 5.43 0.62 ;
      RECT 5.05 0.80 5.33 1.08 ;
      RECT 5.08 0.47 5.24 2.27 ;
      RECT 5.00 1.89 5.28 2.27 ;
      RECT 3.73 0.88 4.01 1.04 ;
      RECT 5.76 1.12 6.79 1.28 ;
      RECT 3.75 0.88 3.92 1.85 ;
      RECT 2.84 1.69 3.92 1.85 ;
      RECT 1.20 0.65 1.36 2.64 ;
      RECT 1.20 2.04 1.42 2.64 ;
      RECT 2.84 1.69 3.00 2.64 ;
      RECT 5.76 1.12 5.92 2.64 ;
      RECT 1.20 2.48 5.92 2.64 ;
      RECT 7.08 0.88 7.36 1.04 ;
      RECT 7.10 0.88 7.26 1.85 ;
      RECT 6.51 1.69 7.26 1.85 ;
      RECT 6.51 1.59 6.83 1.91 ;
      RECT 6.70 0.56 7.68 0.72 ;
      RECT 6.75 0.56 6.92 0.84 ;
      RECT 7.53 0.56 7.68 1.38 ;
      RECT 7.53 1.22 8.27 1.38 ;
      RECT 8.11 1.33 8.35 1.61 ;
      RECT 8.11 1.22 8.27 2.36 ;
      RECT 6.70 2.20 8.27 2.36 ;
      RECT 8.54 0.47 8.88 0.62 ;
      RECT 8.51 0.80 8.79 1.08 ;
      RECT 8.54 0.47 8.71 2.46 ;
      RECT 8.46 1.89 8.74 2.46 ;
    LAYER VI1 ;

      RECT 3.16 1.16 3.36 1.36 ;
      RECT 6.58 1.65 6.78 1.85 ;
    LAYER ME2 ;
      RECT 3.10 1.16 5.75 1.36 ;
      RECT 3.10 1.10 3.42 1.42 ;
      RECT 5.54 1.16 5.75 1.85 ;
      RECT 5.54 1.66 6.83 1.85 ;
      RECT 6.51 1.59 6.83 1.91 ;
  END
END DFF_NEW
END LIBRARY

4.4 Behavioral Model

4.4.1 Timing model

In general, the timing model of a cell can be expressed as:

\[ \text{Total cell delay} = \text{Intrinsic delay} + \text{Transition delay} + \text{Slope delay} \tag{4.1} \]

The intrinsic delay of a cell is the propagation delay of the cell when it drives no load and is itself driven by an identical, unloaded cell.

The transition delay is the additional delay, beyond the intrinsic delay, incurred when the cell drives a capacitive load while still being driven by an identical, unloaded cell.

The slope delay is the extra delay (in addition to the intrinsic and, possibly, the transition delay) incurred when the driving cell is an identical cell that itself has a transition delay.

In practice, we are concerned with the total delay of a cell: the delay exhibited by the cell when it drives a capacitive load and is driven by an identical cell with transition delay. This total delay is called the propagation delay. The delay has to be defined with respect to measurement points on the switching waveform. In a standard cell library, these points are defined using the following four variables:

#Threshold point of an input falling edge:
input_threshold_pct_fall : 50.0;
#Threshold point of an input rising edge:
input_threshold_pct_rise : 50.0;
#Threshold point of an output falling edge:
output_threshold_pct_fall : 50.0;

#Threshold point of an output rising edge:
output_threshold_pct_rise : 50.0;

Typically, a 50% threshold is used in most standard cell libraries. The propagation delay is specified as two values:

1. Output fall delay (T_f): for example, the output of an inverter falls when its input rises.
2. Output rise delay (T_r): for example, the output of an inverter rises when its input falls.

In practice, these two values differ and are defined separately in the library. Figure 4.5 shows the definitions of these variables.

Figure 4.5: Timing definitions in standard cell library

The slew rate of the waveform also plays a very important role in timing models. In a real circuit, a cell is always driven by another cell whose output has a finite slew (transition delay), so an exact calculation of the propagation delay requires the output transition delay of the preceding cell. In a standard cell library, the slew thresholds are defined by the following four variables, as shown in Figure 4.5:

#Falling edge threshold:
slew_lower_threshold_pct_fall : 10.0;
slew_upper_threshold_pct_fall : 90.0;
#Rising edge threshold:
slew_lower_threshold_pct_rise : 10.0;
slew_upper_threshold_pct_rise : 90.0;

Each combinational cell has a timing arc from each of its inputs to the output. For sequential cells such as flip-flops, the timing arcs are defined from the clock pin to the Q pin. Each timing arc has a timing sense, that is, how the output changes for different types of input transition.

Figure 4.6: Timing Sense of arcs

In a positive unate timing arc, the output transition is in the same direction as the input transition. In a negative unate timing arc, the output transition is opposite to the input transition. In a non-unate timing arc, the output transition cannot be determined from the direction of the input transition alone; it also depends on the state of the other inputs. An example of timing arcs is given in Figure 4.6.

The delay of a cell is defined in terms of the output load capacitance and the input transition. In the non-linear delay model, which is used by most standard cell libraries, delay values are given in table form for different values of input transition and output load capacitance.

The table index values are discrete. If the lookup point does not coincide with a table entry, two-dimensional interpolation is used to obtain the timing value. For example, let the two index_1 values (total output capacitance) be denoted x_1 and x_2, the two index_2 values (input transition) be denoted y_1 and y_2, and the corresponding delay values be denoted T_11, T_12, T_21 and T_22. If the delay value is required at (x_0, y_0), the lookup value T_00 is obtained by interpolation as:

\[ T_{00} = x_{20}\, y_{20}\, T_{11} + x_{20}\, y_{01}\, T_{12} + x_{01}\, y_{20}\, T_{21} + x_{01}\, y_{01}\, T_{22} \tag{4.2} \]

where

\[ x_{01} = \frac{x_0 - x_1}{x_2 - x_1}, \qquad x_{20} = \frac{x_2 - x_0}{x_2 - x_1}, \qquad y_{01} = \frac{y_0 - y_1}{y_2 - y_1}, \qquad y_{20} = \frac{y_2 - y_0}{y_2 - y_1} \]

For sequential circuits, the same timing models are used together with additional constraint models, which define the setup and hold times. The setup time of a positive edge triggered flip-flop is defined as the data arrival time before the positive clock edge at which the clock-to-Q delay degrades by 10% relative to the clock-to-Q delay obtained when the data arrives very early before the positive clock edge, as shown in Figure 4.7. Similarly, the hold time of a positive edge triggered flip-flop is defined as the time after the positive clock edge at which a data change degrades the clock-to-Q delay by 10% relative to the clock-to-Q delay obtained when the data changes long after the positive clock edge.

4.4.2 Power model

Power dissipation in CMOS circuits comes from two components:

• Dynamic Power: This is caused by the charging and discharging of the load capacitance and by the short-circuit current that flows while both the NMOS and PMOS transistors are ON.
• Leakage Power: The sources of leakage power dissipation are sub-threshold leakage current, gate leakage and junction leakage.
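As an illustration of the table interpolation in equation (4.2), the lookup can be sketched in Python as below. This is a minimal sketch, not the code of any library tool: the function name `interpolate_2d` and the numeric values in the usage lines are hypothetical and chosen only for illustration.

```python
def interpolate_2d(x0, y0, xs, ys, t):
    """Bilinear interpolation following equation (4.2).

    xs = (x1, x2) and ys = (y1, y2) are the two neighbouring index values
    on each axis; t = ((T11, T12), (T21, T22)) holds the table entries,
    where T[i][j] is the value at (x_i, y_j).
    """
    x1, x2 = xs
    y1, y2 = ys
    (t11, t12), (t21, t22) = t

    # Normalised distances of the lookup point from the table grid lines.
    x01 = (x0 - x1) / (x2 - x1)
    x20 = (x2 - x0) / (x2 - x1)
    y01 = (y0 - y1) / (y2 - y1)
    y20 = (y2 - y0) / (y2 - y1)

    return (x20 * y20 * t11 + x20 * y01 * t12
            + x01 * y20 * t21 + x01 * y01 * t22)


# Hypothetical 2x2 table patch (illustrative numbers only).
xs = (0.1, 0.3)
ys = (0.01, 0.03)
table = ((0.20, 0.30), (0.24, 0.36))

# Lookup at the centre of the patch: approximately the mean of the four entries.
centre = interpolate_2d(0.2, 0.02, xs, ys, table)
```

At a grid point the weights reduce to 0 and 1, so the function returns the table entry itself; this is a quick way to check that the indices of T_11 to T_22 are wired correctly.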

Figure 4.7: Calculation of Setup Time

4.4.2.1 Leakage Power

For a fixed supply voltage, gate leakage is approximately constant and is much smaller than sub-threshold leakage. Junction leakage is also much smaller than sub-threshold leakage. Sub-threshold leakage depends on the input combination of the cell. For example, for a 2-input NAND gate, leakage is lowest when both inputs are at logic level 0 and highest when both inputs are at logic level 1. Therefore, for an exact leakage power calculation, the cell should be characterized for all input combinations.

4.4.2.2 Dynamic Power

Dynamic power arises from switching of the load. It depends on the output load capacitance and the input transition time. In a standard cell library, dynamic power is defined in lookup table form for different values of input transition and output load capacitance. The same interpolation method as for delay is used to calculate the power at points between the table entries.

The characterization of the Clocked Inverter flip-flop produces the following library data:

cell (DFF_new) {
  area : 31.84 ;
  cell_footprint : "QDFF" ;
  ff (IQ, IQN) {
    next_state : "D" ;
    clocked_on : "CK" ;
  }

  cell_leakage_power : 17295.68 ;
  leakage_power () {
    when : "!D * !CK" ;
    value : 16683.81 ;
  }
  leakage_power () {
    when : "!D * CK" ;
    value : 18511.56 ;
  }
  leakage_power () {
    when : "D * !CK" ;
    value : 17364.33 ;
  }
  leakage_power () {
    when : "D * CK" ;
    value : 16623.04 ;
  }
  pin (Q) {
    function : "IQ" ;
    direction : output ;
    max_capacitance : 0.125593 ;
    internal_power () {
      related_pin : "CK" ;
      power (POWER_7x7) {
        index_1 ("0.036298, 0.103530, 0.186122, 0.352399, 0.685856, 1.297646, 2.688435") ;
        index_2 ("0.001500, 0.003306, 0.007287, 0.016062, 0.035404, 0.078035, 0.172000") ;
        values ("0.01488, 0.015235, 0.015506, 0.015630, 0.015698, 0.015723, 0.015691", \
                "0.014870, 0.015204, 0.015485, 0.015610, 0.015678, 0.015705, 0.015674", \
                "0.014911, 0.015240, 0.015519, 0.015652, 0.015717, 0.015744, 0.015713", \
                "0.015085, 0.015418, 0.015709, 0.015833, 0.015899, 0.015925, 0.015893", \
                "0.016084, 0.016088, 0.016301, 0.016392, 0.016455, 0.016477, 0.016446", \
                "0.017653, 0.017648, 0.017667, 0.017703, 0.017741, 0.017758, 0.017725", \
                "0.021001, 0.020995, 0.021012, 0.021056, 0.021072, 0.021077, 0.021039") ;
      }
    }
    timing () {
      related_pin : "CK" ;
      timing_type : rising_edge ;
      timing_sense : non_unate ;
      cell_rise (DELAY_7x7) {
        index_1 ("0.036298, 0.103530, 0.186122, 0.352399, 0.685855, 1.297651, 2.688436") ;
        index_2 ("0.001500, 0.003306, 0.007287, 0.016062, 0.035404, 0.078035, 0.172000") ;
        values ("0.216236, 0.230066, 0.259131, 0.322292, 0.460511, 0.764105, 1.432796", \
                "0.232952, 0.246782, 0.275847, 0.339007, 0.477225, 0.780821, 1.449510", \
                "0.247392, 0.261222, 0.290288, 0.353447, 0.491666, 0.795259, 1.463952", \
                "0.267190, 0.281020, 0.310084, 0.373245, 0.511465, 0.815052, 1.483747", \
                "0.291802, 0.305631, 0.334695, 0.397851, 0.536070, 0.839674, 1.508360", \
                "0.316914, 0.330743, 0.359804, 0.422956, 0.561181, 0.864780, 1.533475", \
                "0.337977, 0.351818, 0.380870, 0.444012, 0.582235, 0.885838, 1.554546") ;
      }
      rise_transition (DELAY_7x7) {
        index_1 ("0.036298, 0.103530, 0.186122, 0.352399, 0.685855, 1.297651, 2.688436") ;
        index_2 ("0.001500, 0.003306, 0.007287, 0.016062, 0.035404, 0.078035, 0.172000") ;

        values ("0.061811, 0.088694, 0.150309, 0.289772, 0.601013, 1.288647, 2.804979", \
                "0.061806, 0.088690, 0.150323, 0.289816, 0.601017, 1.288630, 2.804979", \
                "0.061812, 0.088685, 0.150330, 0.289865, 0.601002, 1.288628, 2.804973", \
                "0.061812, 0.088691, 0.150303, 0.289869, 0.600971, 1.288637, 2.804978", \
                "0.061803, 0.088680, 0.150328, 0.289853, 0.601061, 1.288639, 2.804979", \
                "0.061838, 0.088715, 0.150342, 0.289913, 0.601012, 1.288638, 2.804979", \
                "0.062113, 0.088933, 0.150452, 0.289924, 0.600992, 1.288644, 2.804983") ;
      }
      cell_fall (DELAY_7x7) {
        index_1 ("0.036298, 0.103530, 0.186122, 0.352399, 0.685857, 1.297640, 2.688435") ;
        index_2 ("0.001500, 0.003306, 0.007287, 0.016062, 0.035404, 0.078035, 0.172000") ;
        values ("0.225230, 0.234691, 0.251983, 0.284236, 0.347786, 0.482784, 0.779531", \
                "0.242203, 0.251665, 0.268956, 0.301209, 0.364759, 0.499755, 0.796501", \
                "0.256939, 0.266399, 0.283692, 0.315944, 0.379494, 0.514491, 0.811237", \
                "0.277203, 0.286663, 0.303955, 0.336207, 0.399757, 0.534756, 0.831501", \
                "0.302386, 0.311847, 0.329138, 0.361390, 0.424940, 0.559938, 0.856686", \
                "0.327678, 0.337138, 0.354429, 0.386681, 0.450230, 0.585228, 0.881976", \
                "0.346894, 0.356353, 0.373643, 0.405893, 0.469442, 0.604440, 0.901185") ;
      }
      fall_transition (DELAY_7x7) {
        index_1 ("0.036298, 0.103530, 0.186122, 0.352399, 0.685857, 1.297640, 2.688435") ;
        index_2 ("0.001500, 0.003306, 0.007287, 0.016062, 0.035404, 0.078035, 0.172000") ;
        values ("0.039344, 0.051377, 0.076121, 0.128453, 0.244808, 0.507603, 1.095424", \
                "0.039341, 0.051380, 0.076114, 0.128444, 0.244810, 0.507617, 1.095416", \
                "0.039339, 0.051378, 0.076117, 0.128449, 0.244814, 0.507619, 1.095422", \
                "0.039339, 0.051373, 0.076114, 0.128446, 0.244790, 0.507581, 1.095432", \
                "0.039336, 0.051371, 0.076116, 0.128450, 0.244811, 0.507631, 1.095433", \
                "0.039333, 0.051370, 0.076109, 0.128438, 0.244793, 0.507584, 1.095425", \
                "0.039328, 0.051366, 0.076103, 0.128427, 0.244749, 0.507551, 1.095421") ;
      }
    }
  }
  pin (D) {
    nextstate_type : data ;
    direction : input ;
    capacitance : 0.001549 ;
    internal_power () {
      when : "!CK" ;
      power (POWER_7x1) {
        index_1 ("0.034845, 0.102893, 0.185766, 0.352447, 0.686642, 1.299821, 2.693705") ;
        values ("0.007697, 0.007690, 0.007738, 0.007909, 0.008383, 0.009444, 0.012232") ;
      }
    }
    internal_power () {
      when : "CK" ;
      power (POWER_7x1) {
        index_1 ("0.034845, 0.102894, 0.185762, 0.352512, 0.686631, 1.299818, 2.693703") ;
        values ("0.002499, 0.002487, 0.002539, 0.002720, 0.003218, 0.004312, 0.007087") ;
      }
    }
    timing () {
      related_pin : "CK" ;
      sdf_edges : both_edges ;

      timing_type : setup_rising ;
      rise_constraint (CONST_3x3) {
        index_1 ("0.033000, 1.340000, 2.680000") ;
        index_2 ("0.033000, 1.340000, 2.680000") ;
        values ("0.117550, 0.083721, 0.119830", \
                "0.273590, 0.232360, 0.273400", \
                "0.329420, 0.285730, 0.329230") ;
      }
      fall_constraint (CONST_3x3) {
        index_1 ("0.033000, 1.340000, 2.680000") ;
        index_2 ("0.033000, 1.340000, 2.680000") ;
        values ("0.134810, 0.061529, 0.070513", \
                "0.377150, 0.296470, 0.304220", \
                "0.509430, 0.433680, 0.442660") ;
      }
    }
    timing () {
      related_pin : "CK" ;
      sdf_edges : both_edges ;
      timing_type : hold_rising ;
      rise_constraint (CONST_3x3) {
        index_1 ("0.033000, 1.340000, 2.680000") ;
        index_2 ("0.033000, 1.340000, 2.680000") ;
        values ("-0.039579, -0.020541, -0.051718", \
                "-0.161090, -0.151920, -0.185560", \
                "-0.172540, -0.188030, -0.226600") ;
      }
      fall_constraint (CONST_3x3) {
        index_1 ("0.033000, 1.340000, 2.680000") ;
        index_2 ("0.033000, 1.340000, 2.680000") ;
        values ("-0.029716, 0.041104, 0.037051", \
                "-0.200550, -0.146990, -0.147340", \
                "-0.268710, -0.239810, -0.246330") ;
      }
    }
  }
  pin (CK) {
    direction : input ;
    capacitance : 0.001627 ;
    max_transition : 2.680000 ;
    clock : true ;
    internal_power () {
      power (POWER_7x1) {
        index_1 ("0.034847, 0.102893, 0.185769, 0.352469, 0.686640, 1.299819, 2.693707") ;
        values ("0.005776, 0.005561, 0.005496, 0.005664, 0.005992, 0.007056, 0.009976") ;
      }
    }
    min_pulse_width_high : 0.184800 ;
    min_pulse_width_low : 0.325200 ;
  }
}
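The four state-dependent leakage_power values in the listing above can be related to the single cell_leakage_power entry. The sketch below assumes that the cell-level figure is the unweighted mean of the four states, that is, that each input state is equally likely; the variable names are illustrative and not part of the library.

```python
# State-dependent leakage values copied from the DFF_new listing.
leakage_by_state = {
    "!D * !CK": 16683.81,
    "!D * CK": 18511.56,
    "D * !CK": 17364.33,
    "D * CK": 16623.04,
}

# Assumption: every input state is occupied for an equal share of the time,
# so the cell-level leakage is the plain arithmetic mean.
average_leakage = sum(leakage_by_state.values()) / len(leakage_by_state)

# Value listed as cell_leakage_power in the same cell.
listed_cell_leakage = 17295.68
difference = abs(average_leakage - listed_cell_leakage)
```

Under this assumption the mean agrees with the listed cell_leakage_power to within the two decimal places printed in the library.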


Chapter 5

Simulations and Results

In this work, the CORDIC algorithm is implemented and demonstrated, in simulation, to be operable over the wide voltage range of 270 mV to 1.2 V in all process corners. This chapter describes the flow of the work and the results obtained.

A standard cell library is always designed and characterized at a particular voltage. The 0.13 µm Faraday standard core cell library is characterized at 1.2 V in the TT corner at 25 ℃. We can therefore calculate the power and delay of an ASIC synthesized with this library at that particular voltage, but we cannot guarantee that the same ASIC will operate at some other voltage with a different performance. Such a guarantee can only be obtained through exhaustive Spice simulation, which is not feasible for an ASIC design because it takes several days to finish. In this work we propose a simpler approach to determine the same for the CORDIC algorithm, and we calculate the energy consumed per rotation over the operating voltage range of 270 mV to 1.2 V. From this we find the minimum energy point of the CORDIC design.

5.1 Assumptions

• It is assumed that the critical path of the ASIC design does not change with voltage, process and temperature.
• It is assumed that the leakage power of a cell is independent of the input combination.

The first assumption is weak, because the delay of each path changes with PVT variation and thus the critical path may change. The second assumption is made because we assume that all input combinations are applied to the cell for equal time. We therefore take the leakage power of each cell to be the average of its leakage power over all input combinations. These assumptions are used throughout this work.

5.2 Process Flow

The full process flow of the work is shown in Figure 5.1 and explained below.

Figure 5.1: Process Flow

The process flow consists of the following steps:

• The RTL code of the CORDIC algorithm is first simulated and its functionality is verified by behavioral simulation.
• From the behavioral simulation an activity file is generated, which is used to calculate the power of the whole circuit with PrimePower. PrimePower also needs the parasitic information, which is supplied as a .spef file after placement and routing using SoC Encounter.

• For place and route, the standard cell library was first modified as discussed in Section 3.4. Using this library, we synthesized the RTL code of the CORDIC algorithm at a 200 MHz clock speed.
• Synthesis generates a gate level netlist, which gives the different cells used in the complete circuit and the number of occurrences of each cell. Using this information, the average leakage power of each cell is calculated by simulating the cells individually in Cadence.
• The total leakage power is calculated by adding the average leakage power of all the cells, for each supply voltage.
• Timing analysis is done using Design Compiler, which gives the critical path and the node capacitances associated with it.
• The extracted node capacitances do not include the wire capacitance. Since wire load capacitance also plays a significant role in sub-threshold operation, parasitic information is extracted from the placed and routed SoC (using SoC Encounter) in the form of a .spef file. This information is extracted for each node on the critical path, and the parasitic capacitance is added to the respective node.
• This path is analyzed in Cadence and the total path delay is calculated for each supply voltage.
• We now have the total leakage power and the critical path delay of the circuit at supply voltages from 270 mV to 1.2 V. The following equations are used to calculate the power and energy at all these voltage levels:

\[ P_T = P_{DYN} + P_{LEAK} \tag{5.1} \]
\[ P_T = \alpha C_{eff} V_{DD}^2 f + P_{LEAK} \tag{5.2} \]
\[ P_{T1} = \alpha C_{eff} V_{DD1}^2 f_1 + P_{LEAK1} \tag{5.3} \]
\[ P_{T2} = \alpha C_{eff} V_{DD2}^2 f_2 + P_{LEAK2} \tag{5.4} \]
\[ P_{DYN2} = P_{T2} - P_{LEAK2} = \frac{V_{DD2}^2 f_2}{V_{DD1}^2 f_1} \left( P_{T1} - P_{LEAK1} \right) \tag{5.5} \]
\[ P_{T2} = \frac{V_{DD2}^2 f_2}{V_{DD1}^2 f_1} \left( P_{T1} - P_{LEAK1} \right) + P_{LEAK2} \tag{5.6} \]
\[ E_{LEAK2} = P_{LEAK2} \cdot T_{D2} \tag{5.7} \]
\[ E_{DYN2} = P_{DYN2} \cdot T_{D2} \tag{5.8} \]

Here P_T is the total power, P_DYN the dynamic power and P_LEAK the leakage power; the subscripts 1 and 2 denote the reference and the scaled operating point respectively.
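Equations (5.5) to (5.8) can be sketched as a short calculation. This is a hedged illustration rather than the scripts used in this work: the function names are hypothetical, T_D2 is assumed to be the time taken per operation at the second operating point, and all numbers in the usage lines are made up for illustration.

```python
def scale_power(p_t1, p_leak1, vdd1, f1, vdd2, f2, p_leak2):
    """Scale dynamic power from operating point 1 to operating point 2.

    Implements equations (5.5) and (5.6); alpha * C_eff cancels because it
    is assumed to be identical at both operating points.
    Returns (P_DYN2, P_T2).
    """
    p_dyn1 = p_t1 - p_leak1
    p_dyn2 = (vdd2 ** 2 * f2) / (vdd1 ** 2 * f1) * p_dyn1
    return p_dyn2, p_dyn2 + p_leak2


def energy_per_operation(p_dyn2, p_leak2, t_d2):
    """Equations (5.7) and (5.8): returns (E_LEAK2, E_DYN2, total energy)."""
    e_leak2 = p_leak2 * t_d2
    e_dyn2 = p_dyn2 * t_d2
    return e_leak2, e_dyn2, e_leak2 + e_dyn2


# Illustrative numbers only (SI units: W, V, Hz, s); not results from this work.
p_dyn2, p_t2 = scale_power(p_t1=1.0e-3, p_leak1=1.0e-4,
                           vdd1=1.2, f1=200e6,
                           vdd2=0.6, f2=50e6,
                           p_leak2=2.0e-5)
e_leak2, e_dyn2, e_total2 = energy_per_operation(p_dyn2, p_leak2=2.0e-5,
                                                 t_d2=1.0 / 50e6)
```

Repeating this calculation at each supply voltage, with the simulated leakage power and critical path delay for that voltage, gives the energy curve from which the minimum energy point can be read.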

5.3 Results

The CORDIC algorithm is synthesized at a 200 MHz clock frequency using the pruned library discussed in Section 3.4, which contains the clocked inverter flip-flop instead of the standard transmission gate flip-flop. The energy consumed per rotation is plotted against the supply voltage in Figure 5.2.

Figure 5.2: Energy characteristics against supply voltage in SS Corner (for modified library)

From Figure 5.2, we conclude that the minimum energy point lies between 410 mV and 440 mV. Although the leakage power falls as the supply voltage is reduced, the leakage energy increases exponentially. This is because the delay of the critical path increases rapidly with reduction in supply voltage, and this effect dominates in sub-threshold operation. Dynamic energy is proportional to the square of the supply voltage, as depicted in Figure 5.2.

The CORDIC design is simulated in the SS, FF and TT corners to make sure that operation is preserved across process variation. The magnitude of the transistor threshold voltage decreases nearly linearly with temperature and may be approximated by the following formula [3]:

\[ V_t(T) = V_t(T_r) - k_{vt} (T - T_r) \tag{5.9} \]

where k_vt is typically about 1-2 mV/K and T_r is room temperature. In addition, the ON current of the transistor decreases with temperature, because transistors are velocity saturated in the super-threshold regime. This makes the transistor slower at high temperature at super-threshold voltages. In sub-threshold operation, i.e. below some voltage, transistors are not velocity saturated; temperature increases the mobility of carriers, and I_on increases with temperature. Thus, in sub-threshold operation, devices become faster with temperature. This is depicted in Figure 5.3.

Figure 5.3: Normalized Delay vs Supply Voltage (for modified library)

Figure 5.4: Energy characteristics against supply voltage in different process corners (for modified library). (a) Leakage Energy. (b) Total energy.

Since sub-threshold leakage depends exponentially on temperature, transistors in the SS corner leak more than in the TT and FF corners. This is shown in Figure 5.4.

5.4 Comparison of results with Normal library

In this section, we compare the results of the pruned library with those of the normal library, i.e. the library without any modification.

The modified library contains only a limited number of gates, so more gates are needed to map the exact functionality of the circuit than with the normal (full) library. This increases the area of the circuit: after pruning, the area of the CORDIC design is 31.5% larger than that of the CORDIC design using the normal library. As a result the leakage power increases, as shown in Figure 5.5.

Figure 5.5: Energy vs Supply Voltage

For a particular frequency of operation, stacking increases the supply voltage required to compensate for the delay degradation. This makes the CORDIC design with the normal library slower than the CORDIC design with the pruned library, and it consumes less dynamic energy, as shown in Figure 5.5. For the same operating frequency, the normal library therefore requires a higher supply voltage than the pruned library, as shown in Figure 5.6.

Figure 5.6: Supply voltage requirement vs Performance

Leakage power in the CORDIC design with the pruned library is higher than with the normal library because the pruned design contains fewer stacked devices. So for low performance requirements, where leakage power exceeds dynamic power, the pruned library consumes more power, as shown in Figure 5.7. From Figure 5.7, we observe that at 2 MHz the normal library is more power efficient than the pruned library, but at 250 MHz it consumes more energy.

Figure 5.7: Power Consumption vs Performance

In Figure 5.7, we also observe that at low performance requirements the FF corner consumes 3 times less energy than the SS corner. On the other hand, at a high frequency of 250 MHz, the power efficiency in the FF corner is 75% higher than in the SS corner.

The comparison of the CORDIC designs using the normal library and the modified library is summarized in Table 5.1.

Table 5.1: Comparison of CORDIC designs at 1.2 V, SS corner

Parameter                      With Normal Library    With Modified Library
Gate Count                     8560                   11260
Maximum Frequency (MHz)        245                    260
Leakage Energy (pJ/rot)        0.189                  0.236
Dynamic Energy (pJ/rot)        11.57                  11.56
Total Energy (pJ/rot)          11.76                  11.79
Minimum Energy Voltage (mV)    420-450                410-440
Minimum Energy (pJ/rot)        1.6-1.7                1.9-2.0

5.5 The openMSP430 Micro-controller

The openMSP430 micro-controller is also synthesized with the pruned library and compared against the normal library in Table 5.2. From Table 5.2, the area increases by around 30% after pruning. Leakage increases because of reduced stacking, and the speed is slightly better. Thus, after pruning, the openMSP430 also consumes more energy per rotation. The energy consumption in Table 5.2 is for the openMSP430 core only, excluding memory and peripheral devices.

Table 5.2: Comparison of CORDIC on openMSP430 at 1.08 V, SS corner

Parameter                      With Normal Library    With Modified Library
Gate Count                     10279                  13389
Maximum Frequency (MHz)        66.6                   66.8
Leakage power (mW)             0.033                  0.046
Dynamic power (mW)             0.616                  0.637
Total power (mW)               0.649                  0.683
Energy per rotation (nJ/rot)   1.048                  1.103
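The overheads quoted for the pruned library can be cross-checked against the gate counts in Tables 5.1 and 5.2. The sketch below assumes that the quoted area increase tracks the gate count; the helper name `percent_increase` is hypothetical.

```python
def percent_increase(baseline, modified):
    """Relative increase of `modified` over `baseline`, in percent."""
    return 100.0 * (modified - baseline) / baseline


# Gate counts from Table 5.1 (CORDIC) and Table 5.2 (openMSP430).
cordic_overhead = percent_increase(8560, 11260)
msp430_overhead = percent_increase(10279, 13389)

# Leakage energy per rotation from Table 5.1 (pJ/rot).
leakage_energy_overhead = percent_increase(0.189, 0.236)

print(f"CORDIC gate count overhead:      {cordic_overhead:.1f}%")
print(f"openMSP430 gate count overhead:  {msp430_overhead:.1f}%")
print(f"CORDIC leakage energy overhead:  {leakage_energy_overhead:.1f}%")
```

The gate count overheads come out close to the 31.5% and "around 30%" figures quoted in the text.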

Chapter 6

Conclusions And Scope for Future Work

6.1 Conclusions

This work proposes the implementation of an ASIC design operable at sub-threshold voltages so that the minimum energy point is achievable. We have addressed the issues in sub-threshold circuit design and the problems with transmission gates and stacked devices. For this, we modified the standard cell library and designed a clocked-inverter flip-flop for the library, using the clocked-inverter flip-flop design from [8]. The proposed methodology was applied to a CORDIC algorithm in rotation mode at a clock frequency of 200 MHz, and the minimum energy point was successfully achieved, in simulation, at the SS and TT process corners.

From Figure 5.6, we see that the supply voltage requirement of the pruned library is always lower than that of the normal library. From Figure 5.7, we conclude that for low performance requirements the normal library is more energy efficient. Also, in the FF process corner, the design is more energy efficient over the entire performance range.

The minimum energy voltage for the CORDIC design was 410-440 mV and the minimum energy consumption was 1.9-2.0 pJ/rotation. The gate count of the CORDIC design using the modified library was 11260, approximately 31.5% more than the CORDIC design using the normal library. The CORDIC algorithm was thus shown to operate over a wide voltage range of 270 mV to 1.2 V.

6.2 Scope for Future Work

This work is done in the 0.13 µm technology node. As discussed earlier, second-order effects become more prominent as the technology shrinks. It will therefore be interesting to see the results in smaller technology nodes, which may open more directions for this work.

Our first assumption was that the critical path does not change with PVT variation. This is a weak assumption, and further work can look for a way to track the critical path along with the minimum energy point as the voltage is scaled down.

In this work we assumed that the leakage power is independent of the input combination and, in the calculations, took the average leakage power over all input combinations. To obtain the exact leakage power, the activity can be tracked in future work.

Appendix A

Place And Route using SoC Encounter

SoC Encounter is a powerful place and route tool with First Encounter, NanoRoute, CeltIC and optimization tools built in. It is widely used for the RTL to GDSII flow and can perform functions such as floor planning, feasibility analysis, placement, clock tree synthesis, power routing, SI (signal integrity) aware routing and IR drop analysis [16, 13]. The SoC Encounter flow is shown in Figure A.1.

Figure A.1: SoC Encounter Flow

A.1 Input Requirements of SoC Encounter

SoC Encounter requires the following files for the RTL to GDSII flow:

• Verilog Netlist: The Verilog netlist obtained from synthesis of the module to be laid out contains the standard cells, the functional I/O pads and their interconnection information. This file should be in synthesized Verilog (.v) netlist format.
• IO file: This file describes the IO pads: which kind of pad is used (input, output, corner, with or without ESD protection, etc.) and where it is to be placed (offset, spacing, direction, etc.). This file should be in the proper format with the .io extension.
• Timing Constraint Files: Just as for synthesis, timing constraints must be specified for the backend design with SoC Encounter. The file can be generated from the synthesis tool (Design Compiler) and should be in .sdc format.
• Technology files: These files describe the technology itself as well as the libraries of standard building blocks implemented in this technology, i.e. standard cells, pads, RAM/ROM.
  – header.lef: Base technology description; defines metal layers, vias, spacing rules and routing.
  – Fscoh–.lef: Physical description; shape and allowed orientation of cells, layer and shape of pins, blockages, antenna information.
  – *fast*.lib, *slow*.lib: Functional description; timing and power information, maximum load/fanout or transition time allowed.

A.2 SoC Encounter Flow

• Initialization: Since Encounter creates a lot of files, always create a new work directory before starting it and change the current directory to that directory. SoC Encounter can then be started with the following command (its GUI is shown in Figure A.2):
encounter
Do not add "&" at the end of this command. Encounter uses the terminal from which it is launched to report feedback and results for the user's commands and actions. Commands for almost all functions that can be performed in the GUI can also be entered in the terminal.

Figure A.2: SoC Encounter's GUI

• Importing Design: To import the design with the required libraries, go to
Design ⇒ Import Design
and fill in the required entries as shown in Figure A.3.

Figure A.3: Importing Design in SoC Encounter

Then go to
Advanced ⇒ power
and change the power and ground nets according to the library definition, which can be found in the fscoh.lib file. In this example they are VCC and GND respectively.
If everything went properly, the initial floorplan and memories become visible. The design hierarchy can be viewed by choosing
Tools ⇒ Design Browser
The library files can also be verified here.
• Floorplanning: In this step we define the dimensions of the ASIC core, the arrangement of the core rows, the distance between the ASIC core and the IOs, the physical location of any hard macros, and the distance between these blocks and the core rows. To specify these, go to
Floorplan ⇒ Specify Floorplan
During clock tree synthesis, clock buffers are added to the design, and the optimization process leaves some area vacant that has to be filled by filler cells. In practice, Core Utilization is therefore kept at 0.7, i.e. 30% of the area is left for clock buffers and fillers. This setting is shown in Figure A.4.

Figure A.4: Specifying Floorplan

The global net connections, i.e. the VCC and GND connections, must also be specified. This is done with
Floorplan ⇒ Connect Global Nets
Select Pin, specify VCC under Pin Name, and select Apply All. Specify VCC as the Global Net and click Add to List. Also connect VCC (under Net Basename) to VCC. Repeat the same procedure for the GND net. Also connect Tie High and Tie Low to VCC and GND respectively. Figure A.5 shows these settings.

Figure A.5: Global Net Specification

• Power Planning: We first add power rings around the core area, carrying supply and ground, so that connections can be made easily. For this, go to
Power ⇒ Power Planning ⇒ Add Rings
Apply the settings shown in Figure A.6 and click OK. This creates two rings around the core: an inner ring for GND and an outer ring for VCC.

Figure A.6: Add Core Ring Pane

Next we create stripes or a mesh (as required) to minimize the voltage sag. This can be done by adding stripes (horizontal or vertical) or a mesh (horizontal and vertical stripes). For this, go to
Power ⇒ Power Planning ⇒ Add Stripes

  Apply the settings according to the design (as shown in Figure A.7 for our example) and click OK.

Figure A.7: Add Stripe Ring Pane

  Now we can supply power to each cell of the design. This is done using SROUTE (Figure A.8), which creates small horizontal power and ground lines matching the size of the standard cells defined in the library file.

Figure A.8: SROUTE Pane

  After this step some blue horizontal lines are visible in the core. Also, if the design has IO pads, the power ring to IO connections are visible as shown in Figure A.9.

Figure A.9: A view of Core after SROUTE

• Standard Cell Placement: The layout is now ready for standard cell placement, which can be influenced by the information in the constraint file. The standard settings in the Placement Mode panel, as selected in Figure A.10, should be sufficient to achieve a satisfactory placement result. To start placement with the provided settings choose
    Place ⇒ Standard Cells
  and click OK with the default settings.

Figure A.10: Placement Mode Setting Pane

• Clock Tree Synthesis: SoC Encounter generates a clock tree by mapping the requirements in the clock specification file (.cts) and the constraint file (.sdc) to the physical layout. The clock tree is assembled from appropriately sized clock buffers that are accommodated in the core row gaps. To synthesize the clock tree choose
    Clock ⇒ Design Clock
  Select the .cts file, or generate one by selecting the required cells. Then click OK.
• Signal Routing: After all blocks and cells are placed and the clock tree is routed, the cells on the core rows need to be connected as specified in the netlist. This is accomplished by a routing tool named Nanoroute, which is incorporated into SoC Encounter. Select
    Route ⇒ Nanoroute ⇒ Route
  to open the Nanoroute pane shown in Figure A.11.

Figure A.11: Signal Routing Pane

  After this step go to
    Verify ⇒ Verify Geometry
  and check for violations in Tools ⇒ Violation Browser. If there is any violation, change the floorplan and repeat the whole procedure.
• Add Filler cell: The last step of placement and routing is to fill the remaining area with filler cells. To do this choose
    Place ⇒ Physical Cell ⇒ Add Filler
  Add all the filler cells from the list and click OK. This makes the core utilization equal to 1, because the filler cells are now also counted as cells that fill the core area.

Finally, the core should look like Figure A.12.

Figure A.12: Final View of Core after Placement and Routing
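The utilization figures quoted in the floorplanning and filler steps can be checked with a little arithmetic. The Python sketch below is only an illustration and is not part of the Encounter flow. It assumes that core utilization is the placed cell area divided by the core area; the function names and the 7000 µm² cell area are hypothetical values chosen for the example.

```python
def core_area_for_utilization(cell_area_um2, target):
    """Core area needed so that cell_area / core_area equals the target utilization."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target utilization must be in (0, 1]")
    return cell_area_um2 / target


def utilization(cell_area_um2, core_area_um2):
    """Fraction of the core area occupied by cells."""
    return cell_area_um2 / core_area_um2


# Hypothetical synthesized standard-cell area (not taken from the report).
cell_area = 7000.0

# Floorplan with Core Utilization = 0.7, as used in the flow above.
core_area = core_area_for_utilization(cell_area, 0.7)

# Area left over for clock buffers and filler cells (about 30% of the core).
free_area = core_area - cell_area

# Once clock buffers and fillers occupy the free area, every site is counted
# as used, so the reported utilization becomes 1.
final_util = utilization(cell_area + free_area, core_area)

print(core_area, free_area, final_util)
```

A lower target utilization gives a larger core for the same netlist, which eases routing at the cost of area.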


Appendix B

LEF File Generation Using Abstract Generator

The abstract generator program abstract is used to generate the LEF file of the clocked inverter flip-flop without the technology description part. This is because the layer definitions in the UMC library differ from those in the Faraday standard cell library, so the technology description part must be created manually.

A LEF file describing a library has two parts:

1. The technology description part:
   • The layers available in the technology. Only layers involved in PNR (place and route) should be included.
   • The part of the design rules that affects PNR operation.
   • Library designer-defined routing rules, such as the preferred direction of metal tracks, the chosen value of routing pitch, etc.
2. The cell description part, describing the geometries comprising each cell:
   • The shape and size of the cell.
   • The location of the pins, the layer each pin sits on, and its geometric description.
   • A detailed description of the layers that do not belong to any particular pin but prohibit the passage of routing tracks in the same layer.

The first part can be generated either manually or by using a tool such as the technology file editor (as in our case). It is required by abstract as an input for generating the second part.

It is assumed that the UMC technology file, with which the circuit and layout were made, is available.

1. Create a new library in Cadence, named, for example, std_tech.
2. In the CIW (Command Interpreter Window) choose Tools ⇒ Technology File Manager. Select Load and browse for the UMC technology file to load into the library std_tech.
3. Open the Technology File Manager again and double click on PR. In the Routing Layers subclass, apply the settings shown in Figure B.1.

Figure B.1: Defining routing layers

4. In the Via types subclass, apply the settings shown in Figure B.2.

Figure B.2: Defining VIA

Now save the technology library and assign it to your design library.

Generating the abstract view takes three steps:

1. Pins: Defines the pins.
2. Extract: Extracts the pin and obstruction information.
3. Abstract: Generates the abstract view of the layout.

Now open the layout and go to Tools ⇒ Abstract Generator.

• Step 1 (Defining Pins):
   1. Select Flow ⇒ Pins. In the new window, go to the Map tab and specify the layers in which the labels for the pins are drawn, as shown in Figure B.3.

Figure B.3: Defining Pin Labels

   2. In the Boundary tab, check whether any boundary adjustment is needed. Set Create Boundary to "as needed".

   3. Press the RUN button and wait until the pin generation step is completed. There may be some warnings about parts of the cells lying outside the boundary; these can be ignored.
• Step 2 (Pin and Obstruction Extraction):
  Choose Flow ⇒ Extract. In the Power tab, check Extract power nets. Choose RUN.
• Step 3 (Abstract Generation):
   1. Choose Flow ⇒ Abstract. In the Blockage tab of the new window, set the Blockage type for all layers to Detailed.
   2. In the Overlap tab, set Create Overlap Boundary to "as needed".
   3. Press RUN.

To export the LEF file of the cell, select File ⇒ Export LEF in the Abstract Generator window and press OK.
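After exporting, it can be useful to confirm that the LEF file really contains the cell and its pins before handing it to the place and route tool. The Python sketch below is an illustration only and is not part of the Abstract Generator flow. It assumes the usual LEF layout in which each cell is a MACRO ... END block containing PIN ... END blocks; the cell name CKINV_DFF, its pin names, its size and the embedded sample text are hypothetical.

```python
def list_macros_and_pins(lef_text):
    """Return {macro_name: [pin_name, ...]} for the MACRO blocks in LEF text."""
    macros = {}
    current = None  # name of the MACRO block currently being read, if any
    for line in lef_text.splitlines():
        tokens = line.split()
        if len(tokens) < 2:
            continue
        keyword, name = tokens[0], tokens[1]
        if keyword == "MACRO":
            current = name
            macros[current] = []
        elif keyword == "PIN" and current is not None:
            macros[current].append(name)
        elif keyword == "END" and name == current:
            current = None  # "END <macro name>" closes the block
    return macros


# Hypothetical cell description, standing in for an exported LEF file.
SAMPLE_LEF = """
MACRO CKINV_DFF
  CLASS CORE ;
  SIZE 5.6 BY 3.2 ;
  PIN D
    DIRECTION INPUT ;
  END D
  PIN CK
    DIRECTION INPUT ;
  END CK
  PIN Q
    DIRECTION OUTPUT ;
  END Q
  PIN VCC
    DIRECTION INOUT ;
    USE POWER ;
  END VCC
  PIN GND
    DIRECTION INOUT ;
    USE GROUND ;
  END GND
END CKINV_DFF
"""

found = list_macros_and_pins(SAMPLE_LEF)
for macro, pins in found.items():
    # VCC and GND are the supply net names used in the flow of Appendix A.
    missing_supply = {"VCC", "GND"} - set(pins)
    print(macro, pins, "missing supply pins:", sorted(missing_supply))
```

To check a real file, read its contents into a string and pass that to list_macros_and_pins in place of SAMPLE_LEF.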

