
Master of Engineering Balram Sahu - Embedded Sensing ...


"Minimum Energy Point Operation of ASIC Circuits"
A PROJECT REPORT SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF
Master of Engineering
In The Faculty Of Engineering
By Balram Sahu
Guided By Prof. Bharadwaj Amrutur
Centre For Electronics Design And Technology
Indian Institute Of Science, Bangalore
June 2012
Copyright © 2012 IISc
All Rights Reserved


Contents
Table of Contents
List of Figures
Abstract
1 Introduction
  1.1 Energy Constrained Applications
    1.1.1 Low-power Digital Signal Processor and Micro-controller Units
    1.1.2 Wireless Micro-sensor Networks
    1.1.3 Biomedical Applications
    1.1.4 Radio Frequency Identification (RFID)
  1.2 System Requirements
    1.2.1 Battery Lifetime
    1.2.2 Energy Harvesting
  1.3 Thesis Overview
  1.4 Thesis Organization
2 Background
  2.1 Modeling for Sub-threshold operation
  2.2 Challenges in Sub-threshold operation
3 Modifications in Standard Cell Library
  3.1 Transmission gates
  3.2 Standard transmission gate CMOS Flip-flop
  3.3 Clocked Inverter CMOS Flip-flop
  3.4 Modification in Standard Cell Library
4 Flip-Flop Characterization
  4.1 Requirements on the library
  4.2 Layout Technique
    4.2.1 Layout Rules
    4.2.2 Clocked Inverter Flip-flop Layout
  4.3 Generation of LEF
    4.3.1 Abstract view of Clocked Inverter Flip-flop
  4.4 Behavioral Model
    4.4.1 Timing model
    4.4.2 Power model
      4.4.2.1 Leakage Power
      4.4.2.2 Dynamic Power
5 Simulations and Results
  5.1 Assumptions
  5.2 Process Flow
  5.3 Results
  5.4 Comparison of results with Normal library
  5.5 The openMSP430 Micro-controller
6 Conclusions And Scope for Future Work
  6.1 Conclusions
  6.2 Scope for Future Work
A Place And Route using SoC Encounter
  A.1 Input Requirements of SoC Encounter
  A.2 SoC Encounter Flow
B LEF File Generation Using Abstract Generator


List of Figures
2.1 Early measurement of the $I_D(V_{GS})$ characteristics of a P-channel metal-gate MOS transistor (cleaned-up plot from [6])
2.2 NMOS transistor current contribution in sub-threshold. (a) Sub-threshold current. (b) Gate current. (c) Junction leakage current.
2.3 Normalized FO4 delay vs. $V_{DD}$ [7]
2.4 $I_{ON}$ to $I_{OFF}$ ratio of an Inverter
2.5 Stacking Factor for $I_{ON}$ and $I_{OFF}$ for 2 and 3 stacked NMOS transistors
2.6 Delay slowdown of Stacked devices
3.1 Pass Transistor Strong and Degraded outputs
3.2 Transmission Gate
3.3 Leakage in transmission Gate
3.4 Standard transmission gate CMOS flip-flop
3.5 Data Write-Back in Standard Transmission Gate Flip-flop
3.6 Schematic Design of Clocked Inverter Flip-flop
4.1 General Shape of Standard cell core cell
4.2 Definition of routing Pitch
4.3 Layout of Clocked Inverter CMOS D Flip-Flop
4.4 Layout of Clocked Inverter CMOS D Flip-Flop
4.5 Timing definitions in standard cell library
4.6 Timing Sense of arcs
4.7 Calculation of Setup Time
5.1 Process Flow
5.2 Energy characteristics against supply voltage in SS Corner (for modified library)
5.3 Normalized Delay vs Supply Voltage (for modified library)
5.4 Energy characteristics against supply voltage in different process corners (for modified library). (a) Leakage Energy. (b) Total energy.
5.5 Energy vs Supply Voltage
5.6 Supply voltage requirement vs Performance
5.7 Power Consumption vs Performance
A.1 SoC Encounter Flow
A.2 SoC Encounter's GUI
A.3 Importing Design in SoC Encounter
A.4 Specifying Floorplan
A.5 Global Net Specification
A.6 Add Core Ring Pane
A.7 Add Stripe Ring Pane
A.8 SROUTE Pane
A.9 A view of Core after SROUTE
A.10 Placement Mode Setting Pane
A.11 Signal Routing Pane
A.12 Final View of Core after Placement and Routing
B.1 Defining routing layers
B.2 Defining VIA
B.3 Defining Pin Labels


Abstract
Although energy dissipation has improved with each technology node, the energy expended per operation has become a critical consideration in digital circuits. This thesis focuses on the implementation of ASIC designs that can operate at sub-threshold voltages so that the minimum energy operating point can be reached.
Variation-aware modifications are made to the standard cell library based on the requirements of sub-threshold operation, such as delay slow-down and leakage current, and unwanted cells are removed from the library. The flip-flop is a critical and widely used element of digital designs. The standard transmission-gate flip-flop is replaced by a clocked-inverter flip-flop so that it can operate in the sub-threshold region. Using this modified library, the CORDIC algorithm in rotation mode is synthesized and compared, over a wide range of supply voltages from 270 mV to 1.2 V, with the same design synthesized using the normal library.




Chapter 1
Introduction
A recent explosion in applications that benefit from low-energy operation has carved out a significant niche for sub-threshold circuits. Digital circuits operating in sub-threshold use a supply voltage that is lower than the threshold voltage of the transistors. In this region of operation, circuits consume less energy for active operation and dissipate less leakage power than higher-voltage alternatives, but they operate more slowly. Until recently, the emphasis on maximizing operating frequency dominated digital design to the point that sub-threshold operation received very little attention. As the demand for energy-constrained applications grows, however, sub-threshold circuits are gaining more attention.
1.1 Energy Constrained Applications
Energy consumption is a key metric for a large and growing set of applications. These energy-constrained applications generally have low activity rates and low speed requirements, but the system is required to have a long battery lifetime, typically more than 5 years. Ideally, the power consumption of these systems will decrease to the point that they can harvest energy from their environment and have a theoretically unlimited lifetime.


harvesting, which is possible only when the average power consumption is sufficiently low. Sub-threshold circuit design offers a way to keep the power consumption low enough, by letting the circuit operate at the voltage level that minimizes the energy consumption.
The CORDIC algorithm is a common building block of DSP units. It evaluates trigonometric functions without any multiplication, using only shift and add operations. The CORDIC algorithm is also used in communication systems to generate the quadrature components of a signal. Since this work focuses on low-power DSP and MCU wireless communication for biomedical applications, a CORDIC design is implemented in rotation mode and its minimum energy point is demonstrated in 0.13 µm UMC technology.
1.4 Thesis Organization
This thesis is organized as follows:
• Chapter 2 gives a brief description of sub-threshold circuits and the challenges in designing them.
• Chapter 3 describes the problems that standard cells face when operating in the sub-threshold regime. It discusses the design of the clocked-inverter CMOS flip-flop and the modifications made to the standard cell library to make it reliable for sub-threshold circuits.
• Chapter 4 covers the standard cell library and the characterization process for standard cells.
• Chapter 5 presents the process flow, i.e. the basic steps taken in this work and the assumptions made. It also includes the simulation results and a comparison of the CORDIC designs (using the pruned and normal libraries).
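The shift-and-add behaviour of rotation-mode CORDIC mentioned in the overview above can be illustrated with a small software sketch. This is not the thesis's hardware design: the function name `cordic_rotate`, the 16 iterations and the 24 fractional bits are assumptions chosen only for illustration. The arctangent table and the gain constant are precomputed (in hardware they would be constants), so the iteration loop itself uses only shifts, additions and subtractions.

```python
import math


def cordic_rotate(angle, iterations=16, frac_bits=24):
    """Approximate (cos(angle), sin(angle)) with rotation-mode CORDIC.

    Hypothetical illustration: fixed-point integers with `frac_bits`
    fractional bits. The angle is in radians and should lie within the
    CORDIC convergence range (about +/-1.74 rad).
    """
    scale = 1 << frac_bits

    # Precomputed constants: arctan(2^-i) in fixed point.
    atan_table = [round(math.atan(2.0 ** -i) * scale) for i in range(iterations)]

    # CORDIC gain compensation: start x at K = prod(1 / sqrt(1 + 2^-2i)).
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x = round(gain * scale)   # real part of the rotating vector
    y = 0                     # imaginary part
    z = round(angle * scale)  # residual angle still to be rotated

    # Iteration loop: shifts and adds only, no multiplications.
    for i in range(iterations):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - atan_table[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + atan_table[i]

    return x / scale, y / scale


if __name__ == "__main__":
    c, s = cordic_rotate(0.5)
    print(c, s, math.cos(0.5), math.sin(0.5))
```

Each extra iteration adds roughly one bit of angular resolution, which is why the iteration count and the word length are usually chosen together.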


Chapter 2
Background
The weak inversion state at the surface of a MIS (Metal-Insulator-Silicon) structure was already implicitly mentioned as the "parabolic region" by Garrett and Brattain in their early paper on the MIS diode [1]. This situation is characterized by the fact that majority carriers have been repelled from the surface, leaving behind a depletion charge of fixed atoms. The minority carrier density is increased with respect to the distant bulk, but it is still negligible in the overall charge balance and therefore does not affect the CV (capacitance-voltage) curve of the MIS structure. However, these minority carriers are the only mobile charge available at the surface. Hence, applying a voltage between the source and the drain of a MOS transistor structure causes the minority carriers to move, and a current flows from drain to source.
Figure 2.1: Early measurement of the $I_D(V_{GS})$ characteristics of a P-channel metal-gate MOS transistor (cleaned-up plot from [6])


Since this current was very small (at the sub-microampere level), it was ignored for years, even for rather wide transistors. When measured at very low current levels, it showed the unusual exponential dependence of the drain current on the gate voltage depicted in figure 2.1. Weak inversion then came to the attention of the digital design community under the name "sub-threshold current".
2.1 Modeling for Sub-threshold operation
Parts of this section are taken from [4].
An NMOS transistor operating in sub-threshold (i.e. $V_{GS} < V_{TH}$, where $V_{TH}$ is the transistor threshold voltage) experiences the three current contributions shown in Figure 2.2(a)-(c): the sub-threshold current $I_{ST}$ (due to diffusion of minority carriers between drain and source [2]), the gate current $I_G$ (due to tunneling through the dielectric) and the junction leakage $I_J$ (due to BTBT current across the depletion regions) [3].
Figure 2.2: NMOS transistor current contribution in sub-threshold. (a) Sub-threshold current. (b) Gate current. (c) Junction leakage current.
Due to its much stronger dependence on the gate voltage, $I_G$ tends to be much lower than $I_{ST}$ at low voltages, and the same holds for $I_J$. Hence the NMOS current at ULV is dominated by the sub-threshold contribution $I_{ST}$ in figure 2.2(a), which is usually written in the following form [2]:
$$I \approx I_{ST} = I_0 \frac{W}{L}\, e^{(V_{GS}-V_{TH})/(n v_t)} \left(1 - e^{-V_{DS}/v_t}\right) \tag{2.1}$$
Taking the DIBL effect into account, it can be written as follows [7]:
$$I = \beta\, e^{V_{GS}/(n v_t)} \left[ e^{\lambda_{DS} V_{DS}/(n v_t)} \left(1 - e^{-V_{DS}/v_t}\right) \right] \tag{2.2}$$
$$\beta = I_0 \frac{W}{L}\, e^{-V_{TH0}/(n v_t)}$$


Here $I_0$ is the technology-dependent sub-threshold current extrapolated for $V_{GS} = V_{TH}$, $v_t = kT/q$ is the thermal voltage, $W/L$ is the aspect ratio and $n$ is the sub-threshold slope factor ($n = 1 + C_d/C_{OX}$) [2].
The model we develop uses fitting parameters that are normalized to the characteristic inverter in the technology of interest. Equation 2.3 gives the propagation delay of a characteristic inverter with output capacitance $C_g$ in sub-threshold (in equations 2.3 to 2.8, $V_{th}$ denotes the thermal voltage, written $v_t$ above).
$$t_d = \frac{K\, C_g\, V_{DD}}{I_{0,g}\, e^{(V_{GS}-V_{T,g})/(n V_{th})}} \tag{2.3}$$
where $K$ is a delay fitting parameter. The expression for current in the denominator of equation 2.3 models the ON current of the characteristic inverter, so it accounts for transitions through both NMOS and PMOS devices. Unless the PMOS and NMOS devices are perfectly symmetrical, the terms $I_{0,g}$ and $V_{T,g}$ are fitted parameters that do not correspond exactly to the device parameters of the same name [4]. The operating frequency can be stated simply as:
$$f = \frac{1}{t_d\, L_{DP}} \tag{2.4}$$
where $L_{DP}$ is the depth of the critical path in characteristic inverter delays. Dynamic energy ($E_{DYN}$), leakage energy ($E_{LEAK}$) and total energy ($E_T$) per cycle are expressed as equations 2.5-2.8 [4], assuming rail-to-rail swing ($V_{GS} = V_{DD}$).
$$E_{DYN} = C_{eff}\, V_{DD}^2 \tag{2.5}$$
$$E_{LEAK} = W_{eff}\, I_{0,g}\, e^{-V_{T,g}/(n V_{th})}\, t_d\, L_{DP}\, V_{DD} \tag{2.6}$$
$$\phantom{E_{LEAK}} = W_{eff}\, K\, C_g\, L_{DP}\, V_{DD}^2\, e^{-V_{DD}/(n V_{th})} \tag{2.7}$$
$$E_T = E_{DYN} + E_{LEAK} = V_{DD}^2 \left( C_{eff} + W_{eff}\, K\, C_g\, L_{DP}\, e^{-V_{DD}/(n V_{th})} \right) \tag{2.8}$$
Equations 2.5-2.8 extend the expressions for the current and delay of an inverter to arbitrary larger circuits. This extension sacrifices accuracy for simplicity. Thus $C_{eff}$ is the average total switched capacitance of the entire circuit, including the average activity.
2.2 Challenges in Sub-threshold operation
Although sub-threshold circuit design opens up many opportunities, it also faces challenges. These challenges must be addressed to design a good ultra-low-power circuit that meets the user requirements.
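Returning briefly to the energy model of section 2.1, the minimum energy point implied by equation 2.8 can be located numerically. The sketch below is an illustration only: the function names are hypothetical and every parameter value is made up (normalized units, with $n = 1.5$ and a thermal voltage of 26 mV assumed); none of them are fitted values from this work. It evaluates $E_T$ on a 1 mV grid and picks the supply voltage with the lowest energy.

```python
import math


def total_energy(vdd, c_eff, w_eff, k, c_g, l_dp, n, v_th):
    """Total energy per cycle from equation 2.8 (arbitrary units)."""
    leak_cap = w_eff * k * c_g * l_dp * math.exp(-vdd / (n * v_th))
    return vdd ** 2 * (c_eff + leak_cap)


def minimum_energy_point(params, v_lo=0.15, v_hi=1.2, step=0.001):
    """Grid search for the supply voltage that minimizes total_energy."""
    count = int(round((v_hi - v_lo) / step)) + 1
    grid = [v_lo + i * step for i in range(count)]
    best_v = min(grid, key=lambda v: total_energy(v, **params))
    return best_v, total_energy(best_v, **params)


# Hypothetical, normalized parameters chosen only to show the shape of
# the curve; they are assumptions, not values from the thesis.
EXAMPLE_PARAMS = dict(c_eff=1.0, w_eff=20.0, k=1.0, c_g=1.0,
                      l_dp=50.0, n=1.5, v_th=0.026)

if __name__ == "__main__":
    v_opt, e_opt = minimum_energy_point(EXAMPLE_PARAMS)
    print(f"minimum energy point: VDD = {v_opt:.3f} V, E = {e_opt:.4f}")
```

With these illustrative numbers the minimum falls at roughly 0.31 V. Below that point the exponentially growing delay makes the leakage term dominate, and above it the quadratic dynamic term dominates.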


From equation 2.2, the MOS sub-threshold ON current can be written as:
$$I_{ON} \approx \beta\, e^{V_{DD}/(n v_t)} \tag{2.9}$$
where $V_{DD} \gg v_t$ is assumed. This shows that, in the sub-threshold regime, a reduction in $V_{DD}$ causes an exponential degradation in the delay $\tau_D$, as seen from its classical CV/I expression in equation 2.10.
$$\tau_D = \frac{C}{2} \cdot \frac{V_{DD}}{I_{ON}} = \frac{C}{2\beta} \cdot \frac{V_{DD}}{e^{V_{DD}/(n v_t)}} \tag{2.10}$$
Figure 2.3 depicts this for the delay of an FO4 inverter. The FO4 trend in sub-threshold is approximately exponential, as expected from equation 2.10.
Figure 2.3: Normalized FO4 delay vs. $V_{DD}$ [7]
Another problem in sub-threshold operation is the leakage current. From equation 2.2, the OFF current of a MOS device can be written as follows.
$$I_{OFF} = \beta\, e^{\lambda_{DS} V_{DD}/(n v_t)} \left(1 - e^{-V_{DD}/v_t}\right) \approx \beta \tag{2.11}$$
Hence, from equations 2.9 and 2.11, we can write:
$$\frac{I_{ON}}{I_{OFF}} = e^{V_{DD}/(n v_t)} \tag{2.12}$$
This ratio depends exponentially on the supply voltage and falls as the supply voltage is reduced. In sub-threshold operation the OFF current of a transistor therefore becomes comparable to the ON current, so it has a stronger impact on power than in super-threshold circuits.
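To get a feel for equations 2.10 and 2.12, the following sketch evaluates the ON/OFF current ratio and the relative delay at a few supply voltages. The function names are hypothetical, and the slope factor $n = 1.5$ and thermal voltage $v_t = 26$ mV are assumed values, not parameters extracted for the technology used in this work. The expressions apply only in the sub-threshold regime.

```python
import math


def ion_ioff_ratio(vdd, n=1.5, v_t=0.026):
    """I_ON / I_OFF from equation 2.12 (n and v_t are assumed values)."""
    return math.exp(vdd / (n * v_t))


def relative_delay(vdd, vdd_ref, n=1.5, v_t=0.026):
    """tau_D(vdd) / tau_D(vdd_ref) from equation 2.10.

    The load capacitance C and the factor beta cancel in the ratio, so
    only the supply voltages and n * v_t are needed.
    """
    def tau(v):
        return v / math.exp(v / (n * v_t))

    return tau(vdd) / tau(vdd_ref)


if __name__ == "__main__":
    for vdd in (0.20, 0.25, 0.30):
        ratio = ion_ioff_ratio(vdd)
        slowdown = relative_delay(vdd, vdd_ref=0.30)
        print(f"VDD = {vdd:.2f} V  Ion/Ioff = {ratio:9.1f}  "
              f"delay vs 0.30 V = {slowdown:5.2f}x")
```

Under these assumptions, lowering the supply by a few tens of millivolts changes both the current ratio and the delay by a large factor, which is the sensitivity the section describes.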


Figure 2.4: $I_{ON}$ to $I_{OFF}$ ratio of an Inverter
In Figure 2.4, the $I_{ON}$ to $I_{OFF}$ ratio of an FO4 inverter is plotted against the supply voltage. It rolls off exponentially in the sub-threshold regime, as expected from equation 2.12.
Another factor affecting sub-threshold operation is device stacking. In super-threshold operation, stacking helps to reduce power consumption. When transistors are connected in series, their overall strength is lower than that of a single transistor by a well-known stacking factor. The stacking factors in 0.13 µm technology for 2 and 3 stacked devices are plotted in figure 2.5.
Figure 2.5: Stacking Factor for $I_{ON}$ and $I_{OFF}$ for 2 and 3 stacked NMOS transistors


Because of the stacking factor, the current flowing through stacked transistors is reduced. In sub-threshold operation, this reduction in current increases the delay, as follows from equation 2.10. This worsens performance and slows down the circuit. Figure 2.6 plots the delay of stacked devices.
Figure 2.6: Delay slowdown of Stacked devices
Figure 2.6 shows that, at 270 mV, 4 stacked NMOS devices increase the delay by 114% compared with 2 stacked NMOS devices in 0.13 µm technology. This is a dramatic degradation in performance. Similarly, 4 stacked devices increase the delay by 50% compared with 3 stacked devices in 0.13 µm technology. At smaller technology nodes these results get worse, because the delay depends exponentially on the technology (through the slope factor $n$), as equation 2.10 shows.


Chapter 3
Modifications in Standard Cell Library
This chapter covers the design of the clocked-inverter CMOS flip-flop and the modifications made to the standard cell library.
3.1 Transmission gates
The strength of a signal is measured by how closely it approximates an ideal voltage source. In a design, the supply rails (VDD and ground, corresponding to logic 1 and 0 respectively) are taken as the reference, and signal strength is defined with respect to them. The closer a signal is to the corresponding rail, the stronger it is.
Figure 3.1: Pass Transistor Strong and Degraded outputs
As shown in figure 3.1, an NMOS transistor passes a 0 almost perfectly but degrades the output by its threshold voltage when passing a 1. Similarly, a PMOS transistor degrades the output by its threshold voltage when passing a 0 and transmits a 1 unchanged. We therefore construct a transmission gate by connecting an NMOS transistor in parallel with a PMOS transistor and driving their gates with opposite clock levels, as shown in figure 3.2.
Figure 3.2: Transmission Gate
As discussed in section 2.2, the $I_{ON}$ to $I_{OFF}$ ratio of a transistor degrades in the sub-threshold regime and the OFF current becomes comparable to the ON current. A transmission gate may therefore fail to block its input and can pass a wrong value, as shown in Figure 3.3.
Figure 3.3: Leakage in transmission Gate
3.2 Standard transmission gate CMOS Flip-flop
A standard transmission gate CMOS flip-flop consists of two level-sensitive latches built from transmission gates, as shown in figure 3.4.
When the clock is low, the output of the first (master) latch follows the D input while the second (slave) latch holds the previous value. When the clock rises from 0 to 1, the master latch becomes opaque and holds the D value present at the clock transition.


Figure 3.4: Standard transmission gate CMOS flip-flop
Since the standard flip-flop relies on transmission gates, and transmission gates do not operate well in the sub-threshold regime, these flip-flops cannot be used in that voltage range. There is a risk of data write-back in such flip-flops when they operate at sub-threshold voltages, as shown in figure 3.5.
Figure 3.5: Data Write-Back in Standard Transmission Gate Flip-flop.
In Figure 3.5, the flip-flop is in state S1, i.e. node A is at voltage level 1 and this state is held by the feedback loop in the slave latch. Similarly, node B is at voltage level 0 in state S1. However, node B can be corrupted by leakage through the transmission gate and may cross the $V_{IH}$ (minimum input high voltage) of the feedback inverter in the master latch. This inverts the master latch state, and wrong data may be transmitted at the next positive clock edge. This is called "data write-back".


3.3 Clocked Inverter CMOS Flip-flop
A clocked-CMOS-style flip-flop implementation replaces the master and slave transmission gates of the conventional circuit topology with clocked inverters, thereby eliminating the risk of data write-back [8]. The designed clocked-inverter CMOS flip-flop is shown in figure 3.6.
Figure 3.6: Schematic Design of Clocked Inverter Flip-flop
As shown in Figure 3.6, no inverter is placed in the signal path, in order to reduce the delay. Interruptible keepers are used to avoid write contention [8]. These keepers are up-sized to improve state retention. The clocked inverter CI2 has to be large in order to reduce the clock-to-Q delay.
3.4 Modification in Standard Cell Library
The standard cell library contains all the cells in different drive strengths. As discussed in section 2.2, stacking worsens performance in sub-threshold operation, so cells with deeper device stacks should not be used. We therefore pruned the library and removed the unwanted cells. The criterion for deciding which cells were unwanted was based on the number of stacked devices and on the use of transmission gates. We removed the cells with 4 or more stacked devices from the library. We also removed the MUX cells because they use transmission gates.
In place of the standard transmission gate flip-flop, we characterized the clocked-inverter flip-flop and used it instead. This left only 19 cells in the library, listed below.
1. AN2 : 2 input AND gate
2. An3 : 3 input AND gate
3. BUF : Buffer
4. BUFCk : Clock Tree buffer
5. INV : Inverter
6. INVCK : Clock Tree Inverter
7. DFF_new : Clocked-inverter flip-flop
8. DFFRF : Flip-flop with Read enable signal
9. DLAH : D latch
10. OR2 : 2 input OR gate
11. OR3 : 3 input OR gate
12. ND2 : 2 input NAND gate
13. ND3 : 3 input NAND gate
14. NR2 : 2 input NOR gate
15. NR3 : 3 input NOR gate
16. Tie0 : Tie to 0
17. Tie1 : Tie to 1
18. XOR2 : 2 input XOR gate
19. XNR2 : 2 input XNOR gate
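The pruning criterion described in this section (drop cells with four or more stacked devices, and drop cells that use transmission gates) can be sketched as a simple filter. Everything below is a hypothetical illustration: the `Cell` record, the `prune_library` function and the sample entries (including the names `ND4` and `MUX2`) are assumptions, not data taken from the actual library files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str                      # cell name as it would appear in the library
    max_stack: int                 # deepest series stack of transistors
    uses_transmission_gate: bool   # True if the cell contains a transmission gate


def prune_library(cells, max_allowed_stack=3):
    """Keep cells with at most `max_allowed_stack` stacked devices
    and no transmission gates."""
    return [
        cell for cell in cells
        if cell.max_stack <= max_allowed_stack
        and not cell.uses_transmission_gate
    ]


# Illustrative entries only; names and attributes are assumptions.
SAMPLE_CELLS = [
    Cell("ND2", max_stack=2, uses_transmission_gate=False),
    Cell("ND3", max_stack=3, uses_transmission_gate=False),
    Cell("ND4", max_stack=4, uses_transmission_gate=False),
    Cell("MUX2", max_stack=2, uses_transmission_gate=True),
]

if __name__ == "__main__":
    kept = prune_library(SAMPLE_CELLS)
    print([cell.name for cell in kept])
```

In practice the stack depth and transmission-gate usage would have to be read from the cell schematics or netlists; the sketch only shows the selection rule.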




Chapter 4
Flip-Flop Characterization
In theory, any logic system can be built using universal cells (NAND or NOR) or using AND, OR and NOT gates, but as the complexity of circuit design grows it becomes impractical to design circuits by hand. The use of automatic synthesis tools has therefore become mandatory. Synthesis and PNR tools require a cell library, so the first step of the design is to develop such a library or to acquire one. A layout library must possess two main properties.
The first requirement is functional completeness. Synopsys' Design Analyzer synthesis tool requires the library to contain, at minimum, six different types of cells, namely:
• One type of tristate cell.
• Either NOR and NAND gates or AND and OR gates.
• Inverter.
• D flip-flop with asynchronous set and reset.
• D latch with asynchronous set and reset.
The second requirement pertains to the shapes and sizes of cells, which must be very regular. This also applies to the geometries inside the cells, particularly those on metal layers. These requirements are intended to ensure that PNR tools can lay down routing tracks without being obstructed by metal geometries.


4.1 Requirements on the library
The development of a standard cell library for use with the Silicon Ensemble (SE) routing tools involves the following steps:
1. Layout of cells.
2. Creation of Synopsys synthesis and simulation libraries.
3. Generation of the LEF (Library Exchange Format) description of cells.
The LEF file is a simplified ASCII file that contains only metal layers and other layers that can obstruct routing. Since the shapes of the n-well/p-well and diffusion regions do not significantly influence the metal tracks electrically, these layers are omitted from LEF files. The file is used by the SE tools during placement and routing.
4.2 Layout Technique
4.2.1 Layout Rules
For a cell layout to be usable in standard-cell-based design, the cell must satisfy the following requirements.
• The general shape of the cell is as shown in figure 4.1. Here, the term "pins" refers to any shape in the particular layer being used for routing.
• The sizes, shapes and locations of all geometries in layers pertinent to routing are regular. For example, if a metal1 signal track inside the cell is 1 µm wide, all other metal1 tracks inside the cell must be of the same width.
• All power/ground pins should have the same width and run in the same direction, i.e. all horizontal or all vertical. They should take the form of rails at the top and bottom edges of the cell, as shown in figure 4.1.
• The routing pitch should be at least the line-to-via pitch, as shown in figure 4.2, where the closest separation satisfies the design rule for metal-to-metal spacing. Ideally it should be at least the via-to-via pitch, which allows the routing tool to place a via wherever necessary.
• All the routing layers should be defined during LEF generation so that the routing tool can decide which metal layer is actually to be used.
• The number of metal layers used for internal connections within the cell should be limited. Try to use metal1 only, so that all tracks on higher metal layers remain freely available to the routing tool.
Figure 4.1: General Shape of Standard cell core cell
Figure 4.2: Definition of routing Pitch
The rules above are necessitated by the way the SE tool performs routing. The tool routes by laying down horizontal, vertical and Manhattan-style tracks. For each metal layer, the direction can be horizontal or vertical, but one direction is always taken as preferred and the other as non-preferred. For example, if the horizontal direction is preferred for metal1, the tool tries to create horizontal tracks first before resorting to vertical tracks, although vertical tracks will eventually be used if deemed necessary.
4.2.2 Clocked Inverter Flip-flop Layout
The layout of the designed clocked-inverter flip-flop is made using the 0.13 µm UMC library and is shown in figure 4.3. Since the other cells in the library are 21 pitches high (21 times the pitch of metal 1), the layout is made with the same height.
Figure 4.3: Layout of Clocked Inverter CMOS D Flip-Flop
During place and route, other cells will sit next to the designed flip-flop, so the layout is done with the DRC (Design Rule Check) rules that apply after place and route in mind. For example, the N-well covering the PMOS transistors should not violate any DRC rule after place and route. The final area of the flip-flop is 9.95 µm x 3.2 µm, compared with an area of 8 µm x 3.2 µm for the standard transmission gate flip-flop.
4.3 Generation of LEF
LEF is an abstract of a cell that contains only metal layers and other layers that can obstruct routing. The steps to extract a LEF file from a layout using the Abstract Generator tool
provided by Cadence are given in Appendix B. Since the metal nomenclature in the 0.13 µm Faraday standard cell library differs from that in the 0.13 µm UMC library, we made some changes to the extracted LEF file to make it compatible with the 0.13 µm Faraday standard cell library.
4.3.1 Abstract view of Clocked Inverter Flip-flop
The extracted abstract view of the clocked inverter flip-flop is shown in figure 4.4.
Figure 4.4: Layout of Clocked Inverter CMOS D Flip-Flop
The final LEF file of the clocked inverter flip-flop is given below.
NAMESCASESENSITIVE ON ;
MACRO DFF_NEW
  CLASS CORE ;
  FOREIGN DFF_NEW -0.12 -0.28 ;
  ORIGIN 0.12 0.28 ;
  SIZE 10.18 BY 3.76 ;
  SYMMETRY X Y ;
  SITE core ;
  PIN D
    DIRECTION INPUT ;
    USE ANALOG ;
    PORT
      LAYER ME1 ;
      RECT 1.73 1.35 2.00 1.64 ;
      RECT 1.73 1.30 1.93 1.82 ;
    END
  END D
  PIN ck
    DIRECTION INPUT ;
    USE CLOCK ;
    PORT
      LAYER ME1 ;
      RECT 0.10 1.46 0.38 1.74 ;
      RECT 0.10 1.32 0.30 1.84 ;
    END
  END ck
  PIN Q
    DIRECTION OUTPUT ;
    PORT
      LAYER ME1 ;
      RECT 9.56 1.84 9.85 2.65 ;
      RECT 9.64 0.67 9.85 2.65 ;
      RECT 9.56 0.67 9.85 1.02 ;
    END
  END Q
  PIN GND!
    DIRECTION INPUT ;
    USE GROUND ;
    SHAPE ABUTMENT ;
    PORT
      LAYER ME1 ;
      RECT 0.00 -0.28 9.95 0.28 ;
      RECT 9.04 0.67 9.32 1.02 ;
      RECT 9.12 -0.28 9.28 1.02 ;
      RECT 7.91 0.52 8.19 0.80 ;
      RECT 7.95 -0.28 8.14 0.80 ;
      RECT 5.59 -0.28 5.88 0.40 ;
      RECT 4.51 -0.28 4.74 0.80 ;
      RECT 2.22 -0.28 2.50 0.40 ;
      RECT 0.62 0.65 0.90 0.81 ;
      RECT 0.69 -0.28 0.84 0.81 ;
    END
  END GND!
  PIN VCC!
    DIRECTION INPUT ;
    USE POWER ;
    SHAPE ABUTMENT ;
    PORT
      LAYER ME1 ;
      RECT 0.00 2.92 9.95 3.48 ;
      RECT 9.04 1.84 9.32 2.65 ;
      RECT 9.01 2.80 9.29 3.48 ;
      RECT 9.04 1.84 9.29 3.48 ;
      RECT 7.95 2.80 8.22 3.48 ;
      RECT 5.59 2.80 5.88 3.48 ;
      RECT 4.47 2.80 4.75 3.48 ;
      RECT 2.21 2.80 2.49 3.48 ;
      RECT 0.58 2.80 0.86 3.48 ;
    END
  END VCC!
  OBS
    LAYER ME1 ;
    RECT 0.16 0.65 0.32 1.12 ;
          RECT 0.16 0.96 0.98 1.12 ;
          RECT 0.82 1.40 1.02 1.68 ;
          RECT 0.82 0.96 0.98 2.29 ;
          RECT 0.10 2.13 0.98 2.29 ;
          RECT 0.10 2.04 0.38 2.44 ;
          RECT 1.62 0.88 1.90 1.14 ;
          RECT 1.62 0.98 2.60 1.14 ;
          RECT 2.44 1.32 2.66 1.60 ;
          RECT 1.65 2.02 1.93 2.31 ;
          RECT 2.44 0.98 2.60 2.31 ;
          RECT 1.65 2.15 2.60 2.31 ;
          RECT 1.52 0.44 2.06 0.60 ;
          RECT 1.90 0.56 2.98 0.72 ;
          RECT 2.83 0.56 2.98 1.26 ;
          RECT 2.83 1.10 3.42 1.26 ;
          RECT 3.10 1.10 3.42 1.42 ;
          RECT 3.34 0.56 4.33 0.72 ;
          RECT 3.34 0.56 3.54 0.84 ;
          RECT 4.17 0.56 4.33 1.55 ;
          RECT 4.17 1.39 4.89 1.55 ;
          RECT 4.67 1.39 4.89 1.67 ;
          RECT 4.67 1.39 4.83 2.24 ;
          RECT 3.35 2.08 4.83 2.24 ;
          RECT 3.35 2.02 3.63 2.30 ;
          RECT 5.08 0.47 5.43 0.62 ;
          RECT 5.05 0.80 5.33 1.08 ;
          RECT 5.08 0.47 5.24 2.27 ;
          RECT 5.00 1.89 5.28 2.27 ;
          RECT 3.73 0.88 4.01 1.04 ;
          RECT 5.76 1.12 6.79 1.28 ;
          RECT 3.75 0.88 3.92 1.85 ;
          RECT 2.84 1.69 3.92 1.85 ;
          RECT 1.20 0.65 1.36 2.64 ;
          RECT 1.20 2.04 1.42 2.64 ;
          RECT 2.84 1.69 3.00 2.64 ;
          RECT 5.76 1.12 5.92 2.64 ;
          RECT 1.20 2.48 5.92 2.64 ;
          RECT 7.08 0.88 7.36 1.04 ;
          RECT 7.10 0.88 7.26 1.85 ;
          RECT 6.51 1.69 7.26 1.85 ;
          RECT 6.51 1.59 6.83 1.91 ;
          RECT 6.70 0.56 7.68 0.72 ;
          RECT 6.75 0.56 6.92 0.84 ;
          RECT 7.53 0.56 7.68 1.38 ;
          RECT 7.53 1.22 8.27 1.38 ;
          RECT 8.11 1.33 8.35 1.61 ;
          RECT 8.11 1.22 8.27 2.36 ;
          RECT 6.70 2.20 8.27 2.36 ;
          RECT 8.54 0.47 8.88 0.62 ;
          RECT 8.51 0.80 8.79 1.08 ;
          RECT 8.54 0.47 8.71 2.46 ;
          RECT 8.46 1.89 8.74 2.46 ;
        LAYER VI1 ;


          RECT 3.16 1.16 3.36 1.36 ;
          RECT 6.58 1.65 6.78 1.85 ;
        LAYER ME2 ;
          RECT 3.10 1.16 5.75 1.36 ;
          RECT 3.10 1.10 3.42 1.42 ;
          RECT 5.54 1.16 5.75 1.85 ;
          RECT 5.54 1.66 6.83 1.85 ;
          RECT 6.51 1.59 6.83 1.91 ;
      END
    END DFF_NEW

    END LIBRARY

4.4 Behavioral Model

4.4.1 Timing model

In general, the timing model can be expressed by the following simple relation:

$$\text{Total cell delay} = \text{Intrinsic delay} + \text{Transition delay} + \text{Slope delay} \qquad (4.1)$$

The intrinsic delay of a cell is the propagation delay of the cell when it drives no load and is itself driven by another identical, unloaded cell.

The transition delay of a cell is the delay added to the intrinsic delay when the cell drives a capacitive load while still being driven by another identical, unloaded cell.

The slope delay of a cell is the extra delay (in addition to the intrinsic and, possibly, transition delay) that appears when the cell is driven by an identical cell that itself has a transition delay.

In practice, we are concerned with the total delay of a cell. This is the delay exhibited by a cell that drives a capacitive load and is driven by an identical cell with transition delay, and it is called the propagation delay. The delay has to be defined with respect to measurement points on the switching waveform. In a standard cell library, these points are defined by the following four variables:

    /* Threshold point of an input falling edge: */
    input_threshold_pct_fall : 50.0;
    /* Threshold point of an input rising edge: */
    input_threshold_pct_rise : 50.0;
    /* Threshold point of an output falling edge: */
    output_threshold_pct_fall : 50.0;


    /* Threshold point of an output rising edge: */
    output_threshold_pct_rise : 50.0;

A 50% threshold is used in most standard cell libraries. The propagation delay is specified as two values:

1. Output fall delay (T_f): for example, the output of an inverter falls when its input rises.
2. Output rise delay (T_r): for example, the output of an inverter rises when its input falls.

In practice, these two values differ and are defined separately in the library. Figure 4.5 shows the definitions of these variables.

Figure 4.5: Timing definitions in standard cell library

The slew rate of the waveform also plays a very important role in timing models. In a real circuit, a cell is always driven by another cell whose output has a finite slew (transition delay). To calculate the propagation delay exactly, we must therefore know the output transition delay of the preceding cell. In a standard cell library, the slew thresholds are defined by the following four variables, as shown in Figure 4.5:


    /* Falling edge threshold: */
    slew_lower_threshold_pct_fall : 10.0;
    slew_upper_threshold_pct_fall : 90.0;
    /* Rising edge threshold: */
    slew_lower_threshold_pct_rise : 10.0;
    slew_upper_threshold_pct_rise : 90.0;

Each combinational cell has a timing arc from each of its inputs to the output. For sequential cells such as flip-flops, the timing arcs are defined from the clock pin to the Q pin. Each timing arc has a timing sense, which describes how the output changes for the different types of input transition.

Figure 4.6: Timing Sense of arcs

A positive unate timing arc produces an output transition in the same direction as the input transition. A negative unate timing arc produces an output transition in the opposite direction to the input transition. In a non-unate timing arc, the output transition cannot be determined from the direction of the input transition alone; it also depends on the state of the other inputs. An example of timing arcs is given in Figure 4.6.

The delay of a cell is defined in terms of the output load capacitance and the input transition. In the non-linear delay model, which most standard cell libraries use, delay values are tabulated for different values of input transition and output load capacitance.
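The timing-sense definitions above can be checked mechanically for any combinational function. The following is a minimal Python sketch, not part of the characterization flow used in this work; the helper name `timing_sense` and the way a cell is passed in as a Python function are illustrative assumptions. It toggles one input while holding the others fixed and records in which direction the output moves.

```python
from itertools import product


def timing_sense(func, n_inputs, pin):
    """Classify the arc from input `pin` to the output of `func`.

    `func` takes a tuple of 0/1 input values and returns 0 or 1.
    Hypothetical helper, for illustration only.
    """
    rises = falls = False
    for others in product((0, 1), repeat=n_inputs - 1):
        # Build the two input vectors that differ only at position `pin`.
        low = others[:pin] + (0,) + others[pin:]
        high = others[:pin] + (1,) + others[pin:]
        out_low, out_high = func(low), func(high)
        if out_high > out_low:
            rises = True   # a rising input produces a rising output
        elif out_high < out_low:
            falls = True   # a rising input produces a falling output
    if rises and falls:
        return "non_unate"
    if rises:
        return "positive_unate"
    if falls:
        return "negative_unate"
    return "no_arc"  # the output never depends on this pin


def and2(v):
    return v[0] & v[1]


def nand2(v):
    return 1 - (v[0] & v[1])


def xor2(v):
    return v[0] ^ v[1]
```

Under these assumptions, an AND gate input is classified as positive unate, a NAND gate input as negative unate, and an XOR gate input as non-unate, since its effect on the output depends on the other input.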


The table variables take discrete values. If the requested operating point does not coincide with a table entry, two-dimensional interpolation is used to obtain the timing value. For example, let the two neighbouring index_1 values (total output capacitance) be denoted x_1 and x_2, the two neighbouring index_2 values (input transition) be denoted y_1 and y_2, and the corresponding delay values be denoted T_11, T_12, T_21 and T_22. If the delay is required at (x_0, y_0), the lookup value T_00 is obtained by interpolation as:

$$T_{00} = x_{20}\,y_{20}\,T_{11} + x_{20}\,y_{01}\,T_{12} + x_{01}\,y_{20}\,T_{21} + x_{01}\,y_{01}\,T_{22} \qquad (4.2)$$

where

$$x_{01} = \frac{x_0 - x_1}{x_2 - x_1}, \qquad x_{20} = \frac{x_2 - x_0}{x_2 - x_1}, \qquad y_{01} = \frac{y_0 - y_1}{y_2 - y_1}, \qquad y_{20} = \frac{y_2 - y_0}{y_2 - y_1}$$

For sequential circuits, the same timing models are used together with some additional constraint models, which define the setup and hold times. For a positive-edge-triggered flip-flop, the setup time is defined as the data arrival time before the positive clock edge at which the clock-to-Q delay is degraded by 10% relative to the clock-to-Q delay obtained when the data arrives very early before the positive clock edge, as shown in Figure 4.7.
Similarly, for a positive-edge-triggered flip-flop, the hold time is defined as the time after the positive clock edge at which a data change degrades the clock-to-Q delay by 10% relative to the clock-to-Q delay obtained when the data changes long after the positive clock edge.

4.4.2 Power model

Power dissipation in CMOS circuits has two components:

• Dynamic Power: caused by the charging and discharging of the load capacitance and by the short-circuit current that flows while the NMOS and PMOS transistors are both ON.
• Leakage Power: caused by sub-threshold leakage current, gate leakage and junction leakage.
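Returning to the table lookup of Section 4.4.1, the interpolation of equation (4.2) can be sketched in a few lines of Python. This is only an illustration of the formula under stated assumptions: the function name `interpolate_delay` and the grid values are placeholders chosen for readability, not data from the characterized library.

```python
def interpolate_delay(x0, y0, x1, x2, y1, y2, t11, t12, t21, t22):
    """Bilinear interpolation following equation (4.2).

    (x1, x2) and (y1, y2) are the neighbouring index_1 and index_2 grid
    values; t11..t22 are the table entries at the four grid corners.
    """
    x01 = (x0 - x1) / (x2 - x1)
    x20 = (x2 - x0) / (x2 - x1)
    y01 = (y0 - y1) / (y2 - y1)
    y20 = (y2 - y0) / (y2 - y1)
    return (x20 * y20 * t11 + x20 * y01 * t12
            + x01 * y20 * t21 + x01 * y01 * t22)


# Placeholder grid (arbitrary illustrative numbers, not library data).
X1, X2 = 0.1, 0.2      # two neighbouring index_1 values
Y1, Y2 = 0.01, 0.03    # two neighbouring index_2 values
T11, T12, T21, T22 = 1.0, 2.0, 3.0, 4.0

# At the centre of the grid cell all four weights are equal.
mid = interpolate_delay(0.15, 0.02, X1, X2, Y1, Y2, T11, T12, T21, T22)
```

At a grid corner the weights reduce to a single 1 and three 0s, so the function returns the table entry itself; at the centre of the cell it returns the mean of the four entries.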


Figure 4.7: Calculation of Setup Time

4.4.2.1 Leakage Power

For a fixed supply voltage, gate leakage is approximately constant and is much smaller than sub-threshold leakage. Junction leakage is also much smaller than sub-threshold leakage. Sub-threshold leakage depends on the input combination applied to the cell. For example, for a 2-input NAND gate, leakage is lowest when both inputs are at logic level 0 and highest when both inputs are at logic level 1. For an exact leakage power calculation, the cell should therefore be characterized for all input combinations.

4.4.2.2 Dynamic Power

Dynamic power arises from the switching of the load. It depends on the output load capacitance and the input transition time. In a standard cell library, dynamic power is defined as a lookup table over different values of input transition and output load capacitance. The same interpolation method is used to calculate the power at points between the table entries.

The characterization of the clocked inverter flip-flop produced the following data.

    cell (DFF_new) {
      area : 31.84 ;
      cell_footprint : "QDFF" ;
      ff (IQ, IQN) {
        next_state : "D" ;
        clocked_on : "CK" ;
      }


      cell_leakage_power : 17295.68 ;
      leakage_power () {
        when : "!D * !CK" ;
        value : 16683.81 ;
      }
      leakage_power () {
        when : "!D * CK" ;
        value : 18511.56 ;
      }
      leakage_power () {
        when : "D * !CK" ;
        value : 17364.33 ;
      }
      leakage_power () {
        when : "D * CK" ;
        value : 16623.04 ;
      }
      pin (Q) {
        function : "IQ" ;
        direction : output ;
        max_capacitance : 0.125593 ;
        internal_power () {
          related_pin : "CK" ;
          power (POWER_7x7) {
            index_1 ("0.036298, 0.103530, 0.186122, 0.352399, 0.685856, 1.297646, 2.688435") ;
            index_2 ("0.001500, 0.003306, 0.007287, 0.016062, 0.035404, 0.078035, 0.172000") ;
            values ("0.01488, 0.015235, 0.015506, 0.015630, 0.015698, 0.015723, 0.015691", \
                    "0.014870, 0.015204, 0.015485, 0.015610, 0.015678, 0.015705, 0.015674", \
                    "0.014911, 0.015240, 0.015519, 0.015652, 0.015717, 0.015744, 0.015713", \
                    "0.015085, 0.015418, 0.015709, 0.015833, 0.015899, 0.015925, 0.015893", \
                    "0.016084, 0.016088, 0.016301, 0.016392, 0.016455, 0.016477, 0.016446", \
                    "0.017653, 0.017648, 0.017667, 0.017703, 0.017741, 0.017758, 0.017725", \
                    "0.021001, 0.020995, 0.021012, 0.021056, 0.021072, 0.021077, 0.021039") ;
          }
        }
        timing () {
          related_pin : "CK" ;
          timing_type : rising_edge ;
          timing_sense : non_unate ;
          cell_rise (DELAY_7x7) {
            index_1 ("0.036298, 0.103530, 0.186122, 0.352399, 0.685855, 1.297651, 2.688436") ;
            index_2 ("0.001500, 0.003306, 0.007287, 0.016062, 0.035404, 0.078035, 0.172000") ;
            values ("0.216236, 0.230066, 0.259131, 0.322292, 0.460511, 0.764105, 1.432796", \
                    "0.232952, 0.246782, 0.275847, 0.339007, 0.477225, 0.780821, 1.449510", \
                    "0.247392, 0.261222, 0.290288, 0.353447, 0.491666, 0.795259, 1.463952", \
                    "0.267190, 0.281020, 0.310084, 0.373245, 0.511465, 0.815052, 1.483747", \
                    "0.291802, 0.305631, 0.334695, 0.397851, 0.536070, 0.839674, 1.508360", \
                    "0.316914, 0.330743, 0.359804, 0.422956, 0.561181, 0.864780, 1.533475", \
                    "0.337977, 0.351818, 0.380870, 0.444012, 0.582235, 0.885838, 1.554546") ;
          }
          rise_transition (DELAY_7x7) {
            index_1 ("0.036298, 0.103530, 0.186122, 0.352399, 0.685855, 1.297651, 2.688436") ;
            index_2 ("0.001500, 0.003306, 0.007287, 0.016062, 0.035404, 0.078035, 0.172000") ;


            values ("0.061811, 0.088694, 0.150309, 0.289772, 0.601013, 1.288647, 2.804979", \
                    "0.061806, 0.088690, 0.150323, 0.289816, 0.601017, 1.288630, 2.804979", \
                    "0.061812, 0.088685, 0.150330, 0.289865, 0.601002, 1.288628, 2.804973", \
                    "0.061812, 0.088691, 0.150303, 0.289869, 0.600971, 1.288637, 2.804978", \
                    "0.061803, 0.088680, 0.150328, 0.289853, 0.601061, 1.288639, 2.804979", \
                    "0.061838, 0.088715, 0.150342, 0.289913, 0.601012, 1.288638, 2.804979", \
                    "0.062113, 0.088933, 0.150452, 0.289924, 0.600992, 1.288644, 2.804983") ;
          }
          cell_fall (DELAY_7x7) {
            index_1 ("0.036298, 0.103530, 0.186122, 0.352399, 0.685857, 1.297640, 2.688435") ;
            index_2 ("0.001500, 0.003306, 0.007287, 0.016062, 0.035404, 0.078035, 0.172000") ;
            values ("0.225230, 0.234691, 0.251983, 0.284236, 0.347786, 0.482784, 0.779531", \
                    "0.242203, 0.251665, 0.268956, 0.301209, 0.364759, 0.499755, 0.796501", \
                    "0.256939, 0.266399, 0.283692, 0.315944, 0.379494, 0.514491, 0.811237", \
                    "0.277203, 0.286663, 0.303955, 0.336207, 0.399757, 0.534756, 0.831501", \
                    "0.302386, 0.311847, 0.329138, 0.361390, 0.424940, 0.559938, 0.856686", \
                    "0.327678, 0.337138, 0.354429, 0.386681, 0.450230, 0.585228, 0.881976", \
                    "0.346894, 0.356353, 0.373643, 0.405893, 0.469442, 0.604440, 0.901185") ;
          }
          fall_transition (DELAY_7x7) {
            index_1 ("0.036298, 0.103530, 0.186122, 0.352399, 0.685857, 1.297640, 2.688435") ;
            index_2 ("0.001500, 0.003306, 0.007287, 0.016062, 0.035404, 0.078035, 0.172000") ;
            values ("0.039344, 0.051377, 0.076121, 0.128453, 0.244808, 0.507603, 1.095424", \
                    "0.039341, 0.051380, 0.076114, 0.128444, 0.244810, 0.507617, 1.095416", \
                    "0.039339, 0.051378, 0.076117, 0.128449, 0.244814, 0.507619, 1.095422", \
                    "0.039339, 0.051373, 0.076114, 0.128446, 0.244790, 0.507581, 1.095432", \
                    "0.039336, 0.051371, 0.076116, 0.128450, 0.244811, 0.507631, 1.095433", \
                    "0.039333, 0.051370, 0.076109, 0.128438, 0.244793, 0.507584, 1.095425", \
                    "0.039328, 0.051366, 0.076103, 0.128427, 0.244749, 0.507551, 1.095421") ;
          }
        }
      }
      pin (D) {
        nextstate_type : data ;
        direction : input ;
        capacitance : 0.001549 ;
        internal_power () {
          when : "!CK" ;
          power (POWER_7x1) {
            index_1 ("0.034845, 0.102893, 0.185766, 0.352447, 0.686642, 1.299821, 2.693705") ;
            values ("0.007697, 0.007690, 0.007738, 0.007909, 0.008383, 0.009444, 0.012232") ;
          }
        }
        internal_power () {
          when : "CK" ;
          power (POWER_7x1) {
            index_1 ("0.034845, 0.102894, 0.185762, 0.352512, 0.686631, 1.299818, 2.693703") ;
            values ("0.002499, 0.002487, 0.002539, 0.002720, 0.003218, 0.004312, 0.007087") ;
          }
        }
        timing () {
          related_pin : "CK" ;
          sdf_edges : both_edges ;


          timing_type : setup_rising ;
          rise_constraint (CONST_3x3) {
            index_1 ("0.033000, 1.340000, 2.680000") ;
            index_2 ("0.033000, 1.340000, 2.680000") ;
            values ("0.117550, 0.083721, 0.119830", \
                    "0.273590, 0.232360, 0.273400", \
                    "0.329420, 0.285730, 0.329230") ;
          }
          fall_constraint (CONST_3x3) {
            index_1 ("0.033000, 1.340000, 2.680000") ;
            index_2 ("0.033000, 1.340000, 2.680000") ;
            values ("0.134810, 0.061529, 0.070513", \
                    "0.377150, 0.296470, 0.304220", \
                    "0.509430, 0.433680, 0.442660") ;
          }
        }
        timing () {
          related_pin : "CK" ;
          sdf_edges : both_edges ;
          timing_type : hold_rising ;
          rise_constraint (CONST_3x3) {
            index_1 ("0.033000, 1.340000, 2.680000") ;
            index_2 ("0.033000, 1.340000, 2.680000") ;
            values ("-0.039579, -0.020541, -0.051718", \
                    "-0.161090, -0.151920, -0.185560", \
                    "-0.172540, -0.188030, -0.226600") ;
          }
          fall_constraint (CONST_3x3) {
            index_1 ("0.033000, 1.340000, 2.680000") ;
            index_2 ("0.033000, 1.340000, 2.680000") ;
            values ("-0.029716, 0.041104, 0.037051", \
                    "-0.200550, -0.146990, -0.147340", \
                    "-0.268710, -0.239810, -0.246330") ;
          }
        }
      }
      pin (CK) {
        direction : input ;
        capacitance : 0.001627 ;
        max_transition : 2.680000 ;
        clock : true ;
        internal_power () {
          power (POWER_7x1) {
            index_1 ("0.034847, 0.102893, 0.185769, 0.352469, 0.686640, 1.299819, 2.693707") ;
            values ("0.005776, 0.005561, 0.005496, 0.005664, 0.005992, 0.007056, 0.009976") ;
          }
        }
        min_pulse_width_high : 0.184800 ;
        min_pulse_width_low : 0.325200 ;
      }
    }
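The `cell_leakage_power` entry in the listing above is numerically consistent with an equal-weight average of the four state-dependent `leakage_power` values. The short Python sketch below reproduces that average and shows how a per-cell average could be combined with cell counts from a netlist. The function names and the instance count of 10 are hypothetical and serve only as an illustration; the leakage values are copied from the listing, in the library's own units.

```python
# State-dependent leakage values copied from the DFF_new listing
# (in the library's leakage-power unit).
leakage_by_state = {
    "!D * !CK": 16683.81,
    "!D * CK": 18511.56,
    "D * !CK": 17364.33,
    "D * CK": 16623.04,
}


def average_leakage(states):
    """Equal-weight average over all input combinations of one cell."""
    return sum(states.values()) / len(states)


def total_leakage(cell_counts, avg_leakage_per_cell):
    """Sum of (instance count x average leakage) over all cell types."""
    return sum(count * avg_leakage_per_cell[name]
               for name, count in cell_counts.items())


avg_dff = average_leakage(leakage_by_state)

# Hypothetical netlist with 10 DFF_new instances, for illustration only.
example_total = total_leakage({"DFF_new": 10}, {"DFF_new": avg_dff})
```

The computed average agrees with the listed `cell_leakage_power` of 17295.68 to within rounding.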




Chapter 5
Simulations and Results

In this work, the CORDIC algorithm is implemented and demonstrated, in simulation, to be operable over the wide voltage range of 270mV to 1.2V in all process corners. This chapter describes the flow of the work and the results obtained.

A standard cell library is always designed and characterized at a particular voltage. The 0.13µm Faraday standard core cell library is characterized at 1.2V in the TT corner at 25℃. We can therefore calculate the power and delay of an ASIC synthesized with this library at that particular voltage, but we cannot guarantee that the same ASIC will operate at some other voltage level with a different performance. Guaranteeing operation at a different voltage level while meeting a specific performance requires exhaustive SPICE simulation, which is not feasible for an ASIC design because it takes several days to finish. In this work we propose a simpler approach to determine the same for the CORDIC algorithm, and we calculate the energy consumed per rotation over a wide operating voltage range of 270mV to 1.2V. From this we find the minimum energy point of the CORDIC design.

5.1 Assumptions

• It is assumed that the critical path of the ASIC design does not change with voltage, process and temperature.
• It is assumed that the leakage power of a cell is independent of the input combination.

The first assumption is weak, because with PVT variation the delay of each path


changes, and thus the critical path may change. The second assumption is made because we assume that all input combinations are applied to a cell for equal time. We therefore take the leakage power of each cell to be the average of its leakage power over all input combinations. We retain both assumptions throughout this work.

5.2 Process Flow

The full process flow of the work is shown in Figure 5.1 and explained below.

Figure 5.1: Process Flow

The process flow consists of the following steps:

• The RTL code of the CORDIC algorithm is first simulated, and its functionality is verified by behavioral simulation.
• From the behavioral simulation, an activity file is generated to calculate the power of the whole circuit using PrimePower. PrimePower also needs the parasitic information, which was supplied as a .spef file after placement and routing using SoC Encounter.


• For placement and routing, the standard cell library was first modified as discussed in Section 3.4. Using this library, we synthesized the RTL code of the CORDIC algorithm at a clock speed of 200MHz.
• Synthesis generates a gate-level netlist, which lists the cells used in the complete circuit and the number of occurrences of each cell. Using this information, the average leakage power of each cell is calculated by simulating the cells individually in Cadence.
• The total leakage power is calculated by adding the average leakage power of all the cells, for each supply voltage.
• Timing analysis is done using Design Compiler, which gives the critical path and the node capacitances along the critical path.
• The extracted node capacitances do not include the wire capacitance. Since wire load capacitance also plays a significant role in sub-threshold operation, parasitic information is extracted from the placed and routed design (using SoC Encounter) as a .spef file. This information is extracted for each node on the critical path, and the parasitic capacitance is added to the respective node.
• This path is analyzed in Cadence, and the total path delay is calculated for each supply voltage.
• We now have the total leakage power and the critical path delay of the circuit for supply voltages from 270mV to 1.2V. The following equations are used to calculate the power and energy at all these voltage levels:

$$\text{Total Power}\,(P_T) = \text{Dynamic Power}\,(P_{DYN}) + \text{Leakage Power}\,(P_{LEAK}) \qquad (5.1)$$

$$P_T = \alpha C_{eff} V_{DD}^2 f + P_{LEAK} \qquad (5.2)$$

$$P_{T1} = \alpha C_{eff} V_{DD1}^2 f_1 + P_{LEAK1} \qquad (5.3)$$

$$P_{T2} = \alpha C_{eff} V_{DD2}^2 f_2 + P_{LEAK2} \qquad (5.4)$$

$$P_{DYN2} = P_{T2} - P_{LEAK2} = \frac{V_{DD2}^2 f_2}{V_{DD1}^2 f_1}\,(P_{T1} - P_{LEAK1}) \qquad (5.5)$$

$$P_{T2} = \frac{V_{DD2}^2 f_2}{V_{DD1}^2 f_1}\,(P_{T1} - P_{LEAK1}) + P_{LEAK2} \qquad (5.6)$$

$$\text{Leakage Energy}\,(E_{LEAK}) = P_{LEAK2} \cdot T_{D2} \qquad (5.7)$$

$$\text{Dynamic Energy}\,(E_{DYN2}) = P_{DYN2} \cdot T_{D2} \qquad (5.8)$$
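The scaling step in equations (5.5) to (5.8) can be sketched as follows. This is a minimal Python illustration, not the scripts used in this work: the function names are hypothetical, all numerical values are invented placeholders, and the operating frequency at the second voltage is assumed to be the reciprocal of the critical path delay T_D2.

```python
def scale_dynamic_power(p_total_ref, p_leak_ref, vdd_ref, f_ref, vdd, f):
    """Equation (5.5): dynamic power at (vdd, f) from a reference point."""
    ratio = (vdd ** 2 * f) / (vdd_ref ** 2 * f_ref)
    return ratio * (p_total_ref - p_leak_ref)


def energy_over_delay(p_dyn, p_leak, t_d):
    """Equations (5.7) and (5.8): energies over one critical path delay."""
    e_leak = p_leak * t_d
    e_dyn = p_dyn * t_d
    return e_dyn, e_leak, e_dyn + e_leak


# Placeholder reference point (subscript 1), NOT measured data.
P_T1, P_LEAK1 = 2.0e-3, 1.0e-4    # total and leakage power in W
VDD1, F1 = 1.2, 200e6             # supply in V, clock in Hz

# Placeholder target point (subscript 2), NOT measured data.
VDD2 = 0.6                        # supply in V
T_D2 = 50e-9                      # critical path delay in s
P_LEAK2 = 2.0e-5                  # leakage power in W
F2 = 1.0 / T_D2                   # assumption: clock period equals T_D2

p_dyn2 = scale_dynamic_power(P_T1, P_LEAK1, VDD1, F1, VDD2, F2)
e_dyn2, e_leak2, e_total2 = energy_over_delay(p_dyn2, P_LEAK2, T_D2)
```

Repeating this calculation for each simulated supply voltage, with the leakage power and path delay obtained at that voltage, gives the energy curves whose minimum is the minimum energy point.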


5.3 Results

The CORDIC algorithm is synthesized at a clock frequency of 200 MHz using the pruned library discussed in Section 3.4, which contains the clocked inverter flip-flop instead of the standard transmission gate flip-flop. The energy consumed per rotation is plotted against the supply voltage in Figure 5.2.

Figure 5.2: Energy characteristics against supply voltage in SS Corner (for modified library)

From Figure 5.2, we conclude that the minimum energy point lies between 410 and 440mV. Although leakage power falls as the supply voltage is reduced, leakage energy increases exponentially. This is because the delay of the critical path increases rapidly as the supply voltage is reduced, and this increase dominates in sub-threshold operation. Dynamic energy is directly proportional to the square of the supply voltage, as depicted in Figure 5.2.

The CORDIC design is simulated in the SS, FF and TT corners to make sure that operation is preserved across process variation. The magnitude of the threshold voltage of a transistor decreases nearly linearly with temperature and may be approximated by the following formula [3]:

$$V_t(T) = V_t(T_r) - k_{vt}\,(T - T_r) \qquad (5.9)$$

where k_vt is typically about 1-2 mV/K and T_r is room temperature. In addition, the ON current of a transistor decreases with temperature because transistors are velocity saturated in


the super-threshold regime. This makes transistors slower at high temperature when operated at super-threshold voltages. In sub-threshold operation, i.e. below a certain supply voltage, transistors are not velocity saturated, and the reduction in threshold voltage with temperature dominates, so I_on increases with temperature. Thus, in sub-threshold operation, devices become faster as temperature rises. This is depicted in Figure 5.3.

Figure 5.3: Normalized Delay vs Supply Voltage (for modified library).

Figure 5.4: Energy characteristics against supply voltage in different process corners (for modified library). (a) Leakage Energy. (b) Total energy.

Since sub-threshold leakage depends exponentially on temperature, in the


SS corner transistors leak more than in the TT and FF corners. This is shown in Figure 5.4.

5.4 Comparison of results with Normal library

In this section, we compare the results obtained with the pruned library against those obtained with the normal library, i.e. the library without any modification.

The modified library contains only a limited number of gates. Consequently, more gates are needed to map the same functionality than with the normal (full) library, which increases the area of the circuit. After pruning, the area of the CORDIC design increased by 31.5% compared to the CORDIC design using the normal library. As a result, the leakage power increases, as shown in Figure 5.5.

Figure 5.5: Energy vs Supply Voltage

For a particular frequency of operation, stacking increases the supply voltage needed to compensate for the delay degradation. This makes the CORDIC design with the normal library slower than the CORDIC design with the pruned library, while consuming less dynamic energy, as shown in Figure 5.5. For the same operating frequency, it therefore requires a higher supply voltage than the pruned library, as shown in Figure 5.6.

Leakage power in the CORDIC design with the pruned library is higher than with the normal library because it contains fewer stacked devices. So for low performance requirements,


where leakage power exceeds dynamic power, the pruned library consumes more power, as shown in Figure 5.7. From Figure 5.7, we observe that at 2MHz the normal library is more power efficient than the pruned library, but at 250MHz it consumes more energy.

Figure 5.6: Supply voltage requirement vs Performance

Figure 5.7: Power Consumption vs Performance

In Figure 5.7, we also observe that at low performance requirements the design consumes three times less energy in the FF corner than in the SS corner. At high frequency, at 250 MHz, the power efficiency in the FF corner is 75% higher than in the SS corner.


The comparison of the CORDIC designs using the normal library and the modified library is summarized in Table 5.1.

Table 5.1: Comparison of CORDIC Designs at 1.2V, SS corner

Parameter                        With Normal Library    With Modified Library
Gate Count                       8560                   11260
Maximum Frequency (MHz)          245                    260
Leakage Energy (pJ/rot)          0.189                  0.236
Dynamic Energy (pJ/rot)          11.57                  11.56
Total Energy (pJ/rot)            11.76                  11.79
Minimum Energy Voltage (mV)      420-450                410-440
Minimum Energy (pJ/rot)          1.6-1.7                1.9-2.0

5.5 The openMSP430 Micro-controller

The openMSP430 micro-controller is also synthesized with the pruned library and compared against the normal library in Table 5.2. As the table shows, area increases by around 30% after pruning. Leakage increases because there is less stacking, and speed is slightly better. Thus, after pruning, the openMSP430 also consumes more energy per rotation. The energy consumption in Table 5.2 is for the openMSP430 core only, excluding memory and peripheral devices.

Table 5.2: Comparison of CORDIC on openMSP430 at 1.08V, SS corner

Parameter                        With Normal Library    With Modified Library
Gate Count                       10279                  13389
Maximum Frequency (MHz)          66.6                   66.8
Leakage power (mW)               0.033                  0.046
Dynamic power (mW)               0.616                  0.637
Total power (mW)                 0.649                  0.683
Energy per rotation (nJ/rot)     1.048                  1.103
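The relative overheads quoted for the openMSP430 can be recomputed directly from Table 5.2. The following Python sketch does only that arithmetic; the dictionary layout and the helper name `percent_increase` are illustrative choices, and the numbers are copied from the table.

```python
def percent_increase(normal, modified):
    """Relative increase of the modified-library value, in percent."""
    return 100.0 * (modified - normal) / normal


# (normal library, modified library) pairs copied from Table 5.2.
table_5_2 = {
    "gate_count": (10279, 13389),
    "leakage_power_mW": (0.033, 0.046),
    "dynamic_power_mW": (0.616, 0.637),
    "total_power_mW": (0.649, 0.683),
    "energy_nJ_per_rot": (1.048, 1.103),
}

overheads = {name: percent_increase(normal, modified)
             for name, (normal, modified) in table_5_2.items()}

for name, value in overheads.items():
    print(f"{name}: +{value:.1f}%")
```

The gate-count overhead comes out at roughly 30%, in line with the area increase stated above, while leakage power grows by close to 40% and energy per rotation by about 5%.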


Chapter 6
Conclusions And Scope for Future Work

6.1 Conclusions

This work proposes the implementation of ASIC designs that are operable at sub-threshold voltages, so that the minimum energy point is achievable. We have addressed the issues in sub-threshold circuit design and the problems with transmission gates and stacked devices. To this end, we modified the standard cell library and designed a clocked-inverter flip-flop for the library, using the clocked-inverter flip-flop design from [8]. The proposed methodology was applied to a CORDIC algorithm in rotation mode at a clock frequency of 200MHz, and the minimum energy point was successfully reached, in simulation, in the SS and TT process corners.

From Figure 5.6, we see that the supply voltage requirement of the pruned library is always lower than that of the normal library. From Figure 5.7, we conclude that for low performance requirements the normal library is more energy efficient. Also, in the FF process corner the design is much more energy efficient over the whole performance range.

The minimum energy voltage for the CORDIC design was 410-440mV, and the minimum energy consumption was 1.9-2.0 pJ/rotation. The gate count of the CORDIC design using the modified library was 11260, approximately 32% more than that of the CORDIC design using the normal library. The CORDIC algorithm was thus able to operate over a wide voltage range of 270mV to 1.2V.


6.2 Scope for Future Work

This work was done in the 0.13µm technology node. As discussed earlier, second-order effects become more prominent as the technology shrinks, so it will be interesting to see the results in lower technology nodes, which may open further directions for this work.

Our first assumption was that the critical path does not change with PVT variation. This is a weak assumption, and in further work we can look for a way to track the critical path, along with the minimum energy point, as the voltage is scaled down.

In this work we assumed that the leakage power is independent of the input combination and, in the calculations, took the average leakage power over all input combinations. To obtain the exact leakage power, the switching activity can be tracked in future work.


Appendix A
Place and Route using SoC Encounter

SoC Encounter is a powerful place and route tool with First Encounter, NanoRoute, CeltIC and optimization tools built in. It is widely used for the RTL to GDSII flow and can perform various functions such as floor planning, feasibility analysis, placement, clock tree synthesis, power routing, SI (signal integrity) aware routing and IR drop analysis [16, 13]. The SoC Encounter flow is shown in Figure A.1.

Figure A.1: SoC Encounter Flow


A.1 Input Requirements of SoC Encounter

SoC Encounter requires the following files for the RTL to GDSII flow:

• Verilog Netlist: The Verilog netlist obtained from synthesis of the module to be laid out contains the standard cells, functional I/O pads and their interconnection information. This file should be in synthesized Verilog netlist (.v file) format.
• IO file: This file contains information about the IO pads: which kind of pad (input, output, corner, with or without ESD protection etc.) is to be placed and where (offset, spacing, direction etc.). This file should be in the proper format with a .io extension.
• Timing Constraint Files: Just as for synthesis, we need to specify timing constraints for the backend design with SoC Encounter. The file can be generated from the synthesis tool (Design Compiler) and should be in .sdc format.
• Technology files: These files describe the technology itself as well as the libraries of standard building blocks implemented in this technology, i.e. standard cells, pads, RAM/ROM.
  – header.lef: Base technology description; defines metal layers, vias, spacing rules, routing.
  – Fscoh–.lef: Physical description; shape and allowed orientation of cells, layer and shape of pins, blockages, antenna information.
  – *fast*.lib, *slow*.lib: Functional description, timing and power information, maximum load/fanout or transition time allowed.

A.2 SoC Encounter Flow

• Initialization: Since Encounter creates a lot of files, always create a new work directory before starting it, and change your current directory to it. SoC Encounter (Figure A.2) can then be started with

  encounter

  Do not add "&" at the end of this command. Encounter uses the terminal from which it is launched to provide feedback and results for the user's commands and actions. Commands for almost all functions that can be performed in the GUI can also be given in the terminal.


Figure A.2: SoC Encounter's GUI

• Importing Design: To import the design with the required libraries, go to

  Design ⇒ Import Design

  and fill in the required entries appropriately, as shown in Figure A.3. Then go to

  Advanced ⇒ Power

  and change the power and ground nets according to the library definition, which can be found in the fscoh.lib file. In this example they are VCC and GND respectively. If everything went properly, the initial floorplan and memories become visible. The design hierarchy can be viewed by choosing

  Tools ⇒ Design Browser

  The library files can also be verified here.

Figure A.3: Importing Design in SoC Encounter

• Floorplanning: In this step we define the dimensions of the ASIC core, the arrangement of the core rows, the distance between the ASIC core and the IOs, the physical location of any hard macro, and the distance between these blocks and the core rows. To specify these, go to

  Floorplan ⇒ Specify Floorplan

  During clock tree synthesis, clock buffers are added to the design, and the optimization process leaves some area vacant that has to be filled with filler cells. In practice, Core Utilization is therefore kept at 0.7, i.e. 30% of the area is left for clock buffers and fillers. This setting is shown in Figure A.4.

Figure A.4: Specifying Floorplan

  It is necessary to specify the global net connections, i.e. the VCC and GND connections. This is done by

  Floorplan ⇒ Connect Global Nets

  Select Pin and specify VCC under Pin Name, and select Apply All. Specify VCC as Global net and Add to List. Also connect VCC (under Net Basename) to VCC. Repeat the same procedure for the GND net. Also connect Tie High and Tie Low to VCC and GND respectively. Figure A.5 shows these settings.

Figure A.5: Global Net Specification

• Power Planning: We first add power rings around the core area for supply and ground so that connections can be made easily. For this, go to

  Power ⇒ Power Planning ⇒ Add Rings

  Apply the settings shown in Figure A.6 and click OK. This creates two rings around the core: GND (inner ring) and VCC (outer ring).

Figure A.6: Add Core Ring Pane

  Next we create stripes or a mesh (as required) to minimize the voltage sag. This is done by adding stripes (horizontal or vertical) or a mesh (horizontal and vertical stripes). For this, go to

  Power ⇒ Power Planning ⇒ Add Stripes

  Apply the settings according to the design (as shown in Figure A.7 for our example) and click OK.

Figure A.7: Add Stripe Ring Pane

  Now we can supply power to each cell of the design. This is done using SROUTE (Figure A.8), which creates small horizontal power and ground lines matching the size of the standard cells defined in the library file.

Figure A.8: SROUTE Pane


  After this step, some blue horizontal lines are visible in the core. If the design has IO pads, the power ring to IO connections are also visible, as shown in Figure A.9.

Figure A.9: A view of Core after SROUTE

• Standard Cell Placement: The layout is now ready for standard cell placement, which can be influenced by the information in the constraint file. The standard settings in the Placement Mode panel, as selected in Figure A.10, should be sufficient to achieve a satisfactory placement result. To start placement, choose

  Place ⇒ Standard Cells

  and click OK with the default settings.

Figure A.10: Placement Mode Setting Pane

• Clock Tree Synthesis: SoC Encounter generates a clock tree by mapping the requirements in the clock specification file (.cts) and the constraint file (.sdc) to the physical design. The clock tree is built from appropriately sized clock buffers that are accommodated in the core row gaps. To synthesize the clock tree, choose

  Clock ⇒ Design Clock

  Select the .cts file, or generate one by selecting the required cells. Then click OK.

• Signal Routing: After all blocks and cells are placed and the clock tree is routed, the cells on the core rows need to be connected as specified in the netlist. This is accomplished by a routing tool named NanoRoute, which is incorporated into SoC Encounter. Select

  Route ⇒ Nanoroute ⇒ Route

  to open the NanoRoute pane shown in Figure A.11.

Figure A.11: Signal Routing Pane

  After this step, go to

  Verify ⇒ Verify Geometry

  and check for violations in Tools ⇒ Violation Browser. If there is any violation, change the floorplan and repeat the whole procedure.

• Add Filler Cells: The last step of placement and routing is to fill the remaining area with filler cells. To do this, choose

  Place ⇒ Physical Cell ⇒ Add Filler

  Add all the filler cells from the list and click OK. This makes the core utilization equal to 1, because the filler cells are now also counted as cells that fill the core area.

  Finally, the core should look like Figure A.12.

Figure A.12: Final View of Core after Placement and Routing
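The Core Utilization value used in the floorplanning step relates the total standard cell area to the core area. The sketch below shows that arithmetic for a core utilization of 0.7. The function name, the aspect-ratio convention (height divided by width) and the example cell area are our own assumptions for illustration; they are not values from this design or settings read from the tool.

```python
import math

def core_dimensions(cell_area_um2, utilization=0.7, aspect_ratio=1.0):
    """Estimate core area, width and height (in um) from the total cell area.

    utilization  : standard cell area / core area (0.7 in the text)
    aspect_ratio : core height / core width (assumed convention)
    """
    core_area = cell_area_um2 / utilization
    width = math.sqrt(core_area / aspect_ratio)
    height = core_area / width
    return core_area, width, height

# Hypothetical total standard cell area, for illustration only.
cell_area = 70000.0  # um^2
core_area, width, height = core_dimensions(cell_area, utilization=0.7)

free_fraction = (core_area - cell_area) / core_area
print(f"core area : {core_area:.0f} um^2 ({width:.1f} um x {height:.1f} um)")
print(f"free area : {free_fraction:.0%} left for clock buffers and fillers")
```

Once filler cells are added, the free area is occupied as well, which is why the tool then reports a utilization of 1.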




Appendix B
LEF File Generation Using Abstract Generator

The abstract generator program abstract is used to generate the LEF file of the clocked-inverter flip-flop without the technology description part. This is because the layer definitions differ between the UMC library and the Faraday standard cell library, so the technology description part must be created manually.

A LEF file describing a library has two parts:

1. The technology description part:
   • The layers available in the technology. Only layers involved in PNR (place and route) should be included.
   • The part of the design rules that affects PNR operation.
   • Library designer-defined routing rules such as the preferred direction of metal tracks, the chosen value of routing pitch etc.
2. The cell description part, describing the geometries comprising each cell:
   • The shape and size of the cell.
   • The location of the pins, the layer each pin sits on and its geometric description.
   • A detailed description of layer geometries that do not belong to any particular pin but prohibit the passage of routing tracks in the same layer.
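To make the cell description part concrete, the sketch below holds a small LEF fragment as a string and extracts the three items listed above: cell size, pins, and the presence of an obstruction (OBS) section. The macro name, layer name and coordinates are made up for illustration and are not the generated flip-flop abstract; the parser is a minimal line-based reader that handles only this simple layout, not a full LEF parser.

```python
# Hypothetical cell-description fragment; names and coordinates are made up.
LEF_TEXT = """
MACRO DFF_CLKINV
  CLASS CORE ;
  SIZE 6.40 BY 3.20 ;
  PIN D
    DIRECTION INPUT ;
    PORT
      LAYER metal1 ;
      RECT 0.20 1.00 0.40 1.20 ;
    END
  END D
  PIN Q
    DIRECTION OUTPUT ;
    PORT
      LAYER metal1 ;
      RECT 5.80 1.00 6.00 1.20 ;
    END
  END Q
  OBS
    LAYER metal1 ;
    RECT 1.00 0.50 5.00 2.50 ;
  END
END DFF_CLKINV
"""

def summarize_lef(text):
    """Return {macro: {"size": (w, h), "pins": {name: direction}, "has_obs": bool}}."""
    macros = {}
    macro = None  # name of the MACRO block currently open
    pin = None    # name of the PIN block currently open
    for raw in text.splitlines():
        tok = raw.replace(";", " ").split()
        if not tok:
            continue
        if tok[0] == "MACRO":
            macro = tok[1]
            macros[macro] = {"size": None, "pins": {}, "has_obs": False}
        elif macro is None:
            continue  # ignore anything outside a MACRO block
        elif tok[0] == "SIZE":
            macros[macro]["size"] = (float(tok[1]), float(tok[3]))
        elif tok[0] == "PIN":
            pin = tok[1]
            macros[macro]["pins"][pin] = None
        elif tok[0] == "DIRECTION" and pin is not None:
            macros[macro]["pins"][pin] = tok[1]
        elif tok[0] == "OBS":
            macros[macro]["has_obs"] = True
        elif tok[0] == "END" and len(tok) == 2:
            if tok[1] == pin:
                pin = None      # closes the current PIN
            elif tok[1] == macro:
                macro = None    # closes the current MACRO
    return macros

summary = summarize_lef(LEF_TEXT)
for name, info in summary.items():
    print(name, info["size"], info["pins"], "OBS" if info["has_obs"] else "no OBS")
```

A check of this kind can be useful after exporting a LEF file, to confirm that all expected pins and the blockage information are present before the cell is used in place and route.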


The first part can be generated either manually or with a tool such as the technology file editor (as in our case). It is required by abstract as an input for generating the second part.

It is assumed that we have the UMC technology file with which the circuit and layout were made.

1. Create a new library in Cadence, named, for example, std_tech.
2. In the CIW (Command Interpreter Window), choose Tools ⇒ Technology File Manager. Select Load and browse for the UMC technology file to load into the library std_tech.
3. Open the Technology File Manager again and double-click on PR. Apply the settings shown in Figure B.1 in the Routing Layers subclass.

Figure B.1: Defining routing layers

4. In the Via types subclass, apply the settings shown in Figure B.2.

Figure B.2: Defining VIA


Now save the technology library and assign it to your design library.

To generate the abstract view, we need to go through three steps:

1. Pins: Defines the pins.
2. Extract: Extracts the pin and obstruction information.
3. Abstract: Generates the abstract view of the layout.

Now open the layout and go to Tools ⇒ Abstract Generator.

• Step 1 (Defining Pins):
  1. Select Flow ⇒ Pins. In the new window, go to the Map tab and specify the layers in which the pin labels are drawn, as shown in Figure B.3.

Figure B.3: Defining Pin Labels

  2. In the Boundary tab, check whether any boundary adjustment is needed. Set Create Boundary to "as needed".


  3. Press the Run button and wait until the pin generation step is completed. There may be some warnings about parts of the cells lying outside the boundary; these can be ignored.
• Step 2 (Pin and Obstruction Extraction): Choose Flow ⇒ Extract. In the Power tab, check Extract power nets. Choose Run.
• Step 3 (Abstract Generation):
  1. Choose Flow ⇒ Abstract. In the Blockage tab of the new window, set the Blockage type for all layers to Detailed.
  2. In the Overlap tab, set Create Overlap Boundary to "as needed".
  3. Press Run.

To export the LEF file of the cell, select File ⇒ Export LEF in the Abstract Generator window and press OK.


Bibliography

[1] C. G. B. Garrett and W. H. Brattain, "Physical Theory of Semiconductor Surface," Physical Review, vol. 99, p. 376, 1955.
[2] Y. Tsividis, Operation and Modelling of MOS Transistors, 2nd ed. New York: McGraw-Hill, 1999.
[3] Neil H. E. Weste and D. M. Harris, CMOS VLSI Design - A Circuit and System Perspective, 4th ed. Pearson, 2011.
[4] Calhoun et al., "Modeling and Sizing for Minimum Energy Operation in Subthreshold Circuits," IEEE Journal of Solid-State Circuits, vol. 40, no. 9, Sep. 2005.
[5] Ray Andraka, "A Survey of CORDIC Algorithms for FPGA Based Computers," Proceedings of the 1998 ACM/SIGDA Sixth International Symposium on Field Programmable Gate Arrays, Feb. 22-24, 1998.
[6] E. Vittoz, "Weak Inversion for Ultimate Low Power Logic," in Low-Power Electronics Design, C. Piguet, Ed. CRC Press, 2005.
[7] Massimo Alioto, "Ultra-Low Power VLSI Circuit Design Demystified and Explained: A Tutorial," IEEE Transactions on Circuits and Systems, July 2010.
[8] Shailendra Jain, Surhud Khare, "A 280mV-to-1.2V Wide Operating Range IA-32 Processor in 32nm CMOS," IEEE International Solid-State Circuits Conference (ISSCC), 2012.
[9] S. Roundy, P. Wright, and J. Rabaey, "A Study of Low Level Vibrations as a Power Source for Wireless Sensor Nodes," Computer Communications, vol. 26, no. 11, pp. 1131-1144, 2003.
[10] Synopsys Inc., Library Compiler User Guide.


[11] Faraday cell library, FSCOH_D 0.13µm Standard Cell, Databook, ver. 1.1, 2004.
[12] Cadence Inc., Abstract Generator User Guide.
[13] Virginia University, SoC Encounter Tutorial. http://www.ee.virginia.edu/ mrs8n/soc/enc_tutorial.html
[14] Portland State University, Creating LEF File Tutorial, Jan. 2004.
[15] Cadence Inc., OCEAN Reference, ver. 5.1.41, June 2004.
[16] Cadence Inc., http://www.cadence.com.
[17] Wikipedia, Flip-flop (electronics), http://en.wikipedia.org/wiki/Flip-flop_(electronics)
