12.07.2015 Views

6. Minimizing Delay


Minimizing Delay


Logical Effort Definition
Logical effort: the ratio of a logic gate's input capacitance to that of an inverter sized to have the same drive strength.


Parameter                        INV X1        NAND2 X1
Cin                              3.6 fF        4.2 fF
Intrinsic rise                   25.3 ps       31.3 ps
Intrinsic fall                   14.6 ps       19.5 ps
Average intrinsic delay (t_p)    20 ps         25.4 ps
Rise delay slope                 4.52 ps/fF    4.53 ps/fF
Fall delay slope                 2.37 ps/fF    2.84 ps/fF
Average slope (t_g h)            3.45 ps/fF    3.68 ps/fF
τ                                ?             ?
p                                ?             ?
g                                1             ?
Delay eqn                        ?             ?


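Logical Effort from previous table
$t_{inv} = \tau\,(p_{inv} + C_{out}/3.6\,\mathrm{fF})$
$t_{n2} = \tau\,(p_{n2} + g_{n2}\,C_{out}/4.2\,\mathrm{fF})$
Inverter's average slope gives $\tau \cdot 1\,\mathrm{fF}/3.6\,\mathrm{fF} = 3.45\,\mathrm{ps}$, so $\tau = 12.42\,\mathrm{ps}$.
Inverter's average intrinsic delay gives $\tau \cdot p_{inv} = 20\,\mathrm{ps}$, so $p_{inv} = 1.61$.
NAND2's average slope gives $\tau \cdot g_{n2} \cdot 1\,\mathrm{fF}/4.2\,\mathrm{fF} = 3.68\,\mathrm{ps}$, so $g_{n2} \approx 1.25$.
The logical effort of the NAND2 is less than 4/3 (1.33). Why?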
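The arithmetic on this slide can be checked with a short Python sketch. It assumes the delay model t = τ(p + g·C_out/C_in) with the inverter's logical effort fixed at 1; the constant names are illustrative and not part of the original deck. The computed NAND2 logical effort lands close to the slide's rounded 1.25.

```python
# Values read from the characterization table (INV X1 and NAND2 X1).
CIN_INV_FF = 3.6      # inverter input capacitance, fF
CIN_NAND2_FF = 4.2    # NAND2 input capacitance, fF
SLOPE_INV = 3.45      # inverter average delay slope, ps/fF
SLOPE_NAND2 = 3.68    # NAND2 average delay slope, ps/fF
TP_INV_PS = 20.0      # inverter average intrinsic delay, ps

# Assumed model: slope = tau * g / Cin, with g = 1 for the inverter.
tau_ps = SLOPE_INV * CIN_INV_FF

# Intrinsic delay = tau * p, so p is the intrinsic delay in units of tau.
p_inv = TP_INV_PS / tau_ps

# Solve the NAND2 slope equation for its logical effort.
g_nand2 = SLOPE_NAND2 * CIN_NAND2_FF / tau_ps

print(f"tau = {tau_ps:.2f} ps, p_inv = {p_inv:.2f}, g_nand2 = {g_nand2:.2f}")
```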


Impact of velocity saturation
$I_{dsat} = W C_{ox} v_{sat} (V_{GS} - V_T)$
• Current is independent of channel length for a completely velocity-saturated transistor.
• A series stack of n transistors is equivalent to a longer channel of length nL, so it has the same pull-down current as a single transistor of minimum channel length.
• So there is no need to oversize the transistors in a series stack, and hence the logical effort is 1.0 in this extreme case.


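Sizing to minimize delay
Four-stage path with input capacitance 10 and load capacitance 20 (capacitance unit = size of unit transistor):
Stage 1 (input capacitance 10): $g_1 = 1$, $h_1 = x/10$
Stage 2 (input capacitance x): $g_2 = 5/3$, $h_2 = y/x$
Stage 3 (input capacitance y): $g_3 = 4/3$, $h_3 = z/y$
Stage 4 (input capacitance z): $g_4 = 1$, $h_4 = 20/z$
Find the scale of each gate. The relative transistor sizes within a gate are fixed, and hence its logical effort is fixed.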


Sizing to minimize delay
Problem: choose $h_1, \dots, h_4$ to minimize
$D = \sum_{i=1}^{4} D_i = \sum_{i=1}^{4} (f_i + p_i) = \sum_{i=1}^{4} (g_i h_i + p_i)$
Note:
$\prod_{i=1}^{4} f_i = \prod g_i h_i = \prod g_i \prod h_i = GH$
$G = \prod g_i$ = path logical effort
$H = \prod h_i = \frac{C_2}{C_{in}} \cdot \frac{C_3}{C_2} \cdots \frac{C_{out}}{C_4} = \frac{C_{out}}{C_{in}}$ = path electrical effort
The product GH is independent of the individual $h_i$.


Multistage Logic Networks
Problem: choose $f_1, \dots, f_4$ to minimize $D = \sum_{i=1}^{4} (f_i + p_i)$ with the constraint $\prod_{i=1}^{4} f_i = GH$.
Solution: $f_{opt} = f_1 = \dots = f_4$, with $f_{opt} = (GH)^{1/4}$
$g_1 \frac{C_2}{C_{in}} = g_2 \frac{C_3}{C_2} = g_3 \frac{C_4}{C_3} = g_4 \frac{C_{out}}{C_4} = f_{opt}$


Multistage Logic Networks
$GH = 1 \cdot \frac{5}{3} \cdot \frac{4}{3} \cdot 1 \cdot \frac{20}{10} = \frac{40}{9}$
$f_{opt} = f_1 = \dots = f_4 = (40/9)^{1/4} = 1.45$
$1 \cdot \frac{x}{10} = \frac{5}{3} \cdot \frac{y}{x} = \frac{4}{3} \cdot \frac{z}{y} = 1 \cdot \frac{20}{z} = 1.45$
$z = 13.8$, $y = 12.7$, $x = 14.6$
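The sizing of this four-stage chain can be reproduced with a small Python sketch. The function name `size_chain` and its argument layout are illustrative assumptions, and the sketch covers only unbranched paths. Working backward from the load, each input capacitance is $g_i \cdot C_{load} / f_{opt}$; the results agree with the slide's values to within rounding.

```python
import math

def size_chain(g, c_in, c_out):
    """Return (f_opt, input capacitances) for an unbranched chain of gates.

    g     -- logical effort of each stage, first to last
    c_in  -- input capacitance of the first stage
    c_out -- load capacitance on the last stage
    """
    n = len(g)
    path_effort = math.prod(g) * c_out / c_in   # F = G * H
    f_opt = path_effort ** (1.0 / n)            # equal stage efforts
    caps = [0.0] * n
    load = c_out
    # Walk from the last stage back to the first.
    for i in range(n - 1, -1, -1):
        caps[i] = g[i] * load / f_opt
        load = caps[i]
    return f_opt, caps

f_opt, caps = size_chain([1, 5 / 3, 4 / 3, 1], c_in=10, c_out=20)
x, y, z = caps[1], caps[2], caps[3]
print(f"f_opt = {f_opt:.2f}, x = {x:.1f}, y = {y:.1f}, z = {z:.1f}")
```

The first entry of `caps` should come back equal to the specified input capacitance of 10, which is a convenient self-check.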


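Transistor sizing within the gate
$z = 13.8$, $y = 12.7$, $x = 14.6$
$z_{pmos} = \frac{2}{3} \cdot 13.8 = 9.2$; $z_{nmos} = \frac{1}{3} \cdot 13.8 = 4.6$
$y_{pmos} = \frac{2}{4} \cdot 12.7 = 6.35 = y_{nmos}$
$x_{pmos} = \frac{4}{5} \cdot 14.6 = 11.7$; $x_{nmos} = \frac{1}{5} \cdot 14.6 = 2.9$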


Treat a path as a large compound gate
• Logical effort generalizes to multistage paths
• Path logical effort: $G = \prod g_i$
• Path electrical effort: $H = C_{out\text{-}path} / C_{in\text{-}path}$
• Path effort: $F = \prod f_i = \prod g_i h_i$
• However, we cannot write F = GH in general, because paths may branch


Branching Effort
• Introduce branching effort
  – Accounts for branching between stages in the path
  $b = \frac{C_{onpath} + C_{offpath}}{C_{onpath}}$, $B = \prod b_i$
• With $H = C_{pathload} / C_{pathinput}$, we have $\prod h_i = BH$
• Now we compute the path effort
  – F = GBH


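Paths that Branch
[Figure: a stage with input capacitance 5 drives two branches of input capacitance 15 each; each branch drives a load of 90.]
$D_1 = \frac{15 + 15}{5} + 1$, $D_2 = \frac{90}{15} + 1$
$h_i = \frac{C_{i+1} + C_{offpath,i+1}}{C_i} = \frac{C_{i+1}}{C_i} \cdot \frac{C_{i+1} + C_{offpath,i+1}}{C_{i+1}} = \frac{C_{i+1}}{C_i}\, b_i$
$\prod h_i = \frac{C_2}{C_{in}} b_1 \cdot \frac{C_3}{C_2} b_2 \cdots \frac{C_{out}}{C_n} b_n = \prod b_i \cdot \frac{C_{out}}{C_{in}} = BH$
$F = GBH$ = path effort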
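A minimal Python sketch of the branching example, assuming both stages are inverters (g = 1), which is how the "+ 1" parasitic terms in the stage delays read. The helper name `branching_effort` is illustrative. It shows that the product of stage efforts equals GBH rather than GH.

```python
def branching_effort(c_onpath, c_offpath):
    """b = (C_onpath + C_offpath) / C_onpath for one branch point."""
    return (c_onpath + c_offpath) / c_onpath

# Assumed: two inverter stages, so each logical effort is 1.
g1 = g2 = 1.0

# Stage electrical efforts, counting everything each stage drives.
h1 = (15 + 15) / 5     # first stage drives both branches
h2 = 90 / 15           # second stage drives its own load only

stage_product = g1 * h1 * g2 * h2

# Path quantities.
G = g1 * g2
H = 90 / 5
B = branching_effort(15, 15)
path_effort = G * B * H

print(f"product of stage efforts = {stage_product}, GH = {G * H}, GBH = {path_effort}")
```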


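Optimum stage effort
$\min D = \sum_{i=1}^{N} (f_i + p_i)$, with the optimum at $f_1 = f_2 = \dots = f_N = f_{opt}$
$f_i = g_i h_i$, where $h_i = C_{out,i} / C_{in,i}$
$\prod_{i=1}^{N} f_i = GBH = f_{opt}^{N}$, so $f_{opt} = (GBH)^{1/N}$
$D = N f_{opt} + \sum_{i=1}^{N} p_i$
$C_{in,i} = C_{out,i} \cdot g_i / f_{opt}$
Minimum delay occurs when all stage efforts in the path are equal.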


Finding the gate sizes for the optimized path
• Work backwards, applying the capacitance transformation to find the input capacitance of each gate given the load it drives.
• Check your work by verifying that the input capacitance specification is met.


Example
Find x and y to minimize delay.


Example: 3-stage path
[Figure: three-stage path from input A (input capacitance 8) through gates sized x and y to output B (load 45).]
Logical effort: G = (4/3)*(5/3)*(5/3) = 100/27
Electrical effort: H = 45/8
Branching effort: B = 3 * 2 = 6
Path effort: F = GBH = 125
Best stage effort: $f_{opt} = F^{1/3} = 5$
Parasitic delay: P = 2 + 3 + 2 = 7
Delay: D = 3*5 + 7 = 22 = 4.4 FO4


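Example: 3-stage path
• Work backward for sizes
  y = 45 * (5/3) / 5 = 15
  x = (15 + 15) * (5/3) / 5 = 10
• Check: the input capacitance at A is (10 + 10 + 10) * (4/3) / 5 = 8, as specified.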
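The backward pass for this example can be sketched in Python. The fan-out structure used here (the first stage drives three gates of size x, each second-stage gate drives two gates of size y) is an assumption read from the branching effort B = 3 * 2; the variable names are illustrative.

```python
# Logical efforts of the three stages, from G = (4/3)*(5/3)*(5/3).
g1, g2, g3 = 4 / 3, 5 / 3, 5 / 3

G = g1 * g2 * g3
B = 3 * 2
H = 45 / 8
F = G * B * H
f_opt = F ** (1.0 / 3)

# Capacitance transformation: C_in = g * C_load / f_opt, last stage first.
y = g3 * 45 / f_opt            # third stage drives the load of 45
x = g2 * (2 * y) / f_opt       # assumed: second stage drives two y-sized gates
c_in_a = g1 * (3 * x) / f_opt  # assumed: first stage drives three x-sized gates

print(f"F = {F:.1f}, f_opt = {f_opt:.2f}, y = {y:.1f}, x = {x:.1f}, C_in(A) = {c_in_a:.1f}")
```

If the assumed fan-out is right, `c_in_a` comes back as the specified input capacitance of 8, which confirms the sizes.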


Best Number of Stages
• How many stages should a path use?
  – Minimizing the number of stages is not always fastest
• Example: drive a 64-bit datapath with a unit inverter (initial driver size 1, datapath load 64)
  $D = N F^{1/N} + P = N (64)^{1/N} + N$

N    Stage sizes       f      D
1    1                 64     65
2    1, 8              8      18
3    1, 4, 16          4      15     (fastest)
4    1, 2.8, 8, 23     2.8    15.3
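The table can be regenerated with a few lines of Python. This is a sketch under the slide's own assumptions: every stage is an inverter with parasitic delay 1, so D = N·64^(1/N) + N. The function name is illustrative.

```python
def inverter_chain_delay(path_effort, n_stages, p_inv=1.0):
    """Delay of n equal-effort inverter stages: N * F**(1/N) + N * p_inv."""
    return n_stages * path_effort ** (1.0 / n_stages) + n_stages * p_inv

delays = {n: inverter_chain_delay(64, n) for n in range(1, 7)}
best_n = min(delays, key=delays.get)

for n, d in delays.items():
    stage_effort = 64 ** (1.0 / n)
    print(f"N = {n}: f = {stage_effort:5.2f}, D = {d:5.1f}")
print("fastest N:", best_n)
```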


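Optimum stage effort with freedom to choose number of stages
A logic block of $n_1$ stages with path effort F is followed by $N - n_1$ extra inverters.
$D_N = N F^{1/N} + \sum_{i=1}^{n_1} p_i + (N - n_1)\, p_{inv}$
$\frac{dD}{dN} = F^{1/N} + N F^{1/N} \ln F \cdot \left(-\frac{1}{N^2}\right) + p_{inv}$
Let $f = F^{1/N}$ = stage effort, so that $\ln F = N \ln f$:
$\frac{dD}{dN} = f\,(1 - \ln f) + p_{inv}$
For a minimum, $\frac{dD}{dN} = 0$, hence
$f_{opt}\,(1 - \ln f_{opt}) + p_{inv} = 0$
$f_{opt} = e$ if $p_{inv} = 0$
$f_{opt} \approx 3.59$ assuming $p_{inv} = 1$
$N_{opt} = \log_f F = \frac{\ln F}{\ln f}$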
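The condition f(1 − ln f) + p_inv = 0 has no closed-form solution for nonzero p_inv, so a numerical root finder is the practical route. The sketch below uses plain bisection; the function name and the bracket [1, 20] are illustrative choices, not part of the original slides. The residual is positive at f = 1 and decreases for f > 1, so the bracket holds for modest parasitic delays.

```python
import math

def best_stage_effort(p_inv, lo=1.0, hi=20.0, tol=1e-9):
    """Solve f * (1 - ln f) + p_inv = 0 for f by bisection."""
    def residual(f):
        return f * (1.0 - math.log(f)) + p_inv

    # residual(lo) > 0 and residual(hi) < 0 for the p_inv values of interest.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rho_no_parasitic = best_stage_effort(0.0)
rho_unit_parasitic = best_stage_effort(1.0)
print(f"p_inv = 0: f_opt = {rho_no_parasitic:.3f}")
print(f"p_inv = 1: f_opt = {rho_unit_parasitic:.3f}")
```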


Sensitivity Analysis
• How sensitive is delay to using exactly the best number of stages?
[Figure: $D(N)/D(\hat{N})$ versus $N/\hat{N}$, with the points ρ = 6 and ρ = 2.4 marked on the curve.]
• 2.4 < f < 6 gives delay within 15% of optimal
  – f = 4 is convenient
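One way to explore this sensitivity numerically is sketched below. It assumes a path of N identical stages with D(N) = N(F^(1/N) + p_inv), a best stage effort ρ = 3.59, and p_inv = 1; under those assumptions the delay ratio depends only on N/N̂. The function names are illustrative.

```python
import math

def relative_delay(stage_ratio, rho=3.59, p_inv=1.0):
    """D(N) / D(N_hat) when N = stage_ratio * N_hat.

    With F = rho**N_hat, the stage effort at N stages is rho**(1/stage_ratio).
    """
    return stage_ratio * (rho ** (1.0 / stage_ratio) + p_inv) / (rho + p_inv)

def relative_delay_for_effort(f, rho=3.59, p_inv=1.0):
    """Same ratio, expressed through the stage effort f actually used."""
    return relative_delay(math.log(rho) / math.log(f), rho, p_inv)

for ratio in (0.5, 0.7, 1.0, 1.4, 2.0):
    print(f"N/N_hat = {ratio:3.1f}: D/D_hat = {relative_delay(ratio):.2f}")
for f in (2.4, 4.0, 6.0):
    print(f"f = {f:3.1f}: D/D_hat = {relative_delay_for_effort(f):.2f}")
```

Under these assumptions, halving or doubling the stage count reproduces the 1.51 and 1.26 tick values on the plot's vertical axis, and stage efforts between 2.4 and 6 stay inside the 15% band.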


Example: Decoder
• Decoder specifications:
  – 16-word register file
  – Each word is 32 bits wide
  – Each bit presents a load of 3 unit-sized transistors
  – True and complementary address inputs A[3:0]
  – Each input may drive 10 unit-sized transistors
• We need to decide:
  – How many stages to use?
  – How large should each gate be?
  – How fast can the decoder operate?
[Figure: A[3:0] and its complement feed a 4:16 decoder that selects one of 16 words in a register file 32 bits wide.]


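Decoder with 4-input NAND
[Figure: the true and complementary address inputs A[3] to A[0], each limited to 10 units of input capacitance, drive gates of size y and z that generate word[0] through word[15]; each word line presents 96 units of capacitance.]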


Gate Sizes & Delay
Logical effort: G =
Path effort: F =
Stage effort: $\hat{f}$ =
Path delay: D =
Gate sizes: z =    y =


Gate Sizes & Delay
Logical effort: G = 1 * 6/3 * 1 = 2
Path effort: F = GBH = 154
Stage effort: $\hat{f} = F^{1/3} = 5.36$
Path delay: $D = 3\hat{f} + 1 + 4 + 1 = 22.1$
Gate sizes: z = 96*1/5.36 = 18, y = 18*2/5.36 = 6.7
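The decoder numbers can be checked with a short Python sketch. The slide does not state B and H separately; the values used here (B = 8, because each address line reaches half of the 16 NAND gates, and H = 96/10) are assumptions that reproduce F ≈ 154. Variable names are illustrative.

```python
# Three-stage path: inverter, 4-input NAND, inverter.
G = 1 * (6 / 3) * 1
B = 8          # assumed: each address signal fans out to 8 of the 16 NAND4 gates
H = 96 / 10    # word line load over the allowed input capacitance
F = G * B * H

f_hat = F ** (1.0 / 3)
D = 3 * f_hat + (1 + 4 + 1)   # stage efforts plus parasitic delays

# Work backward from the word line.
z = 96 * 1 / f_hat            # final inverter, g = 1
y = z * 2 / f_hat             # NAND4, g = 6/3 = 2
c_in = (B * y) * 1 / f_hat    # first inverter drives B gates of size y

print(f"F = {F:.1f}, f_hat = {f_hat:.2f}, D = {D:.1f}")
print(f"z = {z:.1f}, y = {y:.1f}, input capacitance = {c_in:.1f}")
```

The recovered input capacitance should match the specification of 10 unit-sized transistors per address input.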


Comparison of different gate choices for same decoder
• Compare many alternatives with a spreadsheet (or write a script)

Design                          N    G       P    D
NAND4-INV                       2    2       5    29.8
NAND2-NOR2                      2    20/9    4    30.1
INV-NAND4-INV                   3    2       6    22.1
NAND4-INV-INV-INV               4    2       7    21.1
NAND2-NOR2-INV-INV              4    20/9    6    20.5
NAND2-INV-NAND2-INV             4    16/9    6    19.7
INV-NAND2-INV-NAND2-INV         5    16/9    7    20.4
NAND2-INV-NAND2-INV-INV-INV     6    16/9    8    21.6
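A script version of this comparison might look like the sketch below. It evaluates D = N·(G·B·H)^(1/N) + P for each design, assuming B·H = 8 × 96/10 for the decoder (an assumption, since the slides give only the product F); the names `DESIGNS` and `path_delay` are illustrative.

```python
# (design, stages N, path logical effort G, parasitic delay P, delay D from the table)
DESIGNS = [
    ("NAND4-INV", 2, 2.0, 5, 29.8),
    ("NAND2-NOR2", 2, 20 / 9, 4, 30.1),
    ("INV-NAND4-INV", 3, 2.0, 6, 22.1),
    ("NAND4-INV-INV-INV", 4, 2.0, 7, 21.1),
    ("NAND2-NOR2-INV-INV", 4, 20 / 9, 6, 20.5),
    ("NAND2-INV-NAND2-INV", 4, 16 / 9, 6, 19.7),
    ("INV-NAND2-INV-NAND2-INV", 5, 16 / 9, 7, 20.4),
    ("NAND2-INV-NAND2-INV-INV-INV", 6, 16 / 9, 8, 21.6),
]

BH = 8 * 96 / 10  # assumed branching effort (8) times electrical effort (96/10)

def path_delay(n_stages, g_path, p_path, bh=BH):
    """Minimum path delay with equal stage efforts: N * F**(1/N) + P."""
    return n_stages * (g_path * bh) ** (1.0 / n_stages) + p_path

computed = {name: path_delay(n, g, p) for name, n, g, p, _ in DESIGNS}
fastest = min(computed, key=computed.get)

for name, n, g, p, d_table in DESIGNS:
    print(f"{name:30s} N={n}  D={computed[name]:5.1f}  (table: {d_table})")
print("fastest:", fastest)
```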


Method of Logical Effort
• Compute path effort: F = GBH
• Estimate best number of stages: $N = \log_4 F$
• If the number of stages in the path is less than or equal to the above, add inverters (even numbers). N' is the new path length.
• Determine best stage effort: $f_{opt} = F^{1/N'}$
• Work backwards from output to input to find the gate sizes
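The first steps of the method can be sketched as a small Python helper. The function name `plan_path` is hypothetical, and rounding to the nearest integer is a simplifying assumption: the sketch does not enforce the even-number-of-inverters rule, which depends on the logic function being built.

```python
import math

def plan_path(g_path, b_path, h_path, target_effort=4.0):
    """Return (F, estimated stage count, resulting stage effort).

    The stage count is log base target_effort of F, rounded to the nearest
    integer and never less than 1.
    """
    path_effort = g_path * b_path * h_path
    n_est = max(1, round(math.log(path_effort) / math.log(target_effort)))
    f_opt = path_effort ** (1.0 / n_est)
    return path_effort, n_est, f_opt

# Decoder-like numbers, assuming G = 2, B = 8, H = 96/10.
F, n_est, f_opt = plan_path(2.0, 8.0, 96 / 10)
print(f"F = {F:.1f}, estimated stages = {n_est}, stage effort = {f_opt:.2f}")
```

Gate sizes then follow from the backward capacitance transformation, one stage at a time, using the returned stage effort.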


Summary
• Logical effort is useful for thinking about delay in circuits
  – Numeric logical effort characterizes gates
  – NANDs are faster than NORs in CMOS
  – Paths are fastest when effort delays are ~4
  – Path delay is weakly sensitive to stages, sizes
  – But using fewer stages doesn't mean faster paths
  – Delay of path is about $\log_4 F$ FO4 inverter delays
  – Inverters and NAND2 best for driving large caps


Limits of Logical Effort
• Simplistic delay model
  – Neglects input rise time effects
  – Neglects arrival times of the other inputs at each gate
  – Gate source capacitance is assumed to be the same for all inputs
• Interconnect
  – Iteration required in designs with wire
• Maximum speed only
  – Not minimum area/power for constrained delay
