Logic Synthesis slides - UCSD VLSI CAD Laboratory

ECE260B – CSE241AWinter 2004**Logic** **Synthesis**(Dr. Cho Moon, guest lecturer)Website: http://vlsicad.ucsd.edu/courses/ece260b-w04ECE 260B – CSE 241A Static Timing Analysis 1Andrew B. Kahng, **UCSD**

Outline• Introduction• Two-level **Logic** **Synthesis**• Multi-level **Logic** **Synthesis**ECE 260B – CSE 241A Static Timing Analysis 2Andrew B. Kahng, **UCSD**

Introduction• Cho Moon• PhD from UC Berkeley 92• Lattice Semiconductor (synthesis) 92 - 96• Cadence Design Systems (synthesis, verification and timinganalysis) 96 – present• Why logic synthesis?• Ubiquitous – used almost everywhere **VLSI** is done• Body of useful and general techniques – same solutions can beused for different problems• Foundation for many applications such as- Formal verification- ATPG- Timing analysis- Sequential optimizationECE 260B – CSE 241A Static Timing Analysis 3Andrew B. Kahng, **UCSD**

RTL Design FlowHDLRTL**Synthesis**ManualDesignModuleGeneratorsLibrarynetlist**Logic****Synthesis**ab01sdclkqnetlistab01dqPhysical**Synthesis**sclklayoutECE 260B – CSE 241A Static Timing Analysis 4Slide courtesy of Devadas, et. alAndrew B. Kahng, **UCSD**

**Logic** **Synthesis** Problem• Given• Initial gate-level netlist• Design constraints- Input arrival times, output required times, power consumption, noiseimmunity, etc…• Target technology libraries• Produce• Smaller, faster or cooler gate-level netlist that meets constraintsVery hard optimization problem!ECE 260B – CSE 241A Static Timing Analysis 5Andrew B. Kahng, **UCSD**

Combinational **Logic** **Synthesis**netlist2-level**Logic** optLibrary**Logic****Synthesis**techindependenttechdependentmultilevel**Logic** optnetlistLibraryECE 260B – CSE 241A Static Timing Analysis 6Slide courtesy of Devadas, et. alAndrew B. Kahng, **UCSD**

Outline• Introduction• Two-level **Logic** **Synthesis**• Multi-level **Logic** **Synthesis**• Sequential **Logic** **Synthesis**ECE 260B – CSE 241A Static Timing Analysis 7Andrew B. Kahng, **UCSD**

Two-level **Logic** **Synthesis** Problem• Given an arbitrary logic function in two-level form,produce a smaller representation.• For sum-of-products (SOP) implementation on PLAs,fewer product terms and fewer inputs to each productterm mean smaller area.F = A B + A B CI1I2F = A BO1O2ECE 260B – CSE 241A Static Timing Analysis 8Andrew B. Kahng, **UCSD**

Boolean Functionsf(x) : B nBB = {0, 1}, x = (x 1 , x 2 , …, x n )• x 1 , x 2 , … are variables• x 1 , x 1 , x 2 , x 2 , … are literals• each vertex of B n is mapped to 0 or 1• the onset of f is a set of input values for which f(x) = 1• the offset of f is a set of input values for which f(x) = 0ECE 260B – CSE 241A Static Timing Analysis 9Andrew B. Kahng, **UCSD**

LiteralsECE 260B – CSE 241A Static Timing Analysis 10Slide courtesy of Devadas, et. alAndrew B. Kahng, **UCSD**

Boolean FormulasECE 260B – CSE 241A Static Timing Analysis 11Slide courtesy of Devadas, et. alAndrew B. Kahng, **UCSD**

**Logic** Functions:ECE 260B – CSE 241A Static Timing Analysis 12Slide courtesy of Devadas, et. alAndrew B. Kahng, **UCSD**

Cube RepresentationECE 260B – CSE 241A Static Timing Analysis 13Slide courtesy of Devadas, et. alAndrew B. Kahng, **UCSD**

Operations on **Logic** Functions• (1) Complement: f finterchange ON and OFF-SETS• (2) Product (or intersection or logicalAND)h = f • g or h = f ∩ g• (3) Sum (or union or logical OR):h = f + g or h = f ∪ gECE 260B – CSE 241A Static Timing Analysis 14Andrew B. Kahng, **UCSD**

Sum-of-products (SOP)• A function can be represented by a sum of cubes(products):f = ab + ac + bcSince each cube is a product of literals, this is a “sum ofproducts” representation• A SOP can be thought of as a set of cubes FF = {ab, ac, bc} = C• A set of cubes that represents f is called a cover of f.F={ab, ac, bc} is a cover of f = ab + ac + bc.ECE 260B – CSE 241A Static Timing Analysis 15Andrew B. Kahng, **UCSD**

Prime Cover• A cube is prime if there is no other cube that contains it(for example, b c is not a prime but b is)• A cover is prime iff all of its cubes are primecabECE 260B – CSE 241A Static Timing Analysis 16Andrew B. Kahng, **UCSD**

Irredundant Cube• A cube of a cover C is irredundant if C fails to be a coverif c is dropped from C• A cover is irredundant iff all its cubes are irredudant (forexmaple, F = a b + a c + b c)caECE 260B – CSE 241A Static Timing Analysis 17bNot coveredAndrew B. Kahng, **UCSD**

Quine-McCluskey Method• We want to find a minimum prime and irredundant coverfor a given function.• Prime cover leads to min number of inputs to each product term.• Min irredundant cover leads to min number of product terms.• Quine-McCluskey (QM) method (1960’s) finds aminimum prime and irredundant cover.• Step 1: List all minterms of on-set: O(2^n) n = #inputs• Step 2: Find all primes: O(3^n) n = #inputs• Step 3: Construct minterms vs primes table• Step 4: Find a min set of primes that covers all the minterms:O(2^m) m = #primesECE 260B – CSE 241A Static Timing Analysis 18Andrew B. Kahng, **UCSD**

QM Example (Step 1)• F = a’ b’ c’ + a b’ c’ + a b’ c + a b c + a’ b c• List all on-set mintermsMintermsa’ b’ c’a b’ c’a b’ ca b ca’ b cECE 260B – CSE 241A Static Timing Analysis 19Andrew B. Kahng, **UCSD**

QM Example (Step 2)• F = a’ b’ c’ + a b’ c’ + a b’ c + a b c + a’ b c• Find all primes.primesb’ c’a b’a cb cECE 260B – CSE 241A Static Timing Analysis 20Andrew B. Kahng, **UCSD**

QM Example (Step 3)• F = a’ b’ c’ + a b’ c’ + a b’ c + a b c + a’ b c• Construct minterms vs primes table (prime implicanttable) by determining which cube is contained in whichprime. X at row i, colum j means that cube in row i iscontained by prime in column j.b’ c’a b’a cb ca’ b’ c’Xa b’ c’XXa b’ cXXa b cXXa’ b cXECE 260B – CSE 241A Static Timing Analysis 21Andrew B. Kahng, **UCSD**

QM Example (Step 4)• F = a’ b’ c’ + a b’ c’ + a b’ c + a b c + a’ b c• Find a minimum set of primes that covers all the minterms“Minimum column covering problem”b’ c’a b’a cb ca’ b’ c’Xa b’ c’XXa b’ cXXa b cXXa’ b cXEssential primesECE 260B – CSE 241A Static Timing Analysis 22Andrew B. Kahng, **UCSD**

ESPRESSO – Heuristic Minimizer• Quine-McCluskey gives a minimum solution but is onlygood for functions with small number of inputs (< 10)• ESPRESSO is a heuristic two-level minimizer that finds a“minimal” solutionESPRESSO(F) {do {reduce(F);expand(F);irredundant(F);} while (fewer terms in F);verfiy(F);}ECE 260B – CSE 241A Static Timing Analysis 23Andrew B. Kahng, **UCSD**

ESPRESSO ILLUSTRATEDReduceExpandIrredundantECE 260B – CSE 241A Static Timing Analysis 24Andrew B. Kahng, **UCSD**

Outline• Introduction• Two-level **Logic** **Synthesis**• Multi-level **Logic** **Synthesis**ECE 260B – CSE 241A Static Timing Analysis 25Andrew B. Kahng, **UCSD**

Multi-level **Logic** **Synthesis**• Two-level logic synthesis is effective and mature• Two-level logic synthesis is directly applicable to PLAsand PLDsBut…• There are many functions that are too expensive toimplement in two-level forms (too many product terms!)• Two-level implementation constrains layout (AND-plane,OR-plane)• Rule of thumb:• Two-level logic is good for control logic• Multi-level logic is good for datapath or random logicECE 260B – CSE 241A Static Timing Analysis 26Andrew B. Kahng, **UCSD**

Representation: Boolean NetworkBoolean network:• directed acyclic graph (DAG)• node logic function representationf j(x,y)• node variable y j: y j= f j(x,y)• edge (i,j) if f jdepends explicitly ony iInputs x = (x 1 , x 2 ,…,x n )Outputs z = (z 1 , z 2 ,…,z p )ECE 260B – CSE 241A Static Timing Analysis 27Slide courtesy of BraytonAndrew B. Kahng, **UCSD**

Multi-level **Logic** **Synthesis** Problem• Given• Initial Boolean network• Design constraints- Arrival times, required times, power consumption, noise immunity,etc…• Target technology libraries• Produce• a minimum area netlist consisting of the gates from the targetlibraries such that design constraints are satisfiedECE 260B – CSE 241A Static Timing Analysis 28Andrew B. Kahng, **UCSD**

Modern Approach to **Logic** Optimization• Divide logic optimization into two subproblems:• Technology-independent optimization- determine overall logic structure- estimate costs (mostly) independent oftechnology- simplified cost modeling• Technology-dependent optimization(technology mapping)- binding onto the gates in the library- detailed technology-specific cost model• Orchestration of various optimization/transformationtechniques for each subproblemECE 260B – CSE 241A Static Timing Analysis 29Slide courtesy of KeutzerAndrew B. Kahng, **UCSD**

Technology-Independent OptimizationSimplified cost models• Area = sum of factored form literals in all nodes• Number of product terms is not a good measure of area in multilevelimplementationf=ad+ae+bd+be+cd+ce (6 product terms)f’=a’b’c’+d’e’ (2 product terms)The only difference between f and f’ is inversion⇒ f=(a+b+c)(d+e) (5 literals in factored form)⇒ f’= a’b’c’ + d’e’ (5 literals in factored form)• Delay = levels of logic on critical pathsECE 260B – CSE 241A Static Timing Analysis 30Andrew B. Kahng, **UCSD**

Technology-Independent Optimization• Technology-independent optimization is a bag of tricks:• Two-level minimization (also called simplify)• Constant propagation (also called sweep)f = a b + c; b = 1 => f = a + c• Decomposition (single function)f = abc+abd+a’c’d’+b’c’d’ => f = xy + x’y’; x = ab ; y = c+d• Extraction (multiple functions)f = (az+bz’)cd+e g = (az+bz’)e’ h = cde⇓f = xy+e g = xe’ h = ye x = az+bz’ y = cdECE 260B – CSE 241A Static Timing Analysis 31Andrew B. Kahng, **UCSD**

More Technology-Independent Optimization• More technology-independent optimization tricks:• Substitutiong = a+b f = a+bc⇓f = g(a+b)• Collapsing (also called elimination)f = ga+g’b g = c+d⇓f = ac+ad+bc’d’ g = c+d• Factoring (series-parallel decomposition)f = ac+ad+bc+bd+e => f = (a+b)(c+d)+eECE 260B – CSE 241A Static Timing Analysis 32Andrew B. Kahng, **UCSD**

Summary of Typical Recipe for TI Optimization• Propagate constants• Simplify: two-level minimization at Boolean network node• Decomposition• Local “Boolean” optimizations• Boolean techniques exploit Boolean identities (e.g., a a’ = 0)Consider f = a b’ + a c’ + b a’ + b c’ + c a’ + c b’• Algebraic factorization proceduresf = a (b’ + c’) + a’ (b + c) + b c’ + c b’• Boolean factorization producesf = (a + b + c) (a’ + b’ + c’)ECE 260B – CSE 241A Static Timing Analysis 33Slide courtesy of KeutzerAndrew B. Kahng, **UCSD**

Technology-Dependent OptimizationTechnology-dependent optimization consists of• Technology mapping: maps Boolean network to a set ofgates from technology libraries• Local transformations• Discrete resizing• Cloning• Fanout optimization (buffering)• **Logic** restructuringECE 260B – CSE 241A Static Timing Analysis 34Slide courtesy of KeutzerAndrew B. Kahng, **UCSD**

Technology MappingInputOutput1. Technology independent, optimized logic network2. Description of the gates in the library with their cost• Netlist of gates (from library) which minimizes total costGeneral Approach• Construct a subject DAG for the network• Represent each gate in the target library by pattern DAG’s• Find an optimal-cost covering of subject DAG using thecollection of pattern DAG’s• Canonical form: 2-input NAND gates and invertersECE 260B – CSE 241A Static Timing Analysis 35Andrew B. Kahng, **UCSD**

DAG Covering• DAG covering is an NP-hard problem• Solve the sub-problem optimally• Partition DAG into a forest of trees• Solve each tree optimally using tree covering• Stitch trees back togetherECE 260B – CSE 241A Static Timing Analysis 36Slide courtesy of KeutzerAndrew B. Kahng, **UCSD**

Tree Covering Algorithm• Transform netlist and libraries into canonical forms• 2-input NANDs and inverters• Visit each node in BFS from inputs to outputs• Find all candidate matches at each node N- Match is found by comparing topology only (no need to comparefunctions)• Find the optimal match at N by computing the new cost- New cost = cost of match at node N + sum of costs for matches atchildren of N• Store the optimal match at node N with cost• Optimal solution is guaranteed if cost is area• Complexity = O(n) where n is the number of nodes innetlistECE 260B – CSE 241A Static Timing Analysis 37Andrew B. Kahng, **UCSD**

Tree Covering ExampleFind an ``optimal’’ (in area, delay, power) mapping of this circuitinto the technology library (simple example below)ECE 260B – CSE 241A Static Timing Analysis 38Slide courtesy of KeutzerAndrew B. Kahng, **UCSD**

Elements of a library - 1Element/Area CostTree Representation (normal form)INVERTER 2NAND2 3NAND3 4NAND4 5ECE 260B – CSE 241A Static Timing Analysis 39Slide courtesy of KeutzerAndrew B. Kahng, **UCSD**

Elements of a library - 2Element/Area CostTree Representation (normal form)AOI21 4AOI22 5ECE 260B – CSE 241A Static Timing Analysis 40Slide courtesy of KeutzerAndrew B. Kahng, **UCSD**

Trivial Coveringsubject DAG7 NAND2 (3) = 215 INV (2) = 10Area cost 31Can we do better with tree covering?ECE 260B – CSE 241A Static Timing Analysis 41Slide courtesy of KeutzerAndrew B. Kahng, **UCSD**

Optimal tree covering - 13223``subject tree’’ECE 260B – CSE 241A Static Timing Analysis 42Slide courtesy of KeutzerAndrew B. Kahng, **UCSD**

Optimal tree covering - 2382235``subject tree’’ECE 260B – CSE 241A Static Timing Analysis 43Slide courtesy of KeutzerAndrew B. Kahng, **UCSD**

Optimal tree covering - 338132235``subject tree’’Cover with ND2 or ND3 ?1 NAND2 3+ subtree 5Area cost 8ECE 260B – CSE 241A Static Timing Analysis 441 NAND3 = 4Slide courtesy of KeutzerAndrew B. Kahng, **UCSD**

Optimal tree covering – 3b38132235 4``subject tree’’Label the root of the sub-tree with optimal match and costSlide courtesy of KeutzerECE 260B – CSE 241A Static Timing Analysis 45Andrew B. Kahng, **UCSD**

Optimal tree covering - 43Cover with INV or AO21 ?81322254``subject tree’’1 Inverter 2+ subtree 13Area cost 15ECE 260B – CSE 241A Static Timing Analysis 461 AO21 4+ subtree 1 3+ subtree 2 2Area cost 9Slide courtesy of KeutzerAndrew B. Kahng, **UCSD**

Optimal tree covering – 4b3813922254``subject tree’’Label the root of the sub-tree with optimal match and costECE 260B – CSE 241A Static Timing Analysis 47Slide courtesy of KeutzerAndrew B. Kahng, **UCSD**

Optimal tree covering - 59Cover with ND2 or ND3 ?82NAND2subtree 1 9subtree 2 41 NAND2 3ECE 260B – CSE 241A Static Timing Analysis 484subtree 1 8subtree 2 2subtree 3 41 NAND3 4Area cost 16Area cost 18Slide courtesy of Keutzer``subject tree’’NAND3Andrew B. Kahng, **UCSD**

Optimal tree covering – 5b981624``subject tree’’Label the root of the sub-tree with optimal match and costECE 260B – CSE 241A Static Timing Analysis 49Slide courtesy of KeutzerAndrew B. Kahng, **UCSD**

Optimal tree covering - 6Cover with INV or AOI21 ?13165``subject tree’’INVsubtree 1 161 INV 2AOI21subtree 1 13subtree 2 51 AOI21 4ECE 260B – CSE 241A Static Timing Analysis 50Area cost 18Slide courtesy of KeutzerArea cost 22Andrew B. Kahng, **UCSD**

Optimal tree covering – 6b1318165``subject tree’’Label the root of the sub-tree with optimal match and costECE 260B – CSE 241A Static Timing Analysis 51Slide courtesy of KeutzerAndrew B. Kahng, **UCSD**

Optimal tree covering - 7Cover with ND2 or ND3 or ND4 ?``subject tree’’ECE 260B – CSE 241A Static Timing Analysis 52Slide courtesy of KeutzerAndrew B. Kahng, **UCSD**

Cover 1 - NAND2Cover with ND2 ?918164``subject tree’’subtree 1 18subtree 2 01 NAND2 3Slide courtesy of KeutzerECE 260B – CSE 241A Static Timing Analysis 53Area cost 21Andrew B. Kahng, **UCSD**

Cover 2 - NAND3Cover with ND3?94``subject tree’’ECE 260B – CSE 241A Static Timing Analysis 54subtree 1 9subtree 2 4subtree 3 01 NAND3 4Slide courtesy of KeutzerArea cost 17Andrew B. Kahng, **UCSD**

Cover - 3Cover with ND4 ?824``subject tree’’ECE 260B – CSE 241A Static Timing Analysis 55subtree 1 8subtree 2 2subtree 3 4subtree 4 01 NAND4 5Slide courtesy of KeutzerArea cost 19Andrew B. Kahng, **UCSD**

Optimal Cover was Cover 2ND2AOI21ND3INVND3``subject tree’’ECE 260B – CSE 241A Static Timing Analysis 56INV 2ND2 32 ND3 8AOI21 4Slide courtesy of KeutzerArea cost 17Andrew B. Kahng, **UCSD**

Summary of Technology Mapping• DAG covering formulation• Separated library issues from mapping algorithm (can’t do thiswith rule-based systems)• Tree covering approximation• Very efficient (linear time)• Applicable to wide range of libraries (std cells, gate arrays) andtechnologies (FPGAs, CPLDs)• Weaknesses• Problems with DAG patterns (Multiplexors, full adders, …)• Large input gates lead to a large number of patternsECE 260B – CSE 241A Static Timing Analysis 57Andrew B. Kahng, **UCSD**

Local Transformations• Given• Technology-mapped netlist• Target technology libraries• Some violations (timing, noise immunity, power, etc…)• Produce• New netlist that corrects the given violations without introducingnew violations• Approach: another bag of tricks• Discrete resizing• Cloning• Buffering• **Logic** restructuring• More…ECE 260B – CSE 241A Static Timing Analysis 58Andrew B. Kahng, **UCSD**

Discrete Resizingabab?A0.035def0.20.20.3d0.050.040.030.020.0100 0.2 0.4 0.6 0.8 1loadA B CabC0.026Note that some arrival and requiredtimes become invalidECE 260B – CSE 241A Static Timing Analysis 59DAC-2002, Physical Chip ImplementationAndrew B. Kahng, **UCSD**

Cloningd0.050.040.030.020.0100 0.2 0.4 0.6 0.8 1loadA B Cd0.2dab?efgh0.20.20.20.2abABefghNote that loads at a, b increaseECE 260B – CSE 241A Static Timing Analysis 60DAC-2002, Physical Chip ImplementationAndrew B. Kahng, **UCSD**

Bufferingd0.050.040.030.020.0100 0.2 0.4 0.6 0.8 1loadA B Cabd0.2e 0.2af? 0.2bg 0.2h 0.2B0.1Bde0.20.2fgh0.20.20.2ECE 260B – CSE 241A Static Timing Analysis 61DAC-2002, Physical Chip ImplementationAndrew B. Kahng, **UCSD**

**Logic** Restructuring 1• Nodes in critical section that fan out outside of criticalsection are duplicatedffaeCollapsednodeabeebhdhECE 260B – CSE 241A Static Timing Analysis 62cLate inputsignalsSlides courtesy of KeutzercdAndrew B. Kahng, **UCSD**

**Logic** Restructuring 2• Place timing-critical nodes closer to output• Make them pass through fewer gates• After collapse, a divisor is selected such that substituting k into f places criticalsignal c and d closer to outputCollapse critical sectionRe-extract factor kffCollapsednodeab c deakdivisorbecdclose tooutputECE 260B – CSE 241A Static Timing Analysis 63Slides courtesy of KeutzerAndrew B. Kahng, **UCSD**

Summary of Local Transformations• Variety of methods for delay optimization• No single technique dominates• The one with more tricks wins? No!• Need a good framework for evaluating and processingdifferent transforms• Accurate, fast timing engine with incremental analysis capability- don’t want to retime the whole design for each local transform• Simultaneous min and max delay analysis- How does fixing the setup violation affect the existing hold checks?ECE 260B – CSE 241A Static Timing Analysis 64Andrew B. Kahng, **UCSD**