
CSE 131A Lecture 10 Top-Down Parsing


CSE 131A
Lecture 10
Top-Down Parsing
Department of Computer Science & Engineering
University of California, San Diego
Winter 2006


Today’s Lecture
• Time costs of different types of parsers
• Introduction to top-down parsing

2/8/06 CSE 131A - Winter 2006 - Lec 10


Different types of parsers
• Bottom-up parsers: build parse tree starting from leaves and working up to the root
  – Shift-reduce parsing, LR parsing
• Top-down parsers: build parse tree starting from root, and working down to leaves
  – Predictive recursive descent parsing, LL parsing
• Universal parsers: can parse any CFL
  – Cocke-Younger-Kasami, Earley algorithms


Parser time costs
• A parser can be constructed for any context-free grammar, since membership in a CFL is decidable
• CFL membership is in the complexity class P
  – For any context-free grammar, there is a parser that runs in O(n^3) time in the worst case, where n is the length of the token sequence given as input
• Universal context-free parsers such as Earley and Cocke-Younger-Kasami are O(n^3) parsers
• But O(n^3)-time parsers are too slow for use in practical programming language compilers…


Parser time costs
• Fortunately, it is not too difficult to design programming languages with syntax that can be parsed more efficiently
  – LR(1) parsers
    • time O(n) in the worst case
    • single pass through input, only 1-token lookahead
  – LL(1) top-down recursive-descent parsers
    • also run in O(n) time
    • today, we’ll discuss this type of parser
• These types of parsers work for only certain types of context-free grammars
  – The limitations are reasonable, however


An example grammar
• Consider the following grammar for a subset of Pascal type declarations:
    type   ::= simple
             | ↑ id
             | array [ simple ] of type
    simple ::= integer
             | char
             | num “..” num
• Example: array [ 1 .. 10 ] of integer is tokenized as array [ num .. num ] of integer, where 1 and 10 are num tokens


Basic Top-down Parsing “Algorithm”
1. Construct the root of the parse tree, labeled with the start symbol. Initialize the current node to be the root, and initialize current token to be the first token in the input.
2. If the current node is labeled with a nonterminal, select a production consistent with the current token, and construct child nodes of the current node for the symbols on the RHS of the production.
3. Else if the current node is labeled with a terminal equal to the current token, advance the input, and set the current token to be the next token in the input.
4. If all input has been consumed and all leaves of the parse tree are matched terminals, done! Else select leftmost untouched leaf node of the parse tree as the current node and go to 2.
(NOTE: In general, the choice of rule in 2 might not “work,” and backtracking may be necessary.)
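The steps above can be sketched as a small backtracking recognizer. This is a minimal illustration in Python, not the lecture's code: the grammar encoding, the names `derive` and `accepts`, and the `^` token standing for the pointer alternative are assumptions of the sketch, and it presumes the grammar has no left recursion.

```python
# Example grammar: each nonterminal maps to its alternative right-hand sides.
# "^" and "dotdot" are token names assumed by this sketch.
GRAMMAR = {
    "type": [["simple"], ["^", "id"],
             ["array", "[", "simple", "]", "of", "type"]],
    "simple": [["integer"], ["char"], ["num", "dotdot", "num"]],
}

def derive(symbols, tokens, pos):
    """Yield every input position reachable after matching `symbols`
    against tokens[pos:], trying productions in order (backtracking)."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Nonterminal: expand it with each production in turn (step 2).
        for rhs in GRAMMAR[head]:
            yield from derive(rhs + rest, tokens, pos)
    elif pos < len(tokens) and tokens[pos] == head:
        # Terminal equal to the current token: advance the input (step 3).
        yield from derive(rest, tokens, pos + 1)
    # Otherwise this choice is a dead end; returning lets the caller
    # backtrack and try its next production.

def accepts(tokens, start="type"):
    """True if some sequence of choices consumes the whole input (step 4)."""
    return any(end == len(tokens) for end in derive([start], tokens, 0))
```

For instance, `accepts("array [ num dotdot num ] of integer".split())` succeeds only after the first two alternatives for type have been tried and abandoned.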


Building a parse tree top-down
Input: array [ num .. num ] of integer
Parse tree so far: type
• Root, labeled with start symbol, is current node; input ready at first token


Building a parse tree top-down
Input: array [ num .. num ] of integer
Parse tree so far: type → array [ simple ] of type
• Pick a grammar production (rule) for current node that is consistent with lookahead and expand: type → array [ simple ] of type


Building a parse tree top-down
Input: array [ num .. num ] of integer
Parse tree so far: type → array [ simple ] of type
• Select leftmost untouched leaf node... array is terminal, so advance input


Building a parse tree top-down
Input: array [ num .. num ] of integer
Parse tree so far: type → array [ simple ] of type
• Select leftmost untouched leaf node... [ is terminal, so advance input


Building a parse tree top-down
Input: array [ num .. num ] of integer
Parse tree so far: type → array [ simple ] of type; simple → num dotdot num
• Select leftmost untouched leaf node... simple is nonterminal, so pick a rule that is consistent with lookahead and expand: simple → num dotdot num


Building a parse tree top-down
Input: array [ num .. num ] of integer
Parse tree so far: type → array [ simple ] of type; simple → num dotdot num
• Select leftmost untouched leaf node... num is terminal, so advance input


Building a parse tree top-down
Input: array [ num .. num ] of integer
Parse tree so far: type → array [ simple ] of type; simple → num dotdot num
• Select leftmost untouched leaf node... dotdot is terminal, so advance input


Building a parse tree top-down
Input: array [ num .. num ] of integer
Parse tree so far: type → array [ simple ] of type; simple → num dotdot num
• Select leftmost untouched leaf node... num is terminal, so advance input


Building a parse tree top-down
Input: array [ num .. num ] of integer
Parse tree so far: type → array [ simple ] of type; simple → num dotdot num
• Select leftmost untouched leaf node... ] is terminal, so advance input


Building a parse tree top-down
Input: array [ num .. num ] of integer
Parse tree so far: type → array [ simple ] of type; simple → num dotdot num
• Select leftmost untouched leaf node... of is terminal, so advance input


Building a parse tree top-down
Input: array [ num .. num ] of integer
Parse tree so far: type → array [ simple ] of type; first simple → num dotdot num; inner type → simple
• Select leftmost untouched leaf node... type is nonterminal, so pick a rule that is consistent with lookahead and expand: type → simple


Building a parse tree top-down
Input: array [ num .. num ] of integer
Parse tree so far: type → array [ simple ] of type; first simple → num dotdot num; inner type → simple; second simple → integer
• Select leftmost untouched leaf node... simple is nonterminal, so pick a rule that is consistent with lookahead and expand: simple → integer


Building a parse tree top-down
Input: array [ num .. num ] of integer
Parse tree so far: type → array [ simple ] of type; first simple → num dotdot num; inner type → simple; second simple → integer
• Select leftmost untouched leaf node... integer is terminal, so advance input
• Now all input has been consumed and all parse tree leaves are matched terminals... so parse is successful


Leftmost Derivation
• Recall our grammar:
    type   ::= simple | ↑ id | array [ simple ] of type
    simple ::= integer | char | num “..” num
• That top-down parse corresponded to a leftmost derivation
    type => array [ simple ] of type
         => array [ num .. num ] of type
         => array [ num .. num ] of simple
         => array [ num .. num ] of integer
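The correspondence between the top-down parse and a leftmost derivation can be reproduced mechanically. The following is a hedged sketch in Python: the function names, the `^` and `dotdot` token names, and the rule "pick the alternative whose possible first tokens include the current token" are assumptions of the illustration, and the helper only works for grammars without empty right-hand sides or left recursion.

```python
GRAMMAR = {
    "type": [["simple"], ["^", "id"],
             ["array", "[", "simple", "]", "of", "type"]],
    "simple": [["integer"], ["char"], ["num", "dotdot", "num"]],
}

def first(rhs):
    """Tokens that can begin a string derived from `rhs`."""
    head = rhs[0]
    if head not in GRAMMAR:
        return {head}
    return set().union(*(first(alt) for alt in GRAMMAR[head]))

def leftmost_derivation(tokens, start="type"):
    """Return the sentential forms of the leftmost derivation of `tokens`."""
    form, steps, i = [start], [start], 0
    while i < len(form):
        symbol = form[i]
        if symbol in GRAMMAR:
            # Always rewrite the leftmost nonterminal, guided by the token
            # at the same position in the input.
            current = tokens[i] if i < len(tokens) else None
            rhs = next((r for r in GRAMMAR[symbol] if current in first(r)), None)
            if rhs is None:
                raise ValueError(f"no production for {symbol} on {current!r}")
            form[i:i + 1] = rhs
            steps.append(" ".join(form))
        elif i < len(tokens) and tokens[i] == symbol:
            i += 1                      # terminal matches the input; move right
        else:
            raise ValueError(f"expected {symbol!r} at position {i}")
    if i != len(tokens):
        raise ValueError("unconsumed input")
    return steps
```

Calling it on the example token sequence yields five sentential forms, one per line of the derivation shown on the slide.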


About that algorithm
• We were lucky; with the particular combination of grammar and input we could easily find a production that “worked”, that let us continue on to a successful parse of the input
• With other CFGs or inputs, if a parse didn't succeed, we might have had to “backtrack” to consider other productions that might work
• Then we would need to keep track of productions we have tried so we can tell when we have tried them all (if none of them work, input contains a syntax error)
• Better: be sure we always know the right production to apply, just looking ahead at the next token
  – Only possible with some kinds of context-free grammars


Lookahead grammars
• Suppose a grammar has a production A → α
• Let the “lookahead set” for α, called First(α), be the set of all tokens t such that t appears as the first symbol of some string derived from α
• Of course, some nonterminals may have multiple right-hand sides: A → α | β | γ
• If, for every nonterminal in the grammar, the lookahead sets for its right-hand sides are disjoint, then it is a 1-symbol lookahead grammar
• With a 1-symbol lookahead grammar, we can always pick which production to apply just by looking at the next input token. We can have a predictive parser, with no backtracking needed!


Lookahead grammars
• If, for every nonterminal in the grammar, the lookahead sets for its right-hand sides are disjoint, then it is a 1-symbol lookahead grammar
• Thus, for each nonterminal…
  – We look at all the right-hand sides α1, α2, …, αn
  – Compute First(αi) for each i
  – These sets should share nothing in common
• With 1-symbol lookahead, we will always know what production to apply, given the current input token. No backtracking is needed!


Finding lookahead sets
• Consider the type declaration grammar:
    type   → simple
           | ↑ id
           | array [ simple ] of type
    simple → integer
           | char
           | num “..” num
• Compute these. Is the grammar 1-sym lookahead?
    First(simple) =
    First(↑ id) =
    First(array [ simple ] of type) =
    First(integer) =
    First(char) =
    First(num ".." num) =
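One way to check answers to the exercise above is to compute the lookahead sets programmatically. This is a minimal sketch in Python under stated assumptions: the grammar has no empty right-hand sides and no left recursion (so only the leftmost symbol of each right-hand side matters), and the names `first`, `is_one_symbol_lookahead`, `^`, and `dotdot` are choices of the sketch rather than anything from the lecture.

```python
GRAMMAR = {
    "type": [["simple"], ["^", "id"],
             ["array", "[", "simple", "]", "of", "type"]],
    "simple": [["integer"], ["char"], ["num", "dotdot", "num"]],
}

def first(rhs, grammar):
    """Lookahead set of a right-hand side: the tokens that can start it."""
    head = rhs[0]
    if head not in grammar:             # a terminal starts only with itself
        return {head}
    result = set()
    for alternative in grammar[head]:   # a nonterminal: union over its rules
        result |= first(alternative, grammar)
    return result

def is_one_symbol_lookahead(grammar):
    """True if, for every nonterminal, the lookahead sets of its
    right-hand sides are pairwise disjoint."""
    for alternatives in grammar.values():
        seen = set()
        for rhs in alternatives:
            tokens = first(rhs, grammar)
            if seen & tokens:           # overlap: one token, two candidates
                return False
            seen |= tokens
    return True
```

With this encoding, `first(["simple"], GRAMMAR)` contains exactly integer, char, and num, and `is_one_symbol_lookahead(GRAMMAR)` holds because no token appears in two alternatives of the same nonterminal.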


Predictive Parsing
• We’d like to avoid costly backtracking
• With a lookahead grammar, the lookahead sets for a nonterminal let us avoid the guesswork
• This permits predictive parsing
• Let’s build a simple predictive parser to handle the example Pascal type declaration language
• We will take a procedural approach: we will define a procedure for each nonterminal
  – Contrast with table-driven approaches


Design of the predictive parser
• Write a procedure for each nonterminal in the grammar. For our example, they are: type( ) and simple( )
• Writing these requires knowing the nonterminals’ lookahead sets
• Keep a global variable lookahead that holds the current input token
• Basic idea: if the current lookahead is in the lookahead set of one of this nonterminal’s right-hand sides, then process the symbols on that RHS in order, calling match( ) for each terminal and the corresponding procedure for each nonterminal


Predictive parser procedure match()
• Advances lookahead to the next input token if the argument matches the current lookahead token; otherwise reports a syntax error

    procedure match( t: token )
        if lookahead = t then
            lookahead := next_token( )
        else
            error( )
        end if
    end match


Predictive parser procedure type()

    procedure type
        if lookahead ∈ { integer, char, num } then    (* type → simple *)
            simple( )
        else if lookahead = ↑ then                    (* type → ↑ id *)
            match( ↑ ); match( id )
        else if lookahead = array then                (* type → array [ simple ] of type *)
            match( array ); match( [ ); simple( );
            match( ] ); match( of ); type( )
        else
            error( )
        end if
    end type


Predictive parser procedure simple()

    procedure simple
        if lookahead = integer then                   (* simple → integer *)
            match( integer )
        else if lookahead = char then                 (* simple → char *)
            match( char )
        else if lookahead = num then                  (* simple → num “..” num *)
            match( num ); match( dotdot ); match( num )
        else
            error( )
        end if
    end simple
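For readers who want to run the procedures, here is one possible Python rendering of match( ), type( ), and simple( ). It is a sketch rather than the course's reference code: the class and method names, the `ParseError` exception, the `$` end-of-input sentinel, and the `^` token for the pointer alternative are all assumptions.

```python
class ParseError(Exception):
    """Raised where the pseudocode calls error()."""

class PredictiveParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input (assumed)
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):
        # Advance only if the expected token is the current lookahead.
        if self.lookahead != t:
            raise ParseError(f"expected {t!r}, found {self.lookahead!r}")
        self.pos += 1

    def parse_type(self):
        if self.lookahead in {"integer", "char", "num"}:
            self.parse_simple()                       # type -> simple
        elif self.lookahead == "^":
            self.match("^"); self.match("id")         # type -> ^ id
        elif self.lookahead == "array":               # type -> array [ simple ] of type
            self.match("array"); self.match("["); self.parse_simple()
            self.match("]"); self.match("of"); self.parse_type()
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")

    def parse_simple(self):
        if self.lookahead in {"integer", "char"}:
            self.match(self.lookahead)                # simple -> integer | char
        elif self.lookahead == "num":                 # simple -> num dotdot num
            self.match("num"); self.match("dotdot"); self.match("num")
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")

def parse(tokens):
    """Return True if `tokens` is a complete type declaration."""
    parser = PredictiveParser(tokens)
    parser.parse_type()
    parser.match("$")       # nothing may follow the declaration
    return True
```

Each branch tests the lookahead against one lookahead set, so no production is ever tried and abandoned.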


A Top-Down Predictive Parse
• Input tokens: array [ num .. num ] of integer
• Lookahead symbol initially points at first token: array
• type( ) checks lookahead, and calls:
    match( array ); match( [ ); simple( );
    match( ] ); match( of ); type( );
• This corresponds to the production
    type → array [ simple ] of type


A Top-Down Predictive Parse
• Consume tokens with match( array ); match( [ );
• Lookahead points to num
• simple( ) executes match( num ); match( dotdot ); match( num ); (afterward, lookahead points to ] )
• Consume tokens with match( ] ); match( of );
• type( ) is called recursively, and calls simple( )
• simple( ) executes match( integer );
• The call tree exactly corresponds to the parse tree
  – Run-time stack is being used as the parser’s stack
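The claim that the call tree corresponds to the parse tree can be made concrete by having each procedure return the node it recognized. The sketch below is illustrative only: nested tuples as tree nodes, the function name `parse_to_tree`, the `$` sentinel, and the `^` and `dotdot` token names are assumptions.

```python
def parse_to_tree(tokens):
    """Predictive parse that returns the parse tree as nested tuples."""
    tokens = list(tokens) + ["$"]      # "$" is an assumed end-of-input marker
    pos = 0

    def match(t):
        nonlocal pos
        if tokens[pos] != t:
            raise ValueError(f"expected {t!r}, found {tokens[pos]!r}")
        pos += 1
        return t                       # a matched terminal becomes a leaf

    def type_():
        la = tokens[pos]
        if la in ("integer", "char", "num"):
            return ("type", simple())
        if la == "^":
            return ("type", match("^"), match("id"))
        if la == "array":
            # Tuple elements are evaluated left to right, so the calls
            # happen in the same order as the symbols of the production.
            return ("type", match("array"), match("["), simple(),
                    match("]"), match("of"), type_())
        raise ValueError(f"unexpected token {la!r}")

    def simple():
        la = tokens[pos]
        if la in ("integer", "char"):
            return ("simple", match(la))
        if la == "num":
            return ("simple", match("num"), match("dotdot"), match("num"))
        raise ValueError(f"unexpected token {la!r}")

    tree = type_()
    match("$")
    return tree
```

Every tuple is created by exactly one procedure call, and its children are the values returned by the calls made inside it, which is why the nesting of calls and the shape of the tree coincide.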


Top-down predictive parsing and left recursion
• As with all recursion, there had better be a “base case” that terminates the recursion and permits forward progress
• For our predictive parser, this occurs when the leftmost symbol in the RHS of a production is a terminal that matches the current token
• The token is consumed and we progress toward the end of the input
• But this will not happen if a “left-recursive” production of the form A → A α is ever applied


Left recursion and lookahead
• Can a lookahead grammar ever have a left-recursive production?
• Suppose the grammar has a left-recursive production A → A α
• For A to derive sentences at all, the grammar must also have another production for A, say A → β
• But since A → A α, every token in First(β) is also in First(A α), so the nonterminal A has two productions with overlapping lookahead sets
• So no 1-symbol lookahead grammar will be left-recursive
• But left recursion can be eliminated


Left recursion
• Consider the following production, where the leftmost symbol on the RHS is the nonterminal being defined:
    expr → expr + term
• If we apply this production in a predictive parser, the procedure for expr calls itself immediately and we loop forever, as no input is consumed
• So we can’t have left-recursive rules in the grammar for a top-down predictive parser!
• There is always a way to eliminate left recursion without changing the language defined by the grammar, but doing so changes the syntactic structure
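The non-termination is easy to observe. The following sketch writes the procedure for the left-recursive rule naively in Python; the `term` procedure (a term is a single num token) and the helper `loops_forever` are hypothetical, added only so the example runs. Python stops the runaway recursion with a `RecursionError` instead of looping indefinitely.

```python
def expr(tokens, pos=0):
    """Naive procedure for the left-recursive rule expr -> expr + term."""
    # The first symbol on the right-hand side is expr itself, so the
    # procedure calls itself before any token has been consumed.
    pos = expr(tokens, pos)
    if tokens[pos] != "+":
        raise ValueError("expected '+'")
    return term(tokens, pos + 1)

def term(tokens, pos):
    """Hypothetical term procedure: here a term is a single num token."""
    if tokens[pos] != "num":
        raise ValueError("expected num")
    return pos + 1

def loops_forever(tokens):
    """Report whether calling expr() exhausts the call stack."""
    try:
        expr(tokens)
    except RecursionError:
        return True
    return False
```

The input never matters: `pos` is still 0 at every nested call, so the recursion has no base case.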


Converting to right recursion
• The left-recursive production
    A → A α | β
• Can be rewritten as
    A → β R
    R → α R | ε
• R is a new nonterminal; ε is the empty string; and we now have a right-recursive production R → α R
• We assume that neither α nor β begins with A, and of course neither α nor β begins with R
• The new right-recursive grammar can generate exactly the same set of strings as the old one, but note that the parse trees will be different…
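The rewriting rule above is mechanical enough to express as a short function. This is a sketch under assumptions: productions are lists of symbol names, the empty list stands for ε, and the function name and the `new_name` parameter are inventions of the example. It also handles several left-recursive alternatives at once, which goes slightly beyond the single-α case on the slide.

```python
def eliminate_immediate_left_recursion(a, alternatives, new_name="R"):
    """Rewrite  A -> A alpha | beta  as  A -> beta R,  R -> alpha R | epsilon.

    `alternatives` lists the right-hand sides of A; [] stands for epsilon.
    Returns a dict mapping each nonterminal to its new right-hand sides.
    """
    # The alpha parts: what follows A in each left-recursive alternative.
    alphas = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == a]
    # The beta parts: alternatives that do not begin with A.
    betas = [rhs for rhs in alternatives if not rhs or rhs[0] != a]
    if not alphas:
        return {a: alternatives}        # nothing to do
    return {
        a: [beta + [new_name] for beta in betas],
        new_name: [alpha + [new_name] for alpha in alphas] + [[]],
    }
```

For example, `eliminate_immediate_left_recursion("A", [["A", "alpha"], ["beta"]])` produces the two rules A → beta R and R → alpha R | ε.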


Eliminating left recursion
• Applying the notation to the example production:
    expr → expr + term | term
• We set A = expr, α = + term, β = term, which gives
    expr → term R
    R → + term R | ε
• We obtain a different parse tree
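To see the different parse tree, the right-recursive form expr → term R, R → + term R | ε can be parsed predictively. The sketch below is an illustration in Python with several assumptions: term is taken to be a single num token (the lecture does not define it), ε is chosen whenever the lookahead is not +, and trees are nested tuples.

```python
def parse_expr(tokens):
    """Predictive parser for  expr -> term R,  R -> + term R | epsilon."""
    tokens = list(tokens) + ["$"]      # "$" is an assumed end-of-input marker
    pos = 0

    def match(t):
        nonlocal pos
        if tokens[pos] != t:
            raise ValueError(f"expected {t!r}, found {tokens[pos]!r}")
        pos += 1
        return t

    def expr():                        # expr -> term R
        return ("expr", term(), rest())

    def rest():                        # R -> + term R | epsilon
        if tokens[pos] == "+":
            return ("R", match("+"), term(), rest())
        return ("R",)                  # epsilon: consume nothing

    def term():                        # assumed for this sketch: term -> num
        return ("term", match("num"))

    tree = expr()
    match("$")
    return tree
```

For num + num + num the R nodes nest to the right, whereas the original left-recursive production would have grouped the leftmost addition deepest in the tree.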


An example
• Consider the following grammar
    S → A a | b
    A → A c | S d | ε
• In theory the ε-production is problematic, but not so in this case
• There is no immediate left recursion among the S-productions, so nothing happens for i = 1
• For i = 2, substitute the S-productions in A → S d to obtain
    A → A c | A a d | b d | ε


An example - continued
• Eliminating immediate left recursion among the A-productions results in
    S → A a | b
    A → b d A’ | A’
    A’ → c A’ | a d A’ | ε
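The two-step procedure used in this example (substitute earlier nonterminals, then remove immediate left recursion) can be sketched as a single loop. The code below is a hedged Python illustration, not the lecture's algorithm verbatim: the function name, the list-of-symbols encoding with [] for ε, and the convention of naming the new nonterminal by appending an apostrophe are assumptions.

```python
def eliminate_left_recursion(grammar, order):
    """Remove left recursion, processing nonterminals in the given order."""
    g = {nt: [list(rhs) for rhs in grammar[nt]] for nt in order}
    for i, ai in enumerate(order):
        # Substitute every earlier nonterminal that appears leftmost.
        for aj in order[:i]:
            expanded = []
            for rhs in g[ai]:
                if rhs and rhs[0] == aj:
                    expanded += [delta + rhs[1:] for delta in g[aj]]
                else:
                    expanded.append(rhs)
            g[ai] = expanded
        # Remove immediate left recursion among the Ai-productions.
        alphas = [rhs[1:] for rhs in g[ai] if rhs and rhs[0] == ai]
        if alphas:
            betas = [rhs for rhs in g[ai] if not rhs or rhs[0] != ai]
            new = ai + "'"
            g[ai] = [beta + [new] for beta in betas]
            g[new] = [alpha + [new] for alpha in alphas] + [[]]
    return g

EXAMPLE = {
    "S": [["A", "a"], ["b"]],
    "A": [["A", "c"], ["S", "d"], []],     # [] stands for epsilon
}
RESULT = eliminate_left_recursion(EXAMPLE, ["S", "A"])
```

With the order S, A, `RESULT` contains the same three groups of productions as the slide: S is unchanged, A becomes b d A' | A', and A' becomes c A' | a d A' | ε.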
