12.07.2015 Views

CS242 Operator Precedence Parsing Douglas C. Schmidt ...

CS242 Operator Precedence Parsing Douglas C. Schmidt ...

CS242 Operator Precedence Parsing Douglas C. Schmidt ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

The Role of the Parser Technically, parsing is the process of determiningif a string of tokens can be derivedfrom the start state of a grammar<strong>CS242</strong><strong>Operator</strong> <strong>Precedence</strong> <strong>Parsing</strong>{ However, languages we wish to recognize in practiceare typically not fully describable by conventionalgrammars{ Grammars are capable of describing most, but notall, of the syntax of programming languages<strong>Douglas</strong> C. <strong>Schmidt</strong>Washingon University, St. Louis Four Basic <strong>Parsing</strong> Approaches{ Universal parsing methods , inecient, but general{ Top-down , generally ecient, useful for handcodingparsers{ Bottom-up , ecient, automatically generated{ Ad-hoc , eclectic, combined approach12General Types of Parsers:Context-Free Grammars (CFGs) Universal parsing methods{ Cocke-Younger-Kasami and Earley's algorithm Context-free languages can be described bya \context free grammar" (CFG) Top-down{ Recursive-descent with backtracking{ LL , left-to-right scanning, leftmost derivation{ predictive parsing , non-backtracking LL(1) parsing Bottom-up{ LR , left-to-right scanning, reverse rightmost derivation{ LALR , look ahead LR Ad-hoc{ e.g., combine recursive descent with operator-precedenceparsing3 Four components in a CFG1. Terminals are tokens{ e.g., if, then, while, 10.83, foo bar2. Nonterminals are syntactic abstractions denotingsets of strings{ e.g., STATEMENT, EXPRESSION, STATEMENT LIST3. The start symbol is a distinguished nonterminalthat denotes the set of strings dened by the language{ e.g., inPascal the start symbol is program4. Productions are rewriting rules that specify howterminals and nonterminals can be combined toform strings, e.g.,:stmt ! if '(' expr ')' stmt j if '(' expr ')' stmt else stmt4


Boolean expressionsCFG Examplee.g., true and false or (true or false) Grammar oneBEXPR ! BEXPR and BEXPRBEXPR ! BEXPR or BEXPRBEXPR ! true j falseBEXPR ! '(' BEXPR ')' Grammar twoBEXPR ! OR EXPROR EXPR ! OR EXPR or AND EXPR j AND EXPRAND EXPR ! AND EXPR and TOKEN j TOKENTOKEN ! true j false j '(' OR EXPR ')' Note that both grammars accept the samelanguage5Grammar-Related Terms <strong>Precedence</strong>{ Rules for binding operators to operands{ Higher precedence operators bind to their operandsbefore lower precedence ones{ e.g., / and * have equal precedence, but are higherthan either + or ,, which have equal precedence Associativity{ Grouping of operands for binary operators of equalprecedence{ Either left-, right-, or non- associative{ e.g.,+, -, *, / are left-associative= (assignment in C) and ** (exponentiation in Ada)are right-associativeoperators new and delete (free store allocation in C++)are non-associative6 Derivations{ Applies rewriting rules to generate strings in a languagedescribed by aCFG{ =>means "derives" in one step. e.g., B=>BorB means B derives B or B{ => means derives in zero or more steps. e.g., Parse trees{ Provides a graphical representation of derivations{ A root (represents the start symbol){ Leaves , labeled by terminals{ Internal nodes , labeled by non-terminalsB => BB => B or BB => true or false{ + => means derives in one or more steps. e.g.,B + => true or BB + => true or false. note that + => denes L(G), the language generatedby G Expression trees{ Amore compact representation of a parse tree{ Typically used to depict arithmetic expressions{ Leaves ! operands{ Internal nodes ! operators78


Sentential form{ A string (containing terminals and/or nonterminals)that is derived from the start symbol, e.g.,B, B or B, B or false, true or falseWhy Use Context-FreeGrammars? Sentence{ Is a string of terminals derivable from the startsymbole.g.,true and false is a string in the boolean expr languagetrue and or false is a not{ The parser determines whether an input string isa sentence in the language being compiled Push-down automata (PDA){ A parser that recognizes a language specied bya CFG simulates a push-down automata, i.e., anite automata with an unbounded stack. CFGs provide a precise and relatively comprehensiblespecication of programming languagesyntax Techniques exist for automatically generatingecient parsers for many grammars CFGs enable syntax-directed translation They facilitate programming language modicationsand extensions910<strong>Operator</strong>-<strong>Precedence</strong> <strong>Parsing</strong>(OPP) Uses a restricted form of shift-reduce parsingto recognize operator grammars:{ Contain no productions{ Have no two adjacent non-terminals<strong>Operator</strong>-<strong>Precedence</strong> <strong>Parsing</strong>(OPP) (cont'd) Strengths of OPP{ Easy to implement by hand or by a simple tabledrivengenerator Table driven approach uses a matrix containingthree disjoint precedence relations:1. < 2.:= Weaknesses of OPP{ Need clever lexical analyzer to handle certain overloadedoperators e.g., unary + and , versus binary+ and ,{ Only handles a small class of languages (operatorgrammars)3. >1112


OPP Algorithm General pseudo-codewhile (next token is not EOF) fif (token is TRUE or FALSE)push (handle stack, mk node (token));else if (f[top (op stack)] g[token])push (handle stack,mk node (pop (op stack),pop (handle stack),pop (handle stack)));if (top (op stack) == DELIMIT TOK&& token == DELIMIT TOK)return pop (handle stack);else if (top (op stack) == LPAREN TOK&& token == RPAREN TOK) fpop (op stack);continue;/* Jump over push operation below. */gpush (op stack, mk node (token));gg13

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!