Syntax Analysis


Syntax Analysis (ASU Ch 2.4)

• construction of parse tree
• bottom-up (leaves => root)
• top-down (root => leaves)
  – L (left) to R (right) scanning of input string
  – may involve trial and error and backtracking
  – non-backtracking => predictive parser
• top-down parsing
• e.g.
  <type> ::= <simple> | ^id | array [ <simple> ] of <type>
  <simple> ::= integer | char | num dotdot num
• start with the start symbol S (an NT) from G = (S, P, NT, T)

14/10/2014 DFR - CC - Syntax Analysis


Recursive Descent Predictive Parsing (RDPP) (ASU Ch 2.4)

• Define the (disjoint) sets “first” for each alternative on the RHS of P
  – first(<simple>) = {integer, char, num}
  – first(^id) = {^}
  – first(array …) = {array}
• PP (predictive parser) requires
  – a procedure for every NT, which takes action based on first(α) for each α on the RHS of a production P (using lookahead)
  – for each symbol on the RHS of P
    • NT => call to corresponding procedure
    • T => match expected token with actual token (no match => error) and get next token
  – left recursion may have to be removed from the grammar
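The requirements above can be sketched as a small recursive descent predictive parser. This is only an illustration, assuming the type grammar of ASU Ch 2.4 (type ::= simple | ^id | array [ simple ] of type; simple ::= integer | char | num dotdot num) and an input that is already a list of token names. The names `Parser`, `ParseError`, `parse` and the end marker `$` are hypothetical and do not come from the slides.

```python
class ParseError(Exception):
    """Raised when the lookahead token does not fit the grammar."""


class Parser:
    def __init__(self, tokens):
        # "$" is an assumed end-of-input marker appended for convenience.
        self.tokens = list(tokens) + ["$"]
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        # T on the RHS: compare expected and actual token, then get the next token.
        if self.lookahead != expected:
            raise ParseError(f"expected {expected!r}, got {self.lookahead!r}")
        self.pos += 1

    def type_(self):
        # One procedure per NT; the alternative is chosen from the first sets.
        if self.lookahead in ("integer", "char", "num"):  # first(simple)
            self.simple()
        elif self.lookahead == "^":                       # first(^id)
            self.match("^")
            self.match("id")
        elif self.lookahead == "array":                   # first(array ...)
            self.match("array")
            self.match("[")
            self.simple()
            self.match("]")
            self.match("of")
            self.type_()
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")

    def simple(self):
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")


def parse(tokens):
    """Return True if the tokens form a type; raise ParseError otherwise."""
    parser = Parser(tokens)
    parser.type_()
    parser.match("$")
    return True
```

For example, `parse("array [ num dotdot num ] of integer".split())` succeeds without backtracking, because the three first sets are disjoint and one token of lookahead selects the alternative.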



Syntax Analysis (ASU Ch 4)

• Syntactic structure (well formed programs)
  – block => (statement)*
  – statement => (expression)*
  – expression => (token)*
  – token => (symbol)*
• context free grammar: BNF notation => parser
• role of the parser
  – reads the token stream
  – verifies that the string w can be generated by the grammar G
  – handles error detection and recovery

[Figure: diagram with labels w, LA, ST, SA, PT]



Grammar Subclasses

• LL(k)
  – input read from left to right (the first L in LL(k))
  – corresponds to a leftmost derivation (the second L in LL(k))
  – k symbols of lookahead (the k in LL(k))
  – most commonly used is LL(1)
• LR(k)
  – input read from left to right (the L in LR(k))
  – corresponds to a rightmost derivation in reverse (the R in LR(k))
  – k symbols of lookahead (the k in LR(k))
  – used in bottom-up parsing (e.g. YACC)



Error Types (ASU Ch 4.1)

• error types
  – lexical - e.g. misspelling of id / keyword / operator
  – syntactic - e.g. missing parenthesis
  – semantic - e.g. operator applied to an incompatible operand
  – “logical” - e.g. infinitely recursive calls
• error handling
  – reporting - e.g. position in the source code (w)
  – recovery - e.g. repair an error and continue OR stop
    • remove a token from w (token assumed to be “extra”)
    • insert a token into w (token assumed to be “missing”)
  – studies show that errors are infrequent (missing { / } common)



Error Recovery Strategies (ASU Ch 4.1)

• panic mode
  – discard symbols until a synchronising token is found e.g. ‘;’, ‘}’
• phrase level
  – local correction (in the phrase) e.g. insert missing symbol ‘,’, ‘;’
• error productions
  – add productions for common errors to grammar G (augmented grammar)
• global corrections
  – erroneous string x is corrected to y
  – algorithms find a minimal (least cost) sequence of changes
  – generally too expensive to implement (theoretical interest only)
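Panic mode is the simplest of these strategies to illustrate. The sketch below assumes a toy statement form `id = num ;` and a list of token names as input; the grammar, the function name `parse_statements` and the error message format are invented for the example and are not taken from ASU.

```python
SYNC_TOKENS = {";", "}"}           # synchronising tokens from the slide
STATEMENT = ["id", "=", "num", ";"]  # assumed toy statement form


def parse_statements(tokens):
    """Parse a sequence of `id = num ;` statements with panic-mode recovery.

    Returns (number of well formed statements, list of (position, message)).
    """
    pos, parsed, errors = 0, 0, []
    while pos < len(tokens):
        for want in STATEMENT:
            if pos < len(tokens) and tokens[pos] == want:
                pos += 1
                continue
            got = tokens[pos] if pos < len(tokens) else "end of input"
            errors.append((pos, f"expected {want!r}, got {got!r}"))
            # Panic mode: discard symbols until a synchronising token is found.
            while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
                pos += 1
            pos += 1  # consume the synchronising token and resume parsing
            break
        else:
            parsed += 1  # all four symbols of the statement matched
    return parsed, errors
```

With the input `id = num ; id num ; id = num ;` the parser reports one error at token position 5, skips to the next `;`, and still recognises the first and third statements. The price of this simplicity is that a missing `;` causes the following statement to be discarded as well.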



Context Free Grammars (CFGs) (ASU Ch 4.2)

• reflect the inherently recursive structure of the PL
• CFG definition
  – T: terminal symbols (synonym for token in CFG)
  – NT: non-terminal symbols (syntactic variables denoting sets of strings)
  – S: start symbol (in NT) (usually LHS of first P)
  – P: productions (how NTs and Ts combine)
• example
  – productions
    expr => expr op expr | (expr) | - expr | id
    op => + | - | * | / | ^
  – T = { id, +, -, *, /, ^, (, ) }   NT = {expr, op}   S = expr
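The four components of G = (S, P, NT, T) map directly onto a small data structure. The sketch below encodes the expr/op example; the class name `Grammar`, its field names and the `is_well_formed` check are illustrative choices, not part of the slides. Note that the parentheses have to be listed as terminals, since they occur in the production expr => (expr).

```python
from dataclasses import dataclass


@dataclass
class Grammar:
    start: str               # S
    productions: dict        # P: maps each NT to a list of RHS tuples
    nonterminals: frozenset  # NT
    terminals: frozenset     # T

    def is_well_formed(self):
        """Check S in NT, NT and T disjoint, and P only uses known symbols."""
        symbols = self.nonterminals | self.terminals
        return (
            self.start in self.nonterminals
            and not (self.nonterminals & self.terminals)
            and set(self.productions) <= self.nonterminals
            and all(
                symbol in symbols
                for alternatives in self.productions.values()
                for rhs in alternatives
                for symbol in rhs
            )
        )


EXPR_GRAMMAR = Grammar(
    start="expr",
    productions={
        "expr": [("expr", "op", "expr"), ("(", "expr", ")"), ("-", "expr"), ("id",)],
        "op": [("+",), ("-",), ("*",), ("/",), ("^",)],
    },
    nonterminals=frozenset({"expr", "op"}),
    terminals=frozenset({"id", "+", "-", "*", "/", "^", "(", ")"}),
)
```

Calling `EXPR_GRAMMAR.is_well_formed()` returns `True`; dropping `(` and `)` from the terminal set makes the check fail, because the second alternative of expr would then use undeclared symbols.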



Notational Conventions (ASU pp 166-167)

• T
  – lower case letters e.g. a, b, c, …
  – operators e.g. +, -, ...
  – punctuation e.g. ; ,
  – boldface strings e.g. id
• NT
  – upper case letters e.g. A, B, C, …
  – S the start symbol in G = (S, P, NT, T)
  – lower case italic e.g. expr
• grammar symbols (Ts or NTs): late-alphabet upper case letters e.g. X, Y, Z
• strings of Ts: late-alphabet lower case letters e.g. u, v … z
• strings of grammar symbols: lower case Greek letters e.g. α, β, γ



Derivations (ASU Ch 4.2)

e.g. E => E A E | (E) | -E | id   A => + | - | * | / | ^ (from above)

• αAβ => αγβ if there exists a production A => γ
• α =*=> β means α derives β in zero or more steps
• if α =*=> β and β => γ, then α =*=> γ
• L(G) is the language generated by grammar G
  – strings in L(G) may contain only Ts from G
  – a string of Ts, w, is in L(G) if S =+=> w (one or more steps)
  – if S =*=> α, where α may contain NTs
    • α is called a sentential form of G
    • a sentence is a sentential form with no NTs
• e.g. - ( id + id ) is a sentence of the above grammar (verify this!)



Leftmost Derivations (ASU Ch 4.2)

• leftmost replacement (LL grammars)
  – E =lm=> -E =lm=> -(E) =lm=> -(EAE) =lm=> -(idAE) =lm=> -(id+E) =lm=> -(id+id)
  – =lm=> means replace the leftmost NT
  – in each step wAγ =lm=> wβγ, using P: A => β, w consists of Ts only
  – α =*lm=> β means α derives β by a leftmost derivation
  – if S =*lm=> α, then α is a left-sentential form of G
• rightmost derivation =rm=>
  – mutatis mutandis
  – rightmost derivations are sometimes called canonical derivations
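A leftmost derivation can be replayed mechanically: at every step, find the leftmost NT and replace it with the RHS of a chosen production. The sketch below does this for the grammar E => E A E | (E) | -E | id, A => + | - | * | / | ^. The function name `leftmost_derivation` and the idea of passing the choices as production indices are assumptions made for the example.

```python
PRODUCTIONS = {
    "E": [["E", "A", "E"], ["(", "E", ")"], ["-", "E"], ["id"]],
    "A": [["+"], ["-"], ["*"], ["/"], ["^"]],
}


def leftmost_derivation(start, choices):
    """Return the sentential forms produced by applying `choices` in order.

    Each choice is the index of the alternative used for the leftmost NT.
    Raises StopIteration if a choice is given but no NT is left to replace.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Everything to the left of this index consists of Ts only.
        index = next(i for i, symbol in enumerate(form) if symbol in PRODUCTIONS)
        nonterminal = form[index]
        form = form[:index] + PRODUCTIONS[nonterminal][choice] + form[index + 1:]
        steps.append(" ".join(form))
    return steps


# E => -E => -(E) => -(EAE) => -(idAE) => -(id+E) => -(id+id)
for sentential_form in leftmost_derivation("E", [2, 1, 0, 3, 0, 3]):
    print(sentential_form)
```

The last form printed contains no NTs, so it is a sentence; every earlier form is a left-sentential form of the grammar.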



Parse Trees & Derivations (ASU Ch 4.2)

• a PT is a graphical representation of a derivation
• every PT has associated with it
  – a unique leftmost derivation (LMD)
  – a unique rightmost derivation (RMD)
• a sentence may have more than one associated PT, LMD, RMD
• a grammar G which has more than one PT for some sentence is said to be ambiguous
• non-ambiguous grammars are desirable
• exercise: read ASU Ch 4.3 - Writing a Grammar
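Ambiguity can be made concrete by counting parse trees. The sketch below counts the PTs of a token list under the expression grammar E => E A E | (E) | -E | id with A => + | - | * | / | ^, assuming the input is already tokenised. The function name `count_parse_trees` and the span-based counting approach are choices made for this illustration, not a method from ASU.

```python
from functools import lru_cache

OPERATORS = {"+", "-", "*", "/", "^"}


def count_parse_trees(tokens):
    """Count the parse trees that derive `tokens` from E."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving tokens[i:j] from E.
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and tokens[i] == "id":              # E => id
            total += 1
        if tokens[i] == "(" and tokens[j - 1] == ")":     # E => ( E )
            total += count(i + 1, j - 1)
        if tokens[i] == "-":                              # E => - E
            total += count(i + 1, j)
        for k in range(i + 1, j - 1):                     # E => E A E, A at k
            if tokens[k] in OPERATORS:
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

For `id + id * id` the function returns 2, one tree for each way of grouping the operators, so the grammar is ambiguous. For `- ( id + id )` it returns 1, and for a string outside the language such as `id +` it returns 0.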



Syntax Analysis: Summary

• Parse Tree: construction: top-down / bottom-up
• Recursive Descent Predictive Parsing (RDPP)
• Grammar subclasses: LL(k) & LR(k)
• Errors: types, handling, recovery strategies
• Context Free Grammars (CFG) G = (S, P, NT, T)
• Notational Conventions (check the publication)
• Derivations: LMD, RMD - sentential form, sentence
• Parse Tree: graphical representation of a derivation
• Non-ambiguous grammars are desirable
