Syntax Analysis

Syntax Analysis (ASU Ch 2.4) 

• construction of parse tree 

• bottom-up (leaves => root)

• top-down (root => leaves)

– L (left) to R (right) scanning of input string

– may involve trial and error and backtracking

– non-backtracking => predictive parser

• top-down parsing 

• e.g. 

<type> ::= <simple> | ^id | array [ <simple> ] of <type>

<simple> ::= integer | char | num dotdot num

• start with the start symbol S (NT) from G = (S, P, NT, T)


14/10/2014 DFR - CC - Syntax Analysis

Recursive Descent Predictive Parsing (RDPP) 

(ASU Ch 2.4) 

• Define the (disjoint) sets “first” for each alternative on the RHS of P

– first(<simple>) = {integer, char, num}

– first(^id) = {^}

– first(array …) = {array}

• PP (predictive Parser) requires 

– procedure for every NT - takes action based on first(α) for each alternative α on the RHS of a production P (using lookahead)

– RHS of P

• NT => call to corresponding procedure

• T => match expected token with actual token (no match => error) and get next token

– left recursion may have to be removed from the grammar 
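The procedure-per-NT scheme above can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: it assumes the textbook form of the example grammar (shown in the docstring), and the names `Parser`, `ParseError`, `accepts` and the end marker `$` are hypothetical.

```python
class ParseError(Exception):
    """Raised when the lookahead token does not fit any alternative."""


class Parser:
    """Recursive descent predictive parser for the assumed grammar:

        type   ::= simple | ^ id | array [ simple ] of type
        simple ::= integer | char | num dotdot num
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks end of input (assumption)
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        # T on the RHS: compare expected token with actual token, then advance.
        if self.lookahead != expected:
            raise ParseError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def parse_type(self):
        # Choose the alternative whose first set contains the lookahead.
        if self.lookahead in ("integer", "char", "num"):  # first(simple)
            self.parse_simple()
        elif self.lookahead == "^":                       # first(^id)
            self.match("^")
            self.match("id")
        elif self.lookahead == "array":                   # first(array ...)
            self.match("array")
            self.match("[")
            self.parse_simple()
            self.match("]")
            self.match("of")
            self.parse_type()                             # NT => procedure call
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")

    def parse_simple(self):
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")


def accepts(tokens):
    """Return True if the token list is derivable from `type`."""
    parser = Parser(tokens)
    try:
        parser.parse_type()
        parser.match("$")
        return True
    except ParseError:
        return False


print(accepts("array [ num dotdot num ] of integer".split()))
```

Because the three first sets are disjoint, one token of lookahead is enough to pick an alternative, so the parser never backtracks.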



Syntax Analysis (ASU Ch 4) 

• Syntactic structure (well formed programs) 

– block => (statement)* 

– statement => (expression)* 

– expression => (token)* 

– token => (symbol)* 

• context free grammar: BNF notation => parser 

• role of the parser 

– reads the token stream 

– verifies that the string w can be generated by the grammar G 

– handles error detection and recovery 

(Figure: parser pipeline diagram with labels w, LA, ST, SA, PT)



Grammar Subclasses 

• LL(k) 

– first L in LL(k): input read from left to right

– second L in LL(k): corresponds to leftmost derivation of parse tree

– k in LL(k): k symbols of lookahead

– most commonly used is LL(1) 

• LR(k) 

– L in LR(k): input read from left to right

– R in LR(k): corresponds to rightmost derivation of parse tree (traced in reverse)

– k in LR(k): k symbols of lookahead

– used in bottom-up parsing (e.g. YACC) 



Error Types (ASU Ch 4.1) 

• error types 

– lexical - e.g. misspelling of id / keyword / operator 

– syntactic - e.g. missing parenthesis 

– semantic - e.g. operator applied to an incompatible operand

– “logical” - e.g. infinitely recursive call

• error handling 

– reporting - e.g. position in the source code (w) 

– recovery - e.g. repair an error and continue OR stop 

• remove a token from w (token assumed to be “extra”) 

• insert a token into w (token assumed to be “missing”) 

– studies show that errors are infrequent (missing { / } common) 



Error Recovery Strategies (ASU Ch 4.1) 

• panic mode 

– discard symbols until synchronising token found e.g. ‘;’, ‘}’ 

• phrase level 

– local correction (in the phrase) e.g. insert missing symbol ‘,’, ‘;’

• error productions 

– add productions to grammar G (augmented grammar) 

• global corrections 

– erroneous string x is corrected to y 

– algorithms choose a minimal (least cost) sequence of changes

– generally too expensive to implement (theoretical interest only) 
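Panic mode is the simplest of these strategies to illustrate. The sketch below assumes a hypothetical toy language in which every statement is the token sequence `id = num ;`. The function name `parse_statements` and the set `SYNC` are illustrative only.

```python
SYNC = {";", "}"}  # synchronising tokens (from the slide's examples)


def parse_statements(tokens):
    """Parse a sequence of `id = num ;` statements (toy grammar, an assumption).

    Returns (number of well-formed statements, positions where errors began).
    """
    pos, ok, errors = 0, 0, []
    while pos < len(tokens):
        if tokens[pos:pos + 4] == ["id", "=", "num", ";"]:
            ok += 1
            pos += 4
        else:
            errors.append(pos)  # report where the error was detected
            # Panic mode: discard symbols until a synchronising token is found.
            while pos < len(tokens) and tokens[pos] not in SYNC:
                pos += 1
            pos += 1            # consume the synchronising token and resume
    return ok, errors


print(parse_statements("id = num ; id num num ; id = num ;".split()))
```

The parser reports one error for the malformed middle statement and still checks the statement after it, which is the point of recovering rather than stopping.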



Context Free Grammars (CFGs) (ASU Ch 4.2) 

• reflect the inherently recursive structure of the PL 

• CFG definition 

– T: terminal symbols (synonym for token in CFG) 

– NT: non-terminal symbols (syntactic variable denoting sets of strings)

– S: start symbol (in NT) (usually LHS of first P) 

– P: productions (how NTs and Ts combine) 

• example 

expr => expr op expr | (expr) | - expr | id

op => + | - | * | / | ^

T = { id, +, -, *, /, ^, (, ) } NT = { expr, op } S = expr

P = the productions for expr and op above
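One way to make the four components concrete is to store the productions as data and derive NT and T from them. This is a sketch under assumed names (`Grammar`, `make_grammar`, `PRODUCTIONS` are hypothetical); every symbol that never appears on a LHS is treated as a terminal.

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    start: str               # S
    productions: dict        # P: NT -> list of alternatives (lists of symbols)
    nonterminals: frozenset  # NT
    terminals: frozenset     # T


PRODUCTIONS = {
    "expr": [["expr", "op", "expr"], ["(", "expr", ")"], ["-", "expr"], ["id"]],
    "op": [["+"], ["-"], ["*"], ["/"], ["^"]],
}


def make_grammar(start, productions):
    """Build G = (S, P, NT, T); NT is the set of LHS symbols, T is the rest."""
    nts = frozenset(productions)
    ts = frozenset(
        sym
        for alternatives in productions.values()
        for alt in alternatives
        for sym in alt
        if sym not in nts
    )
    if start not in nts:
        raise ValueError("start symbol must be a non-terminal")
    return Grammar(start, productions, nts, ts)


G = make_grammar("expr", PRODUCTIONS)
print(sorted(G.nonterminals))
print(sorted(G.terminals))
```

Computed this way, the parentheses show up in T because they occur in the alternative `(expr)`.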



Notational Conventions (ASU pp 166-167) 

• T 

– lower case letters e.g. a, b, c, … 

– operators e.g. +, -, ... 

– punctuation e.g. ; , 

– boldface strings e.g. id

• NT 

– upper case letters e.g. A, B, C, … 

– S the start symbol in G = (S, P, NT, T) 

– lower case italic e.g. expr 

• grammar symbols: late-alphabet upper case letters e.g. X, Y, Z

• strings of Ts: late-alphabet lower case letters e.g. u, v, …, z

• strings of grammar symbols: lower case Greek letters e.g. α, β, γ



Derivations (ASU Ch 4.2) 

e.g. E => E A E | (E) | -E | id A => + | - | * | / | ^ (from above) 

• αAβ => αγβ if there exists a P A => γ

• α =*=> β means α derives β in zero or more steps

• if α =*=> β and β => γ, then α =*=> γ

• L(G) is the language generated by grammar G 

– strings in L(G) may contain only Ts from G 

– a string of Ts, w, is in L(G) if and only if S =+=> w (one or more steps)

– if S =*=> α, where α may contain NTs

• α is called a sentential form of G

• a sentence is a sentential form with no NTs

• e.g. - ( id + id ) is a sentence of the above grammar (verify this!) 



Leftmost Derivations (ASU Ch 4.2) 

• leftmost replacement (LL grammars) 

– E =lm=> -E =lm=> -(E) =lm=> -(EAE) =lm=> -(idAE) =lm=> -(id+E) =lm=> -(id+id)

– =lm=> means replace the leftmost NT

– wAγ =lm=> wβγ if P: A => β and w consists of Ts only

– α =*lm=> β means α derives β by a leftmost derivation

– if S =*lm=> α then α is a left-sentential form of G

• rightmost derivation =rm=> 

– mutatis mutandis

– rightmost derivations are sometimes called canonical derivations
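The leftmost derivation of -(id+id) on this slide can be replayed mechanically. In the sketch below, `GRAMMAR` encodes the productions for E and A, and the hypothetical helper `leftmost_derivation` takes a list of alternative indices (0-based, in the order the alternatives are written) and always rewrites the leftmost NT.

```python
GRAMMAR = {
    "E": [["E", "A", "E"], ["(", "E", ")"], ["-", "E"], ["id"]],
    "A": [["+"], ["-"], ["*"], ["/"], ["^"]],
}


def leftmost_derivation(start, choices):
    """Return the sentential forms obtained by applying each chosen
    alternative to the leftmost non-terminal of the current form."""
    form = [start]
    steps = ["".join(form)]
    for choice in choices:
        # Index of the leftmost NT; assumes the form still contains one.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form[i:i + 1] = GRAMMAR[form[i]][choice]
        steps.append("".join(form))
    return steps


steps = leftmost_derivation("E", [2, 1, 0, 3, 0, 3])
print(" =lm=> ".join(steps))
```

A rightmost derivation would differ only in picking the last NT in the form instead of the first.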



Parse Trees & Derivations (ASU Ch 4.2) 

• PT is a graphical representation of a derivation 

• every PT has associated with it 

– a unique leftmost derivation (LMD) 

– a unique rightmost derivation (RMD) 

• a sentence may have more than one associated PT, LMD, RMD

• a grammar G which has more than one PT for some sentence is said to be ambiguous

• non-ambiguous grammars are desirable 

• exercise: read ASU Ch 4.3 - Writing a Grammar 
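Ambiguity can be checked on small inputs by counting parse trees. The sketch below counts the trees that the expression grammar E => E A E | (E) | -E | id, A => + | - | * | / | ^ assigns to a token list. The function name `count_parse_trees` is hypothetical, and the approach is plain memoised enumeration over token spans, not a general parsing algorithm.

```python
from functools import lru_cache

OPS = {"+", "-", "*", "/", "^"}  # terminals derivable from A


def count_parse_trees(tokens):
    """Count parse trees for E over the whole token list."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees in which E derives tokens[i:j].
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and tokens[i] == "id":             # E => id
            total += 1
        if tokens[i] == "-":                             # E => -E
            total += count(i + 1, j)
        if tokens[i] == "(" and tokens[j - 1] == ")":    # E => (E)
            total += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):                    # E => E A E, A at k
            if tokens[k] in OPS:
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees("id + id * id".split()))
```

For `id + id * id` the count is 2, one tree grouping the addition first and one grouping the multiplication first, so the grammar is ambiguous. A count of 0 means the string is not a sentence of the grammar.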



Syntax Analysis: Summary 

• Parse Tree: construction: top-down / bottom-up 

• Recursive Descent Predictive Parsing (RDPP) 

• Grammar subclasses: LL(k) & LR(k) 

• Errors: types, handling, recovery strategies 

• Context Free Grammars (CFG) G = (S, P, NT, T) 

• Notational Conventions (check the publication) 

• Derivations: LMD, RMD - sentential form, sentence 

• Parse Tree: graphical representation of a derivation 

• Non-ambiguous grammars are desirable 
