You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
<strong>Syntax</strong> <strong>Analysis</strong> (ASU Ch 2.4)<br />
• construction of parse tree<br />
• bottom-up (nodes => root)<br />
• top-down (root => nodes)<br />
– L (left) to R (right)<br />
scanning of input string<br />
– may involve trial and<br />
error and backtracking<br />
– non-backtracking =><br />
predictive parser<br />
• top-down parsing<br />
• e.g.<br />
::= | ^id |<br />
array of<br />
<br />
::= integer | char |<br />
num dotdot num<br />
• start with the start symbol S<br />
(NT) from G = (S, P, NT, T)<br />
1<br />
14/10/2014 DFR - CC - <strong>Syntax</strong> <strong>Analysis</strong>
Recursive Descent Predictive Parsing (RDPP)<br />
(ASU Ch 2.4)<br />
• Define the (disjoint) sets “first” for each NT on RHS of P<br />
– first() = {integer, char, num}<br />
– first(^id) = {^}<br />
– first(array …) = {array}<br />
• PP (predictive Parser) requires<br />
– procedure for every NT - takes action based on first(a) for a<br />
on the RHS of a production P (using lookahead)<br />
– RHS of P<br />
• NT => call to corresponding procedure<br />
• T => match expected token with actual token (no match =><br />
error) and get next token<br />
– left recursion may have to be removed from the grammar<br />
2<br />
14/10/2014 DFR - CC - <strong>Syntax</strong> <strong>Analysis</strong>
<strong>Syntax</strong> <strong>Analysis</strong> (ASU Ch 4)<br />
• Syntactic structure (well formed programs)<br />
– block => (statement)*<br />
– statement => (expression)*<br />
– expression => (token)*<br />
– token => (symbol)*<br />
• context free grammar: BNF notation => parser<br />
• role of the parser<br />
– reads the token stream<br />
– verifies that the string w can be generated by the grammar G<br />
– handles error detection and recovery<br />
w<br />
LA<br />
ST<br />
SA<br />
PT<br />
3<br />
14/10/2014 DFR - CC - <strong>Syntax</strong> <strong>Analysis</strong>
Grammar Subclasses<br />
• LL(k)<br />
– input read from left to right LL(k)<br />
– corresponds to leftmost derivation of parse tree LL(k)<br />
– k symbol look ahead LL(k)<br />
– most commonly used is LL(1)<br />
• LR(k)<br />
– input read from left to right LR(k)<br />
– corresponds to rightmost derivation of parse tree LR(k)<br />
– k symbol look ahead LR(k)<br />
– used in bottom-up parsing (e.g. YACC)<br />
4<br />
14/10/2014 DFR - CC - <strong>Syntax</strong> <strong>Analysis</strong>
Error Types (ASU Ch 4.1)<br />
• error types<br />
– lexical - e.g. misspelling of id / keyword / operator<br />
– syntactic - e.g. missing parenthesis<br />
– semantic - e.g. incompatible operator<br />
– “logical” - e.g. infinite recursive calls<br />
• error handling<br />
– reporting - e.g. position in the source code (w)<br />
– recovery - e.g. repair an error and continue OR stop<br />
• remove a token from w (token assumed to be “extra”)<br />
• insert a token into w (token assumed to be “missing”)<br />
– studies show that errors are infrequent (missing { / } common)<br />
5<br />
14/10/2014 DFR - CC - <strong>Syntax</strong> <strong>Analysis</strong>
Error Recovery Strategies (ASU Ch 4.1)<br />
• panic mode<br />
– discard symbols until synchronising token found e.g. ‘;’, ‘}’<br />
• phrase level<br />
– local correction (in the phrase) e.g. insert missing symbol ‘,’,<br />
‘;’<br />
• error productions<br />
– add productions to grammar G (augmented grammar)<br />
• global corrections<br />
– erroneous string x is corrected to y<br />
– minimal sequence of change algorithms (least cost)<br />
– generally too expensive to implement (theoretical interest only)<br />
6<br />
14/10/2014 DFR - CC - <strong>Syntax</strong> <strong>Analysis</strong>
Context Free Grammars (CFGs) (ASU Ch 4.2)<br />
• reflect the inherently recursive structure of the PL<br />
• CFG definition<br />
– T: terminal symbols (synonym for token in CFG)<br />
– NT: non-terminal symbols (syntactic variable denoting<br />
sets of strings)<br />
– S: start symbol (in NT) (usually LHS of first P)<br />
– P: productions (how NTs and Ts combine)<br />
• example<br />
expr => expr op expr | (expr) | - expr | id<br />
op<br />
=> + | - | * | / | ^<br />
T = { id, +, -, *, /, ^} NT = {expr, op} S = {expr}<br />
productions<br />
7<br />
14/10/2014 DFR - CC - <strong>Syntax</strong> <strong>Analysis</strong>
Notational Conventions (ASU pp 166-167)<br />
• T<br />
– lower case letters e.g. a, b, c, …<br />
– operators e.g. +, -, ...<br />
– punctuation e.g. ; ,<br />
– boldface strings e.g. Id<br />
• NT<br />
– upper case letters e.g. A, B, C, …<br />
– S the start symbol in G = (S, P, NT, T)<br />
– lower case italic e.g. expr<br />
• grammar symbols X, Y, Z e.g. late alpha upper case<br />
• strings of Ts u, v … z e.g. late alpha lower case<br />
• strings of grammar symbols α, β, γ e.g. lower case Greek<br />
8<br />
14/10/2014 DFR - CC - <strong>Syntax</strong> <strong>Analysis</strong>
Derivations (ASU Ch 4.2)<br />
e.g. E => E A E | (E) | -E | id A => + | - | * | / | ^ (from above)<br />
• aAb => aγb if there exists a P A => γ<br />
• a =*=> b a derives b in zero or more steps<br />
• a =*=> b and b => c, then a =*=> c<br />
• L(G) is the language generated by grammar G<br />
– strings in L(G) may contain only Ts from G<br />
– string of Ts, w, are in L(G) if S =+=> w (one or more steps)<br />
– if S =*=> a where a may contain NTs<br />
• a is called a sentential form of G<br />
• a sentence is a sentential form with no NTs<br />
• e.g. - ( id + id ) is a sentence of the above grammar (verify this!)<br />
9<br />
14/10/2014 DFR - CC - <strong>Syntax</strong> <strong>Analysis</strong>
Leftmost Derivations (ASU Ch 4.2)<br />
• leftmost replacement (LL grammars)<br />
– E =lm=> - E =lm=> -(E) =lm=> -(EAE) =lm=> -(idAE) =lm=><br />
– -(id+E) =lm=> -(id+id)<br />
– =lm=> means replace the leftmost NT<br />
– if wAc =lm=> wβc and P: A => β then w consists of Ts<br />
– a =lm=> b a derives b by leftmost derivation<br />
– S =lm=> a a is a left-sentential form of G<br />
• rightmost derivation =rm=><br />
– mutatis mutandum<br />
– sometimes called canonical forms<br />
10<br />
14/10/2014 DFR - CC - <strong>Syntax</strong> <strong>Analysis</strong>
Parse Trees & Derivations (ASU Ch 4.2)<br />
• PT is a graphical representation of a derivation<br />
• every PT has associated with it<br />
– a unique leftmost derivation (LMD)<br />
– a unique rightmost derivation (RMD)<br />
• a sentence may have more than one associated PT, LMD,<br />
RMD<br />
• a grammar G which has more than one PT for a sentence is<br />
said to be ambiguous<br />
• non-ambiguous grammars are desirable<br />
• exercise: read ASU Ch 4.3 - Writing a Grammar<br />
11<br />
14/10/2014 DFR - CC - <strong>Syntax</strong> <strong>Analysis</strong>
<strong>Syntax</strong> <strong>Analysis</strong>: Summary<br />
• Parse Tree: construction: top-down / bottom-up<br />
• Recursive Descent Predictive Parsing (RDPP)<br />
• Grammar subclasses: LL(k) & LR(k)<br />
• Errors: types, handling, recovery strategies<br />
• Context Free Grammars (CFG) G = (S, P, NT, T)<br />
• Notational Conventions (check the publication)<br />
• Derivations: LMD, RMD - sentential form, sentence<br />
• Parse Tree: graphical representation of a derivation<br />
• Non-ambiguous grammars are desirable<br />
12<br />
14/10/2014 DFR - CC - <strong>Syntax</strong> <strong>Analysis</strong>