29.04.2015 Views

Top-down Parsing (ASU Ch 4.4) Predictive Parsers (ASU 4.4 ...


<strong>Top</strong>-<strong>down</strong> <strong>Parsing</strong> (<strong>ASU</strong> <strong>Ch</strong> <strong>4.4</strong>)<br />

• <strong>Predictive</strong> Parser (PP) (non-backtracking)<br />

• defines the class of LL(1) grammars, for which a PP can be generated automatically<br />

• top-<strong>down</strong> parsing<br />

– recursive descent (RD)<br />

– non-recursive descent (NRD)<br />

• backtracking example (<strong>ASU</strong> Fig. 4.9)<br />

S => cAd; A => ab | a; w = cad (input string)<br />

S => cAd => cabd (no match - backtrack!)<br />

S => cAd => cad (match - OK)<br />

exercise: draw the parse trees (or see <strong>ASU</strong> Fig 4.9)<br />
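The backtracking behaviour in the example above can be sketched in a few lines of Python. This is only an illustration under assumptions not in the slides: the grammar is encoded as a dictionary, and the names `GRAMMAR`, `derive` and `accepts` are hypothetical. Trying the next alternative of A after `ab` fails is the backtrack step.

```python
# Grammar from the slide: S => c A d, A => a b | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}

def derive(symbols, text, pos):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Non-terminal: try each alternative in order. Moving on to the
        # next alternative after a failed one is the backtrack step.
        for alternative in GRAMMAR[head]:
            yield from derive(alternative + rest, text, pos)
    elif pos < len(text) and text[pos] == head:
        # Terminal: it must match the current input symbol.
        yield from derive(rest, text, pos + 1)

def accepts(text):
    """True if the whole of `text` can be derived from S."""
    return any(end == len(text) for end in derive(["S"], text, 0))
```

For w = cad the first alternative A => ab fails at the `b`, so the search falls back to A => a and succeeds. The sketch terminates only because this grammar is not left recursive.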


14/10/2014 DFR - CC - TDP


<strong>Predictive</strong> <strong>Parsers</strong> (<strong>ASU</strong> <strong>4.4</strong>)<br />

• <strong>Predictive</strong> <strong>Parsers</strong> (PP) (no backtracking)<br />

– eliminate left recursion<br />

– left factor the resulting grammar<br />

– implementation: recursive descent (RDPP)<br />

• Transition Diagrams for PPs<br />

– for each NT in grammar G<br />

• create an initial (IS) and final state (FS)<br />

• for each P: A => X1 X2 X3 … Xn create a path from IS to FS with edges labelled X1, X2, X3, …, Xn<br />

• see <strong>ASU</strong> pp 183-185 for examples (Figs 4.10, 4.11, 4.12)<br />
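The first preparation step, eliminating left recursion, can be sketched for the immediate case A => Aα | β, which becomes A => βA’, A’ => αA’ | ¤. The function name and the list-of-symbols encoding below are assumptions made for illustration; indirect left recursion and left factoring are not handled.

```python
def eliminate_immediate_left_recursion(nt, alternatives):
    """Rewrite A => A alpha | beta as A => beta A', A' => alpha A' | empty.

    Each alternative is a list of symbols; [] stands for the empty string.
    Returns a dict mapping each resulting non-terminal to its alternatives.
    """
    # Tails alpha of the left-recursive alternatives A => A alpha.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    # Alternatives beta that do not start with A.
    others = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not recursive:
        return {nt: alternatives}
    new_nt = nt + "'"  # assumed naming scheme for the fresh non-terminal
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }
```

Applied to E => E+T | T this yields E => TE’ and E’ => +TE’ | ¤, the form used in the examples on the following slides.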



Transition Diagrams (<strong>ASU</strong> pp183-185)<br />

• Transition diagram behaviour<br />

– begin with IS for start symbol S (in G)<br />

– for an edge from state s to state t labelled a<br />

• if the next input symbol (from w) is a (a is T), read the input symbol & go to state t<br />

– for an edge from state s to state t labelled A<br />

• A is NT, go to IS for A<br />

• effectively A has been read in from the input stream (w)<br />

• on FS for A, go to state t<br />

– for an edge from state s to state t labelled ¤ (¤ = empty)<br />

• go to state t (do not read from the input stream)<br />

– implementation<br />

• match Ts against the input stream (w)<br />

• for an NT, call a procedure (possibly recursively)<br />



Transition Diagrams: example (<strong>ASU</strong> pp183-185)<br />

E => TE’<br />

E’ => +TE’ | ¤<br />

T => FT’<br />

T’ => *FT’ | ¤<br />

F => (E) | id<br />

E: 0 --T--> 1 --E’--> 2<br />

E’: 3 --+--> 4 --T--> 5 --E’--> 6; 3 --¤--> 6<br />

T: 7 --F--> 8 --T’--> 9<br />

T’: 10 --*--> 11 --F--> 12 --T’--> 13; 10 --¤--> 13<br />

F: 14 --(--> 15 --E--> 16 --)--> 17; 14 --id--> 17<br />
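The transition diagram behaviour can be sketched as a small interpreter that walks the five diagrams for this grammar (states 0 to 17). The dictionary encoding and the names `EDGES`, `INITIAL`, `FINAL`, `walk` and `accepts` are assumptions for illustration; `E'` stands for E’, and a ¤ edge is written as the empty label and tried last.

```python
# state -> list of (edge label, target state); "" marks a ¤ edge.
EDGES = {
    0: [("T", 1)], 1: [("E'", 2)],
    3: [("+", 4), ("", 6)], 4: [("T", 5)], 5: [("E'", 6)],
    7: [("F", 8)], 8: [("T'", 9)],
    10: [("*", 11), ("", 13)], 11: [("F", 12)], 12: [("T'", 13)],
    14: [("(", 15), ("id", 17)], 15: [("E", 16)], 16: [(")", 17)],
}
INITIAL = {"E": 0, "E'": 3, "T": 7, "T'": 10, "F": 14}
FINAL = {2, 6, 9, 13, 17}

def walk(nt, tokens, pos):
    """Traverse the diagram for `nt` from its IS to its FS; return the new position."""
    state = INITIAL[nt]
    while state not in FINAL:
        for label, target in EDGES[state]:
            if label in INITIAL:          # NT edge: go to the IS of that NT
                pos = walk(label, tokens, pos)
            elif label == "":             # ¤ edge: do not read input
                pass
            elif pos < len(tokens) and tokens[pos] == label:
                pos += 1                  # T edge: read one input symbol
            else:
                continue                  # this edge does not apply
            state = target
            break
        else:
            raise SyntaxError(f"unexpected input at position {pos}")
    return pos

def accepts(tokens):
    """True if the diagram for E consumes the whole token list."""
    try:
        return walk("E", tokens, 0) == len(tokens)
    except SyntaxError:
        return False
```

For example, `accepts(["id", "+", "id", "*", "id"])` follows the same path as a recursive descent parser would: each NT edge becomes a recursive call.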



Recursive Descent example<br />

left recursive<br />

L ::= E { ';' E } ';' '\n'<br />

E ::= E '+' T | T<br />

T ::= T '*' F | F<br />

F ::= '(' E ')' | digit<br />

right recursive<br />

L ::= E { ';' E } ';' '\n'<br />

E ::= T E’<br />

E’ ::= ¤ | '+' T E’<br />

T ::= F T’<br />

T’ ::= ¤ | '*' F T’<br />

F ::= '(' E ')' | digit<br />

(syntax diagrams for the rules above not reproduced)<br />
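A recursive descent parser for the right-recursive grammar can be sketched with one method per NT. The class and method names are hypothetical, and evaluating each expression to an integer is an addition made here to give the parser an observable result; the slides only define the grammar.

```python
class Parser:
    """Recursive descent for the right-recursive grammar; one method per NT."""

    def __init__(self, text):
        self.text, self.pos = text, 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r} at position {self.pos}")
        self.pos += 1

    def parse_L(self):                 # L ::= E { ';' E } ';' '\n'
        values = [self.parse_E()]
        self.match(";")
        while self.peek() != "\n":     # another E follows unless the line ends
            values.append(self.parse_E())
            self.match(";")
        self.match("\n")
        return values

    def parse_E(self):                 # E ::= T E'
        return self.parse_E_rest(self.parse_T())

    def parse_E_rest(self, left):      # E' ::= ¤ | '+' T E'
        if self.peek() == "+":
            self.match("+")
            return self.parse_E_rest(left + self.parse_T())
        return left                    # ¤ alternative: read nothing

    def parse_T(self):                 # T ::= F T'
        return self.parse_T_rest(self.parse_F())

    def parse_T_rest(self, left):      # T' ::= ¤ | '*' F T'
        if self.peek() == "*":
            self.match("*")
            return self.parse_T_rest(left * self.parse_F())
        return left

    def parse_F(self):                 # F ::= '(' E ')' | digit
        if self.peek() == "(":
            self.match("(")
            value = self.parse_E()
            self.match(")")
            return value
        if self.peek().isdigit():
            value = int(self.peek())
            self.pos += 1
            return value
        raise SyntaxError(f"expected '(' or digit at position {self.pos}")
```

Each method decides which alternative to use by looking only at the next character, so no backtracking is needed. Passing the left operand into `parse_E_rest` and `parse_T_rest` keeps the usual left-to-right evaluation even though the grammar is right recursive.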



Transition Diagrams: example (<strong>ASU</strong> pp183-185)<br />

• Optimisation<br />

E => E+T | T<br />

T => T*F | F<br />

F => (E) | id<br />

E: 0 --T--> 3 --¤--> 6; 3 --+--> 0<br />

T: 7 --F--> 8 --¤--> 13; 8 --*--> 7<br />

F: 14 --(--> 15 --E--> 16 --)--> 17; 14 --id--> 17<br />
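In code, the optimised diagrams correspond to replacing the E’ and T’ procedures with loops: the + edge back to state 0 and the * edge back to state 7 each become a `while`. The following recogniser is a sketch under that reading; the function names are hypothetical and tokens are assumed to be plain strings.

```python
def accepts(tokens):
    """Recognise E with loops in place of the E' and T' procedures."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "$"

    def advance():
        nonlocal pos
        pos += 1

    def expr():                 # T, then loop back on every '+'
        term()
        while peek() == "+":
            advance()
            term()

    def term():                 # F, then loop back on every '*'
        factor()
        while peek() == "*":
            advance()
            factor()

    def factor():               # '(' E ')' or id
        if peek() == "(":
            advance()
            expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
        elif peek() == "id":
            advance()
        else:
            raise SyntaxError(f"unexpected {peek()!r}")

    try:
        expr()
    except SyntaxError:
        return False
    return pos == len(tokens)   # all input must be consumed
```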



Non Recursive <strong>Predictive</strong> Parser (NRPP) (<strong>ASU</strong> <strong>Ch</strong> <strong>4.4</strong>, Fig 4.13)<br />

• table driven + stack<br />

– $ : end-of-input marker / bottom of stack (bos)<br />

– M[A, a] : parsing table entry, A is NT, a is T or $<br />

– X is the symbol on top of stack (tos), a is the current input symbol<br />

• actions<br />

– X = a = $ : halt, success<br />

– X = a != $ : pop X, advance input pointer<br />

– X is NT : see M[X, a]<br />

• M[X, a] is X => UVW : pop X, push W, V, U (expand X on stack, U on top)<br />

• M[X, a] is error : call error recovery<br />

• model (ASU Fig 4.13): the PP program reads the input (e.g. a + b $) and the stack (e.g. X Y Z $, X on tos), consults the parsing table M and produces the output<br />



First and Follow Sets (<strong>ASU</strong> <strong>Ch</strong> <strong>4.4</strong>)<br />

• first(X)<br />

– if X is T first(X) is { X }<br />

– if X => ¤ in P first(X) += ¤ (¤ = empty)<br />

– if X is NT and X => Y1 Y2 Y3 … Yn in P<br />

• first(X) += a if a in first(Yi) and ¤ in first(Y1) … first(Yi-1)<br />

– i.e. Y1 Y2 Y3 … Yi-1 =*=> ¤<br />

– if ¤ in first(Y1) … first(Yn) first(X) += ¤<br />

– if not Y1 =*=> ¤ : first(X) += first(Y1), and nothing further<br />

– if Y1 =*=> ¤ and not Y2 =*=> ¤ : first(X) += (first(Y1) - ¤) and first(Y2), and nothing further<br />

– and so on ...<br />

– continued on next slide<br />



First and Follow Sets (<strong>ASU</strong> <strong>Ch</strong> <strong>4.4</strong>)<br />

• first(X) continued (the implications of ¤)<br />

– for any string X1 X2 X3 … Xn<br />

• first(X1 X2 X3 … Xn) += non ¤ symbols in first(X1)<br />

• if ¤ in first(X1)<br />

– first(X1 X2 X3 … Xn) += non ¤ symbols in first(X2)<br />

• if ¤ in first(X1) and ¤ in first(X2)<br />

– first(X1 X2 X3 … Xn) += non ¤ symbols in first(X3)<br />

– and so on<br />

• if ¤ in first(Xi) for all i = 1..n, first(X1 X2 X3 … Xn) += ¤<br />



First: example (<strong>ASU</strong> Ex 4.17)<br />

E => TE’<br />

E’ => +TE’ | ¤<br />

T => FT’<br />

T’ => *FT’ | ¤<br />

F => (E) | id<br />

• first(E) = first(T) = first(F) = { (, id }<br />

– from F => (E) | id; T => FT’; E => TE’<br />

• first(E’) = { +, ¤ }<br />

– from E’ => +TE’ | ¤<br />

• + is a terminal<br />

• E’ => ¤<br />

• first(T’) = { *, ¤ }<br />

– from T’ => *FT’ | ¤<br />

• * is a terminal<br />

• T’ => ¤<br />
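The first rules from the previous two slides can be sketched as a fixed-point computation: apply the rule for strings to every production until no set grows. The grammar encoding (`[]` for the ¤ alternative, `E'` for E’) and the names `first_of_string` and `compute_first` are assumptions for illustration.

```python
EPS = "¤"
GRAMMAR = {                      # [] encodes the ¤ alternative
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """first(X1 X2 ... Xn), given the first sets of the non-terminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # terminal: {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result        # Xi cannot vanish, so later symbols are hidden
    result.add(EPS)              # every Xi can derive ¤ (or the string is empty)
    return result

def compute_first(grammar):
    """Repeat the rules over all productions until no first set changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first
```

Running `compute_first(GRAMMAR)` reproduces the sets listed in the example above.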



First and Follow Sets (<strong>ASU</strong> <strong>Ch</strong> <strong>4.4</strong>)<br />

• follow(X)<br />

– follow(S) += $ (S: start symbol / $: end of input, i.e. the input is w$)<br />

– if A => αBβ in P (β is the right sibling of B)<br />

• follow(B) += first(β) except for ¤<br />

– if in P : A => αB, or A => αBβ and ¤ in first(β) (A is the parent of B)<br />

• follow(B) += follow(A)<br />

(¤ = empty)<br />



Follow: example (<strong>ASU</strong> <strong>Ch</strong> <strong>4.4</strong>)<br />

E => TE’<br />

E’ => +TE’ | ¤<br />

T => FT’<br />

T’ => *FT’ | ¤<br />

F => (E) | id<br />

first(E) = { (, id }<br />

first(E’) = { +, ¤ }<br />

first(T) = { (, id }<br />

first(T’) = { *, ¤ }<br />

first(F) = { (, id }<br />

follow(E) = { ), $ } : from F => (E), and E is S<br />

follow(E’) = { ), $ } : from E => TE’, rule A => αB: f(E’) += f(E)<br />

follow(T) = { +, ), $ } : from E’ => +TE’ | ¤: f(T) += f(E’); from E => TE’: f(T) += first(E’) - ¤<br />

follow(T’) = { +, ), $ } : from T => FT’, rule A => αB: f(T’) += f(T)<br />

follow(F) = { +, *, ), $ } : from T => FT’: f(F) += f(T); from T’ => *FT’: f(F) += first(T’) - ¤<br />
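The three follow rules can be sketched in the same fixed-point style. To keep the block short, the first sets are copied in from the slide rather than recomputed; the encoding and the names `first_of_string` and `compute_follow` are assumptions, and `E'` stands for E’.

```python
EPS, END = "¤", "$"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {                         # copied from the slide
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}

def first_of_string(symbols):
    """first(X1 ... Xn); a terminal's first set is the terminal itself."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                    # rule 1: follow(S) += $
    changed = True
    while changed:
        changed = False
        for parent, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue              # follow is only defined for NTs
                    beta_first = first_of_string(alt[i + 1:])
                    new = beta_first - {EPS}  # rule 2: first(β) except ¤
                    if EPS in beta_first:
                        new |= follow[parent] # rule 3: β is empty or can vanish
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow
```

Calling `compute_follow(GRAMMAR, "E")` gives the same five follow sets as the worked example.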



<strong>Parsing</strong> Table M (<strong>ASU</strong> Fig 4.15 p188)<br />

NT | id | + | * | ( | ) | $<br />

E | E=>TE’ | - | - | E=>TE’ | - | -<br />

E’ | - | E’=>+TE’ | - | - | E’=>¤ | E’=>¤<br />

T | T=>FT’ | - | - | T=>FT’ | - | -<br />

T’ | - | T’=>¤ | T’=>*FT’ | - | T’=>¤ | T’=>¤<br />

F | F=>id | - | - | F=>(E) | - | -<br />

columns: input symbol (Ts and $); - = error state<br />



Constructing the <strong>Parsing</strong> Table<br />

• principles<br />

– if B => β in P and b in first(β)<br />

• expand B by β when the current input symbol is b<br />

– if β = ¤ or β =*=> ¤<br />

• expand B by β if the input symbol is in follow(B)<br />

• or $ reached on input and $ in follow(B)<br />

• construction<br />

– for each P: B => β do<br />

• for each T b in first(β): M[B, b] += B => β<br />

• if ¤ in first(β): M[B, c] += B => β for each T c in follow(B)<br />

• if ¤ in first(β) and $ in follow(B): M[B, $] += B => β<br />

• make each undefined entry in M an error<br />
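The construction rules can be sketched as follows. The first and follow sets are copied in from the slides, `$` is stored inside the follow sets so the second and third rules collapse into one step, and a doubly defined entry is reported as a conflict. The names `build_table`, `first_of_string` and the dictionary layout are assumptions; `E'` stands for E’.

```python
EPS = "¤"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of_string(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})      # terminal: {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def build_table(grammar):
    """Return M as a dict keyed by (NT, lookahead); missing keys are errors."""
    table = {}
    for nt, alternatives in grammar.items():
        for alt in alternatives:
            alt_first = first_of_string(alt)
            lookaheads = alt_first - {EPS}     # each T b in first(β)
            if EPS in alt_first:
                lookaheads |= FOLLOW[nt]       # each c in follow(B), incl. $
            for terminal in lookaheads:
                if (nt, terminal) in table:    # two productions in one entry
                    raise ValueError(f"not LL(1): conflict at M[{nt}, {terminal}]")
                table[nt, terminal] = alt
    return table
```

For the expression grammar this produces 13 defined entries, matching the filled cells of parsing table M (ASU Fig 4.15).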



Constructing the <strong>Parsing</strong> Table: Details<br />

E => TE’<br />

• add E=>TE’ to M[E, id] : id in first(TE’)<br />

• add E=>TE’ to M[E, ( ] : ( in first(TE’)<br />

E’ => +TE’ | ¤<br />

• add E’=>+TE’ to M[E’, +] : + in first(+TE’)<br />

• add E’=>¤ to M[E’, ) ] : ¤ in first(E’) and ) in follow(E’)<br />

• add E’=>¤ to M[E’, $ ] : ¤ in first(E’) and $ in follow(E’)<br />

T => FT’<br />

• add T=>FT’ to M[T, id] : id in first(FT’)<br />

• add T=>FT’ to M[T, ( ] : ( in first(FT’)<br />


first: E = { (, id } E’ = { +, ¤} T ={ (, id } T’ = { *, ¤} F ={ (, id }<br />

follow: E = { ), $ } E’ = { ), $ } T = { +, ), $ } T’ = { +, ), $ } F = { +, *, ), $ }<br />



Constructing the <strong>Parsing</strong> Table: Details<br />

T’ => *FT’ | ¤<br />

• add T’=>*FT’ to M[T’, *] : * in first(*FT’)<br />

• add T’=>¤ to M[T’, + ] : ¤ in first(T’) and + in follow(T’)<br />

• add T’=>¤ to M[T’, ) ] : ¤ in first(T’) and ) in follow(T’)<br />

• add T’=>¤ to M[T’, $ ] : ¤ in first(T’) and $ in follow(T’)<br />

F => (E) | id<br />

• add F=>(E) to M[F, ( ] : ( in first((E))<br />

• add F=>id to M[F, id ] : id in first(id)<br />

first: E = { (, id } E’ = { +, ¤} T ={ (, id } T’ = { *, ¤} F ={ (, id }<br />

follow: E = { ), $ } E’ = { ), $ } T = { +, ), $ } T’ = { +, ), $ } F = { +, *, ), $ }<br />



Non Recursive <strong>Parsing</strong> Algorithm<br />

(<strong>ASU</strong> Fig 4.14)<br />

input: string w, parsing table M for grammar G<br />

output: if w in L(G), leftmost derivation of w, else error<br />

init: stack = $S (S on top), i/p buffer = w$<br />

ip = first_symbol(w) // ip - input pointer<br />

do {<br />

X = TOS<br />

if ( X is T or $ ) { if ( X == *ip ) { pop X; ip++ } else error( ) } // match<br />

else if ( M[X, *ip] == P: X => Y1 Y2 … Yk ) // expand<br />

{ pop X; push Yk Yk-1 … Y1 on to stack; output P } // Y1 ends up on top<br />

else error( )<br />

} while ( X != $ )<br />
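The algorithm can be sketched in Python with the parsing table for the expression grammar written out by hand. The names `M`, `NONTERMINALS` and `parse`, the token-list input and the string format of the output are assumptions for illustration; `E'` stands for E’ and the top of the stack is the end of the list.

```python
M = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def parse(tokens):
    """Return the productions of the leftmost derivation, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", "E"]                        # start symbol on top of $
    ip, output = 0, []
    while stack[-1] != "$":
        X, a = stack[-1], tokens[ip]
        if X not in NONTERMINALS:             # terminal on TOS: match
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()
            ip += 1
        elif (X, a) in M:                     # expand by M[X, a]
            body = M[X, a]
            stack.pop()
            stack.extend(reversed(body))      # push Yk ... Y1, so Y1 is on top
            output.append(f"{X}=>{''.join(body) or '¤'}")
        else:                                 # undefined entry: error
            raise SyntaxError(f"no entry M[{X}, {a}]")
    if tokens[ip] != "$":
        raise SyntaxError("input remains after the stack emptied")
    return output
```

`parse(["id", "+", "id", "*", "id"])` returns the eleven productions of the leftmost derivation of id+id*id, in the order they are applied.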



Parse of id+id*id (<strong>ASU</strong> Fig 4.16)<br />

stack | i/p | o/p<br />

$E | id+id*id$ |<br />

$E’T | id+id*id$ | E=>TE’<br />

$E’T’F | id+id*id$ | T=>FT’<br />

$E’T’id | id+id*id$ | F=>id<br />

$E’T’ | +id*id$ |<br />

$E’ | +id*id$ | T’=>¤<br />

$E’T+ | +id*id$ | E’=>+TE’<br />

$E’T | id*id$ |<br />

$E’T’F | id*id$ | T=>FT’<br />

$E’T’id | id*id$ | F=>id<br />

$E’T’ | *id$ |<br />

$E’T’F* | *id$ | T’=>*FT’<br />

$E’T’F | id$ |<br />

$E’T’id | id$ | F=>id<br />

$E’T’ | $ |<br />

$E’ | $ | T’=>¤<br />

$ | $ | E’=>¤<br />

finish - success!<br />



Properties of LL(1) Grammars (<strong>ASU</strong> <strong>Ch</strong> <strong>4.4</strong>)<br />

• Left recursive or ambiguous G cannot be LL(1)<br />

• G is LL(1) iff whenever A => α | β are 2 distinct Ps of G:<br />

1. for no T a do both α and β derive strings beginning with a<br />

2. at most one of α and β can derive ¤ (¤ = empty)<br />

3. if β =*=> ¤ then α does not derive any string beginning with a T in follow(A)<br />

• use left recursion elimination / left factoring to try to produce an LL(1) G<br />

• in practice: PP used for control constructs / operator precedence parsing for exprs<br />

• error recovery in PP<br />

– Ts, NTs made explicit in stack in PP<br />

– error => TOS is a T != a (next input symbol), or TOS is an NT A and M[A, a] is empty<br />

– panic-mode recovery: skip input until a symbol in the synchronising set (SySt) appears; for NT A, SySt += follow(A)<br />

– SySt += keywords that begin statements<br />

– if TOS is unmatched T, pop, print message and continue<br />
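Panic-mode recovery can be sketched on top of the table-driven parser for the expression grammar. This is one possible reading of the bullets above, not the recovery scheme from ASU: the synchronising set of an NT is taken to be just its follow set, and the names `parse_with_recovery`, `M` and `FOLLOW` as well as the message texts are assumptions. `E'` stands for E’.

```python
M = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
FOLLOW = {                                     # doubles as the list of NTs
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}

def parse_with_recovery(tokens):
    """Parse `tokens`, collecting error messages and continuing after each one."""
    tokens = list(tokens) + ["$"]
    stack, ip, errors = ["$", "E"], 0, []
    while stack[-1] != "$":
        X, a = stack[-1], tokens[ip]
        if X not in FOLLOW:                    # X is a terminal
            if X == a:
                ip += 1
            else:
                errors.append(f"missing {X!r} before {a!r}")
            stack.pop()                        # pop T whether or not it matched
        elif (X, a) in M:                      # normal expansion
            stack.pop()
            stack.extend(reversed(M[X, a]))
        elif a in FOLLOW[X] or a == "$":       # a is in SySt: give up on X
            errors.append(f"incomplete {X} before {a!r}")
            stack.pop()
        else:                                  # skip input until SySt is reached
            errors.append(f"unexpected {a!r}")
            ip += 1
    if tokens[ip] != "$":
        errors.append(f"unexpected {tokens[ip]!r} after end of expression")
    return errors
```

Every loop iteration either pops a symbol, expands an NT or skips an input symbol, so the parse always finishes. A correct input returns an empty list, and an input such as id + * id reports the stray * and then carries on parsing.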



Summary<br />

• <strong>Top</strong>-<strong>down</strong> parsing<br />

– predictive parser (PP)<br />

– recursive descent (RD)<br />

– non recursive descent (NRD)<br />

– avoid backtracking<br />

• Transition Diagrams for PPs<br />

– example vs RD example<br />

• Sets First and Follow<br />

– derivation & use<br />

• Non Recursive Descent (NRD)<br />

– <strong>Parsing</strong> Table M<br />

• principles<br />

• construction<br />

– NRD algorithm<br />

– example parse id+id*id<br />

– properties of LL(1) Gs<br />

• error recovery for PPs<br />
