29.04.2015 Views

Syntax - Introduction Grammar Rules - BNF Language Sentences ...


Syntax - Introduction

Grammar Rules - BNF

• Language syntax
• ::=
– 1950s Noam Chomsky - Context Free Grammars (CFGs)
– 1960 Backus / Naur - BNF (Backus Naur Form)
• EBNF (Extended BNF)
– 1972 Niklaus Wirth - Syntax Diagrams (Pascal syntax graphs)
• Konrad Zuse - Plankalkül (1940s)
• Lexical Structure of PLs
– keywords (reserved words): begin, end, do, while
– literals / constants: 42, “string”, (1, 2, ‘a’)

– special symbols ‘{‘, ‘


Chomsky Classifications

• Type 0 - no restrictions: Ps of form a ::= b
• Type 1 - context sensitive: Ps of form aAb ::= apb (p not empty)
• Type 2 - context free: Ps of form A ::= p
• Type 3 - finite state Ls (regular): Ps of form A ::= a | aB
• where A, B are NTs and a, b, p are sentential forms (SFs), except in Type 3, where a is a single terminal
• left recursion: Ps of form A ::= Ax
• right recursion: Ps of form A ::= xA
• self embedding: Ps of form A ::= xAy

Example

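<expr> ::= <expr> <addop> <expr> |
           <expr> <mulop> <expr> |
           ( <expr> ) | <number>
<addop> ::= + | -
<mulop> ::= * | /
<number> ::= <digit> | <number> <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

• NB some rules are recursive (<expr>, <number>)
• S: the start symbol <expr>; NT: the names in angle brackets; T: the symbols + - * / ( ) and 0 .. 9; P: the rules above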


03/04/2013 DFR - PL - Syntax & Parsing


Syntax & Parse Trees

• Parse tree for 3+4*5 (figure: the root derives 3 + (4 * 5); the leaves are 3, +, 4, *, 5)
• Syntax tree for 3+4*5: + at the root with children 3 and *; * has children 4 and 5

Ambiguity, Associativity, Precedence

• 3+4*5 has two possible trees: +(3, *(4, 5)) and *(+(3, 4), 5)
• >1 possible parse tree => ambiguous grammar (NB grammar uses symbols)
• semantic rules required
– precedence of +, -, *, /
– associativity, left or right
• 8 - 4 - 2 = 2 i.e. ((8-4)-2) (left associative)
• 8 - 4 - 2 = 6 i.e. (8-(4-2)) (right associative)
• may also be solved in the grammar - see Pascal for an example
• if-then-else is another example
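The effect of associativity on 8 - 4 - 2 can be checked by folding the operands from the left or from the right. This is only a minimal sketch in C; the function names `sub_left` and `sub_right` are invented for this illustration and do not come from the slides.

```c
#include <stddef.h>

/* Left-associative subtraction: ((a[0] - a[1]) - a[2]) - ...
   Assumes n >= 1. */
int sub_left(const int *a, size_t n)
{
    int acc = a[0];
    for (size_t i = 1; i < n; i++)
        acc -= a[i];
    return acc;
}

/* Right-associative subtraction: a[0] - (a[1] - (a[2] - ...))
   Assumes n >= 1. */
int sub_right(const int *a, size_t n)
{
    int acc = a[n - 1];
    for (size_t i = n - 1; i-- > 0; )
        acc = a[i] - acc;
    return acc;
}
```

Called on the operands 8, 4, 2, `sub_left` gives 2 and `sub_right` gives 6, matching the two groupings above.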



Syntax Diagrams (Graphs)

<expr> ::= <expr> + <term>
         | <term>
<term> ::= <term> * <factor>
         | <factor>
<factor> ::= ( <expr> ) | <id>
<id> ::= <letter>
       | <id> <alnum>
<alnum> ::= <letter> | <digit>
<letter> ::= A .. Z | a .. z
<digit> ::= 0 .. 9

(figure: syntax diagrams for expr, term and factor)

Summary (so far…)

• BNF
<expr> ::= <expr> + <term> |
           <expr> - <term> |
           <term>
<term> ::= <term> * <factor> |
           <term> / <factor> |
           <factor>
• EBNF
<expr> ::= <term> { (+|-) <term> }
<term> ::= <factor> { (*|/) <factor> }
• L(G) where G = (S, P, NT, T)
– S - start symbol, S in NT
– P - set of productions
– NT - non-terminal symbols
– T - terminal symbols
• (E)BNF describes P (or graphs)
• derivation (S =>* w)
• parse (w =>* S)
– w is a sentence in L(G)
• semantics
– precedence, associativity


Parsing

PL text (string) -> Scanner -> PL tokens -> Parser

• the source program is a string of symbols (from an alphabet A)
• these symbols may be grouped as lexemes
• the scanner converts lexemes to tokens
• the parser processes the token stream according to P

lexemes: index = 2 * count + 17;
tokens: eql (=), mulop (*), plusop (+), scolon (;)

Parsing Techniques & Tools

• simple parser: recogniser for strings w in L (parser: w -> true | false)
• general parser: builds syntax trees (intermediate form)
• top-down parser: NTs expanded from S (start symbol) towards w
• bottom-up parser: shift/reduce symbols in w towards S (e.g. YACC)
• parser generator: YACC (Yet Another Compiler Compiler)
  BNF -> YACC -> Parser (Compiler)
• recursive descent: usually written by hand (simple rules)
– NTs become procedures
– Ts are matched by a token recogniser and the next token is read
– may use look ahead techniques (the grammar Ps define what to expect)
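The scanner stage described above (lexemes in, tokens out) can be sketched in a few lines of C. The token names `EQL`, `MULOP`, `PLUSOP` and `SCOLON` follow the slide; `ID`, `NUM`, `END`, `BAD` and the functions `next_token` and `scan` are assumptions made for this illustration, not the course's scanner.

```c
#include <ctype.h>
#include <stddef.h>

/* Token codes: EQL, MULOP, PLUSOP, SCOLON follow the slide;
   ID, NUM, END and BAD are assumed names. */
enum token { ID, NUM, EQL, MULOP, PLUSOP, SCOLON, END, BAD };

/* Scan one lexeme starting at *src, advance *src past it and
   return the corresponding token code. */
enum token next_token(const char **src)
{
    const char *p = *src;
    enum token t;

    while (isspace((unsigned char)*p)) p++;      /* skip blanks */
    if (*p == '\0') { *src = p; return END; }

    if (isalpha((unsigned char)*p)) {            /* identifier */
        while (isalnum((unsigned char)*p)) p++;
        t = ID;
    } else if (isdigit((unsigned char)*p)) {     /* number */
        while (isdigit((unsigned char)*p)) p++;
        t = NUM;
    } else {                                     /* one-character symbols */
        switch (*p++) {
        case '=': t = EQL;    break;
        case '*': t = MULOP;  break;
        case '+': t = PLUSOP; break;
        case ';': t = SCOLON; break;
        default:  t = BAD;    break;
        }
    }
    *src = p;
    return t;
}

/* Store at most max tokens from text in out[]; returns how many were
   stored (the END token itself is not stored). */
size_t scan(const char *text, enum token *out, size_t max)
{
    size_t n = 0;
    enum token t;
    while (n < max && (t = next_token(&text)) != END)
        out[n++] = t;
    return n;
}
```

For the lexemes `index = 2 * count + 17;` this sketch yields eight tokens: ID EQL NUM MULOP ID PLUSOP NUM SCOLON. The parser then only sees these codes, not the characters.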



Some Parsing theory

• For a production P: X ::= ...
– define the set first(X) to be the set of tokens that X may start with
• first(X) for any terminal is the terminal itself, e.g. first(+) = {+}
– for a recursive descent predictive parser (RDPP) and a rule in P, A ::= B | C | D, we require that first(B), first(C) and first(D) are disjoint; first(A) then contains all the tokens in first(B), first(C) and first(D)
– if the parser uses lookahead, it can predict which rule is required by looking at the input token
– e.g. <factor> ::= id | ( <expr> )
  first(<factor>) = { id, ( }

Recursive Descent Predictive Parsing

• Look at the grammar: NT -> write a procedure; T -> match and get next token

• e.g. <expr> ::= <term> { + <term> }*

procedure expr();
{ term(); while token == '+' do { get_token(); term(); } }

• e.g. <if_stat> ::= if <cond> then <stat> [ else <stat> ]

procedure if_stat();
{ if token != 'if' then error();
  else { get_token(); cond();
    if token != 'then' then error();
    else { get_token(); stat();
      if token == 'else' then { get_token(); stat(); } } } }
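The if_stat procedure above can be turned into a small runnable recogniser. The following is a sketch under several assumptions: the input is an already scanned, `T_EOF`-terminated token array, a condition is a single `T_COND` token, and a statement is either a `T_STAT` token or a nested if. The token names, `parse_stat`, `syntax_error` and the name `stmt` (used instead of `stat` to stay clear of the POSIX function of that name) are all invented for this illustration.

```c
#include <stddef.h>

enum tok { T_IF, T_THEN, T_ELSE, T_COND, T_STAT, T_EOF };

static const enum tok *cur;   /* cursor into the token stream */
static int errors;            /* number of syntax errors seen */

static void get_token(void)    { if (*cur != T_EOF) cur++; }
static void syntax_error(void) { errors++; }

static void stmt(void);

/* <cond> is reduced to a single token in this sketch */
static void cond(void)
{
    if (*cur == T_COND) get_token(); else syntax_error();
}

/* <if_stat> ::= if <cond> then <stat> [ else <stat> ] */
static void if_stat(void)
{
    if (*cur != T_IF) { syntax_error(); return; }
    get_token(); cond();
    if (*cur != T_THEN) { syntax_error(); return; }
    get_token(); stmt();
    if (*cur == T_ELSE) { get_token(); stmt(); }   /* optional part */
}

/* <stat> ::= <if_stat> | STAT
   first(<if_stat>) = { if } and first(STAT) = { STAT } are disjoint,
   so one token of lookahead selects the alternative. */
static void stmt(void)
{
    if (*cur == T_IF) if_stat();
    else if (*cur == T_STAT) get_token();
    else syntax_error();
}

/* Returns 1 if the T_EOF-terminated stream is exactly one statement. */
int parse_stat(const enum tok *tokens)
{
    cur = tokens;
    errors = 0;
    stmt();
    return errors == 0 && *cur == T_EOF;
}
```

Because the inner call to `stmt()` checks for `else` before returning, an `else` is attached to the nearest unmatched `if`, which is how a recursive descent parser typically settles the if-then-else ambiguity mentioned earlier.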


G = (S, P, NT, T) - a picture

(figure: a parse tree rooted at S above the input string w = a1 a2 a3 ... a13 $)

• there are two “cursors” - one in the parse tree and one in the input string (token stream) w
• an error => these cursors are out of synchronisation

Example Grammar

• Original Grammar
line ::= expr { ';' expr } ';' '\n'
expr ::= expr '+' term | term
term ::= term '*' factor | factor
factor ::= '(' expr ')' | DIGIT

• RD does not work for left recursive grammars !!! (the procedure for expr would call itself before consuming any token, so it never terminates)

• Right Recursive Grammar
line ::= expr { ';' expr } ';' '\n'
expr ::= term R1
R1 ::= e | '+' term R1
term ::= factor R2
R2 ::= e | '*' factor R2
factor ::= '(' expr ')' | DIGIT
(e = empty)



Example Code

/* factor ::= '(' expr ')' | DIGIT */
void factor(void)
{
    if (lookahead == '(') {
        match('('); expr();
        if (lookahead == ')') match(')');
        else error("*** ')' EXPECTED *** (in procedure factor)");
    }
    else if (lookahead == DIGIT) match(lookahead);
    else {
        error("*** UNEXPECTED SYMBOL *** (in procedure factor)");
        lookahead = yylex();   /* skip over symbol - no synch !!! */
    }
}

/* R2 ::= e | '*' factor R2 */
void R2(void)
{ if (lookahead == '*') { match('*'); factor(); R2(); } }

/* R1 ::= e | '+' term R1 */
void R1(void)
{ if (lookahead == '+') { match('+'); term(); R1(); } }


Example Code

/* term ::= factor R2 */
void term(void)
{ factor(); R2(); }

/* expr ::= term R1 */
void expr(void)
{ term(); R1(); }

Parsing - summary

• Several parsing techniques exist - we will use RDPP
• given a grammar G, P implies a (family of) parse tree(s)
• the source code is a string w (token stream)
• RDPP involves matching Ts in w with the expected Ts in the parse tree (these Ts will be leaf nodes in the parse tree)
• parser code: NT -> procedure; T -> match & get_token
• for P A ::= B | C | D, first(B), first(C) and first(D) must be disjoint
• the C example above was a parser - (see also interpreter code)
– a certain amount of semantics is built in to the interpreter
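The fragments for factor, R1, R2, term and expr can be assembled into one self-contained recogniser. The version below is a sketch, not the course's parser: `next_symbol` is an assumed stand-in for `yylex()` that treats every non-blank character as a token, any decimal digit counts as DIGIT, errors are counted instead of printed, and the `line` rule is left out. `parse_expr` is a hypothetical entry point added for this illustration.

```c
#include <ctype.h>

static const char *input;   /* remaining input text */
static int lookahead;       /* current token: a character, or 0 at end */
static int nerrors;         /* syntax errors found so far */

/* Stand-in for yylex(): next non-blank character, 0 at end of input. */
static int next_symbol(void)
{
    while (*input == ' ') input++;
    return *input ? (unsigned char)*input++ : 0;
}

/* T: match the expected terminal and read the next token. */
static void match(int expected)
{
    if (lookahead == expected) lookahead = next_symbol();
    else nerrors++;
}

static void expr(void);

/* factor ::= '(' expr ')' | DIGIT */
static void factor(void)
{
    if (lookahead == '(') { match('('); expr(); match(')'); }
    else if (isdigit(lookahead)) match(lookahead);
    else { nerrors++; lookahead = next_symbol(); }   /* skip the symbol */
}

/* R2 ::= e | '*' factor R2 */
static void R2(void)
{ if (lookahead == '*') { match('*'); factor(); R2(); } }

/* term ::= factor R2 */
static void term(void) { factor(); R2(); }

/* R1 ::= e | '+' term R1 */
static void R1(void)
{ if (lookahead == '+') { match('+'); term(); R1(); } }

/* expr ::= term R1 */
static void expr(void) { term(); R1(); }

/* Returns 1 if text is exactly one well-formed expr, otherwise 0. */
int parse_expr(const char *text)
{
    input = text;
    nerrors = 0;
    lookahead = next_symbol();
    expr();
    return nerrors == 0 && lookahead == 0;
}
```

With this sketch, `parse_expr("3+4*5")` and `parse_expr("(3+4)*5")` succeed, while `parse_expr("3+*5")` and `parse_expr("(3+4")` fail. Each NT became a procedure and each T a call to `match`, as the summary describes.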
