Syntax - Introduction Grammar Rules - BNF Language Sentences ...

Syntax - Introduction 

Grammar Rules - BNF

• Language syntax 

• ::= (read "is defined as"; separates the left-hand side of a rule from its right-hand side)

– 1950s Noam Chomsky - Context Free Grammars (CFGs) 

– 1960 Backus / Naur – BNF (Backus Naur Form) 

• EBNF Extended BNF 

– 1972 Niklaus Wirth - Syntax Diagrams (Pascal syntax graphs) 

• Konrad Zuse - Plankalkül (1940s) 

• Lexical Structure of PLs 

– keywords (reserved words) begin, end, do, while 

– literals / constants 42, “string”, (1, 2, ‘a’) 

– special symbols ‘{‘, ‘

Chomsky Classifications

• Type 0 - unrestricted: productions (Ps) of form a ::= b

• Type 1 - context sensitive: Ps of form aAb ::= apb (p not empty)

• Type 2 - context free: Ps of form A ::= p

• Type 3 - regular (finite state languages, Ls): Ps of form A ::= a | aB

• where A, B are non-terminals (NTs) and a, b, p are sentential forms (SFs); in Type 3 rules a is a single terminal

• left recursion: Ps of form A ::= Ax

• right recursion: Ps of form A ::= xA

• self embedding: Ps of form A ::= xAy

Example

 

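<expr> ::= <expr> <addop> <expr> |
           <expr> <mulop> <expr> |
           ( <expr> ) |
           <number>

<addop> ::= + | -

<mulop> ::= * | /

<number> ::= <number> <digit> | <digit>

<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

• NB some rules are recursive (<expr>, <number>)

• labels on the slide: S marks the start symbol, P a production, NT a non-terminal, T a terminal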

03/04/2013 DFR - PL - Syntax & Parsing 



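Syntax & Parse Trees

• Parse tree 3+4*5 • Syntax tree 3+4*5

[Figure: parse tree for 3+4*5, with non-terminals as interior nodes, beside the syntax tree]

      +
     / \
    3   *
       / \
      4   5

Ambiguity, Associativity, Precedence

[Figure: the two possible trees for 3+4*5]

      +              *
     / \            / \
    3   *          +   5
       / \        / \
      4   5      3   4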

• more than one possible parse tree for a sentence => the grammar is ambiguous (NB grammar uses symbols)

• semantic rules required

– precedence of +, -, *, /

– associativity, left or right

• 8 - 4 - 2 = 2 i.e. ((8-4)-2) (left associative)

• 8 - 4 - 2 = 6 i.e. (8-(4-2)) (right associative)

• may also be solved in the grammar - see Pascal for an example

• if-then-else is another example (the dangling else)
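The two readings of 8 - 4 - 2 above can be checked mechanically. The following is a minimal sketch in C; the helper names `sub_left` and `sub_right` are hypothetical and not from the slides. It folds the same operand list once from the left and once from the right.

```c
/* Hypothetical helpers for illustration; the names are not from the slides. */

/* Left-associative reading: ((a0 - a1) - a2) - ... */
int sub_left(const int *a, int n)
{
    int acc = a[0];
    for (int i = 1; i < n; i++)
        acc = acc - a[i];
    return acc;
}

/* Right-associative reading: a0 - (a1 - (a2 - ...)) */
int sub_right(const int *a, int n)
{
    int acc = a[n - 1];
    for (int i = n - 2; i >= 0; i--)
        acc = a[i] - acc;
    return acc;
}
```

Called on the operands {8, 4, 2}, `sub_left` gives 2 and `sub_right` gives 6, matching the two bullets above.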


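Syntax Diagrams (Graphs)

<expr> ::= <expr> + <term>
         | <term>

<term> ::= <term> * <factor>
         | <factor>

<factor> ::= ( <expr> ) | <id>

<id> ::= <letter>
       | <id> <letter_or_digit>

<letter_or_digit> ::= <letter> | <digit>

<letter> ::= A .. Z | a .. z

<digit> ::= 0 .. 9

[Syntax diagrams: expr is a term followed by any number of "+ term"; term is a factor followed by any number of "* factor"; factor is either "( expr )" or id]

Summary (so far…)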

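• BNF

<expr> ::= <expr> + <term> |
           <expr> - <term> |
           <term>

<term> ::= <term> * <factor> |
           <term> / <factor> |
           <factor>

• EBNF

<expr> ::= <term> { (+|-) <term> }

<term> ::= <factor> { (*|/) <factor> }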

• L(G), the language generated by G, where G = (S, P, NT, T)

– S - start symbol, S in NT

– P - set of productions

– NT - non-terminal symbols

– T - terminal symbols

• (E)BNF, or syntax graphs, describe P

• derivation (S =>* w): S is expanded, in zero or more steps, to w

• parse (w =>* S): w is reduced, in zero or more steps, to S

– w is a sentence in L(G)

• semantics

– precedence, associativity



Parsing -

[Diagram: PL text (string) -> Scanner -> PL tokens -> Parser]

• the source program is a string of symbols (from an alphabet A)

• these symbols may be grouped as lexemes

• the scanner converts lexemes to tokens

• the parser processes the token stream according to P

lexemes: index = 2 * count + 17 ;

tokens: eql (for =), mulop (for *), plusop (for +), scolon (for ;)

Parsing Techniques & Tools

parser: w -> true | false

• simple parser: a recogniser for strings w in L

• general parser: builds syntax trees (intermediate form)

• top-down parser: NTs are expanded from S (start symbol) towards w

• bottom-up parser: shift/reduce symbols in w towards S (e.g. YACC)

• parser generator: YACC (Yet Another Compiler Compiler)

BNF -> YACC -> Parser (Compiler)

• recursive descent: usually written by hand (simple rules)

– NTs become procedures

– Ts are matched by a token recogniser and the next token is read

– may use look ahead techniques (the grammar Ps define what to expect)
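The scanner step described above (lexemes in, tokens out) can be sketched in C as follows. The token names eql, mulop, plusop and scolon come from the slide. ID, NUMBER, END and UNKNOWN, and the functions `next_token` and `scan`, are assumptions made for this illustration only.

```c
#include <ctype.h>
#include <stddef.h>

/* EQL, MULOP, PLUSOP and SCOLON follow the slide's token names;
   ID, NUMBER, END and UNKNOWN are assumed names for this sketch. */
typedef enum { ID, NUMBER, EQL, MULOP, PLUSOP, SCOLON, END, UNKNOWN } TokenKind;

/* Scan one lexeme starting at *src, advance *src past it, return its token. */
TokenKind next_token(const char **src)
{
    const char *p = *src;
    TokenKind kind;

    while (isspace((unsigned char)*p)) p++;          /* skip white space */

    if (*p == '\0') {
        kind = END;
    } else if (isalpha((unsigned char)*p)) {         /* letter {letter | digit} */
        while (isalnum((unsigned char)*p)) p++;
        kind = ID;
    } else if (isdigit((unsigned char)*p)) {         /* digit {digit} */
        while (isdigit((unsigned char)*p)) p++;
        kind = NUMBER;
    } else {
        switch (*p++) {                              /* single-character symbols */
        case '=': kind = EQL;     break;
        case '*': kind = MULOP;   break;
        case '+': kind = PLUSOP;  break;
        case ';': kind = SCOLON;  break;
        default:  kind = UNKNOWN; break;
        }
    }
    *src = p;
    return kind;
}

/* Fill out[] with at most max tokens (the final END included); return the count. */
size_t scan(const char *src, TokenKind *out, size_t max)
{
    size_t n = 0;
    while (n < max) {
        TokenKind k = next_token(&src);
        out[n++] = k;
        if (k == END) break;
    }
    return n;
}
```

For the slide's lexeme string `index = 2 * count + 17;` this produces ID EQL NUMBER MULOP ID PLUSOP NUMBER SCOLON END. A parser would call `next_token` on demand rather than filling an array first; the array form is used here only to make the result easy to inspect.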



Some Parsing theory

• For a production P: X ::= ...

– define the set first(X) to be the set of tokens that X may start with

• first(X) for any terminal is the terminal itself, e.g. first(+) = {+}

– for a recursive descent predictive parser (RDPP) and a rule in P, A ::= B | C | D, we require that first(B), first(C) and first(D) are disjoint; first(A) must also contain the tokens in first(B), first(C) and first(D)

– if the parser uses lookahead, it can predict which rule is required by looking at the input token

– e.g. <factor> ::= id | ( <expr> )

first(<factor>) = { id, ( }

Recursive Descent Predictive Parsing
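The disjointness requirement and the prediction step can be made concrete with a small C sketch. The token kinds, the bit-set encoding and the function names are all assumptions made for this illustration. Empty alternatives are ignored; they would need more than first sets.

```c
/* Token kinds and the bit-set encoding are assumptions for this sketch. */
typedef enum { T_ID, T_LPAREN, T_RPAREN, T_PLUS, T_STAR } Token;
typedef unsigned TokenSet;                 /* one bit per token kind */

#define BIT(t) (1u << (t))

/* first() of each alternative of  factor ::= id | ( expr )  */
static const TokenSet first_alt[] = {
    BIT(T_ID),        /* alternative 0: id       */
    BIT(T_LPAREN)     /* alternative 1: ( expr ) */
};
enum { N_ALT = sizeof first_alt / sizeof first_alt[0] };

/* Returns 1 if the first sets of all alternatives are pairwise disjoint. */
int alternatives_disjoint(const TokenSet *sets, int n)
{
    TokenSet seen = 0;
    for (int i = 0; i < n; i++) {
        if (seen & sets[i]) return 0;      /* some token starts two alternatives */
        seen |= sets[i];
    }
    return 1;
}

/* first(A) is the union of the first sets of A's alternatives. */
TokenSet first_union(const TokenSet *sets, int n)
{
    TokenSet u = 0;
    for (int i = 0; i < n; i++) u |= sets[i];
    return u;
}

/* Pick the alternative from one lookahead token; -1 means syntax error. */
int predict(const TokenSet *sets, int n, Token lookahead)
{
    for (int i = 0; i < n; i++)
        if (sets[i] & BIT(lookahead)) return i;
    return -1;
}
```

With these sets, a lookahead of `T_LPAREN` selects alternative 1 and `T_PLUS` selects none, which is the point at which a predictive parser would report an error.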

• Look at the grammar: NT -> write a procedure; T -> match and get next token

• e.g. <expr> ::= <term> { + <term> }*

procedure expr();
{ term( );
  while token == '+' do { get_token( ); term( ); }
}

• e.g. <if_stat> ::= if <cond> then <stat> { else <stat> }

procedure if_stat();
{ if token != 'if' then error( );
  else { get_token( ); cond( );
    if token != 'then' then error( );
    else { get_token( ); stat( );
      if token == 'else' then { get_token( ); stat( ); }
    }
  }
}
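The if_stat pseudocode above translates almost line for line into C. In this minimal sketch the token kinds are assumptions: COND and STAT each stand in for a whole condition or simple statement. The helper `expect`, the procedure name `statement` (for stat) and the driver `recognise` are hypothetical names as well.

```c
/* Assumed token kinds: COND and STAT stand in for a whole
   condition and a whole simple statement respectively. */
typedef enum { IF, THEN, ELSE, COND, STAT, END } Token;

static const Token *tokens;   /* remaining token stream w */
static Token token;           /* current (lookahead) token */
static int errors;            /* number of syntax errors seen */

static void get_token(void) { token = *tokens; if (token != END) tokens++; }

/* T: match the expected terminal and read the next token. */
static void expect(Token t) { if (token == t) get_token(); else errors++; }

static void statement(void);

/* if_stat ::= if cond then stat { else stat } */
static void if_stat(void)
{
    expect(IF);
    expect(COND);
    expect(THEN);
    statement();
    if (token == ELSE) {      /* optional part: taken by the nearest if */
        get_token();
        statement();
    }
}

/* stat ::= if_stat | STAT   (first sets {IF} and {STAT} are disjoint) */
static void statement(void)
{
    if (token == IF) if_stat();
    else expect(STAT);
}

/* Returns 1 if w (terminated by END) is a statement, 0 otherwise. */
int recognise(const Token *w)
{
    tokens = w;
    errors = 0;
    get_token();
    statement();
    if (token != END) errors++;   /* input left over */
    return errors == 0;
}
```

Because `if_stat` consumes an else as soon as it sees one, a nested `if ... then if ... then S else S` attaches the else to the inner if. That is one common way of settling the if-then-else ambiguity in the parser rather than in the grammar.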



G = (S, P, NT, T) - a picture

[Figure: the parse tree, rooted at S, drawn above the input string w]

w = a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 $

• there are two "cursors" - one in the parse tree and one in the input string (token stream) w

• an error => these cursors are out of synchronisation

Example Grammar

• Original Grammar

line ::= expr { ';' expr } ';' '\n'
expr ::= expr '+' term | term
term ::= term '*' factor | factor
factor ::= '(' expr ')' | DIGIT

• RD does not work for left recursive grammars !!! (the procedure for expr would call itself again before consuming any token, so the recursion never ends)

• Right Recursive Grammar

line ::= expr { ';' expr } ';' '\n'
expr ::= term R1
R1 ::= e | '+' term R1
term ::= factor R2
R2 ::= e | '*' factor R2
factor ::= '(' expr ')' | DIGIT

(e = empty)



Example Code

factor ::= '(' expr ')' | DIGIT

void factor(void)
{
    if (lookahead == '(') {
        match('('); expr();
        if (lookahead == ')') match(')');
        else error("*** ')' EXPECTED *** (in procedure factor)");
        return;
    }
    else if (lookahead == DIGIT) { match(lookahead); return; }
    else error("*** UNEXPECTED SYMBOL *** (in procedure factor)");

    /* reached only after the UNEXPECTED SYMBOL error */
    lookahead = yylex(); /* skip over symbol - no synch !!! */
}

R2 ::= e | '*' factor R2

void R2(void)
{ if (lookahead == '*') { match('*'); factor(); R2(); } }   /* otherwise e: match nothing */

R1 ::= e | '+' term R1

void R1(void)
{ if (lookahead == '+') { match('+'); term(); R1(); } }     /* otherwise e: match nothing */



Example Code

term ::= factor R2

void term(void)
{ factor(); R2(); }

expr ::= term R1

void expr(void)
{ term(); R1(); }

Parsing - summary

• Several parsing techniques exist - we will use RDPP

• given a grammar G, P implies a (family of) parse tree(s)

• the source code is a string w (token stream)

• RDPP involves matching Ts in w with the expected Ts in the parse tree (these Ts will be leaf nodes in the parse tree)

• parser code: NT -> procedure; T -> match & get_token

• for P A ::= B | C | D, first(B), first(C) and first(D) must be disjoint

• the C example above was a parser - (see also interpreter code)

– a certain amount of semantics is built in to the interpreter
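To tie the summary together, here is a minimal self-contained sketch of an RDPP for the right recursive grammar (expr, R1, term, R2, factor) that also computes a value, as a small illustration of semantics built into the parser. It is not the course's interpreter code. The scaffolding is assumed: input is a C string, each character is one token, white space is not handled, and errors are counted rather than printed.

```c
#include <ctype.h>

/* Assumed scaffolding for this sketch: one character = one token. */
static const char *input;   /* remaining input: the cursor in w */
static int lookahead;       /* current token */
static int errors;          /* number of syntax errors seen */

static int expr(void);      /* forward declaration: factor() calls expr() */

static void next(void) { lookahead = (unsigned char)*input; if (*input) input++; }

static void match(int t) { if (lookahead == t) next(); else errors++; }

/* factor ::= '(' expr ')' | DIGIT */
static int factor(void)
{
    int value = 0;
    if (lookahead == '(') {
        match('(');
        value = expr();
        match(')');
    } else if (isdigit(lookahead)) {
        value = lookahead - '0';
        next();
    } else {
        errors++;
        if (lookahead) next();          /* skip the offending symbol */
    }
    return value;
}

/* R2 ::= e | '*' factor R2   (left is the value computed so far) */
static int R2(int left)
{
    if (lookahead == '*') { match('*'); return R2(left * factor()); }
    return left;                        /* e: match nothing */
}

/* term ::= factor R2 */
static int term(void) { return R2(factor()); }

/* R1 ::= e | '+' term R1 */
static int R1(int left)
{
    if (lookahead == '+') { match('+'); return R1(left + term()); }
    return left;                        /* e: match nothing */
}

/* expr ::= term R1 */
static int expr(void) { return R1(term()); }

/* Returns 1 and stores the value if w is a sentence, 0 otherwise. */
int parse(const char *w, int *value)
{
    input = w;
    errors = 0;
    next();
    *value = expr();
    if (lookahead != '\0') errors++;    /* input left over: cursors out of sync */
    return errors == 0;
}
```

Passing the running value into R1 and R2 keeps + and * left associative even though the grammar is right recursive, and the split into term and factor gives * higher precedence than +. For example, `parse("3+4*5", &v)` accepts with v set to 23, while `"3+"` is rejected.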
