Syntax - Introduction Grammar Rules - BNF Language Sentences ...
Syntax - Introduction Grammar Rules - BNF Language Sentences ...
Syntax - Introduction Grammar Rules - BNF Language Sentences ...
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
<strong>Syntax</strong> - <strong>Introduction</strong><br />
<strong>Grammar</strong> <strong>Rules</strong> -<br />
<strong>BNF</strong><br />
• <strong>Language</strong> syntax<br />
• ::= <br />
– 1950s Noam Chomsky - Context Free <strong>Grammar</strong>s (CFGs)<br />
– 1960 Backus / Naur – <strong>BNF</strong> (Backus Naur Form)<br />
• E<strong>BNF</strong> Extended <strong>BNF</strong><br />
– 1972 Niklaus Wirth - <strong>Syntax</strong> Diagrams (Pascal syntax graphs)<br />
• Konrad Zuse - Plankalkül (1940s)<br />
• Lexical Structure of PLs<br />
– keywords (reserved words) begin, end, do, while<br />
– literals / constants 42, “string”, (1, 2, ‘a’)<br />
– special symbols ‘{‘, ‘
Chomsky Classifications<br />
Example<br />
• Type 0 - no restrictions Ps of form a ::= b<br />
• Type 1 - context sensitive Ps of form aAb ::= apb<br />
• Type 2 - context free Ps of form A ::= p<br />
• Type 3 - finite state Ls Ps of form A ::= a | aB<br />
• where A, B are NTs, a, b, p are sentential forms (SFs)<br />
• left recursion Ps of form A ::= Ax<br />
• right recursion Ps of form A ::= xA<br />
• self embedding Ps of form A ::= xAy<br />
<br />
S<br />
::= |<br />
|<br />
( ) | <br />
::= + | -<br />
::= * | /<br />
::= | <br />
::= 0 | 1| 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9<br />
NT<br />
• NB some rules are recursive ( , )<br />
T<br />
P<br />
5<br />
03/04/2013 DFR - PL - <strong>Syntax</strong> & Parsing<br />
6<br />
03/04/2013 DFR - PL - <strong>Syntax</strong> & Parsing<br />
<strong>Syntax</strong> & Parse Trees<br />
Ambiguity, Associativity, Precedence<br />
• Parse tree 3+4*5 • <strong>Syntax</strong> tree 3+4*5<br />
<br />
+ <br />
+<br />
<br />
* <br />
<br />
3 *<br />
<br />
3<br />
<br />
4<br />
<br />
5<br />
4 5<br />
+<br />
3 *<br />
4 5<br />
*<br />
+ 5<br />
3 4<br />
• >1 possible parse tree => ambiguous<br />
grammar (NB grammar uses symbols)<br />
• semantic rules required<br />
– precedence of +, -, *, /<br />
– associativity, left or right<br />
• 8 - 4 - 2 = 2 i.e. ((8-4)-2) (l assoc)<br />
• 8 - 4 - 2 = 6 i.e. (8-(4-2)) (r assoc)<br />
• may also be solved in the grammar - see<br />
Pascal for e.g.<br />
• if-then-else is another example<br />
7<br />
03/04/2013 DFR - PL - <strong>Syntax</strong> & Parsing<br />
8<br />
03/04/2013 DFR - PL - <strong>Syntax</strong> & Parsing
<strong>Syntax</strong> Diagrams (Graphs)<br />
Summary (so far…)<br />
::= + <br />
| <br />
< term > ::= < term > * <br />
| <br />
::= ( ) | <br />
::= <br />
| <br />
::= | <br />
::= A .. Z | a .. z<br />
::= 0 .. 9<br />
expr<br />
term<br />
factor<br />
term<br />
+<br />
factor<br />
*<br />
( exp<br />
id<br />
)<br />
• <strong>BNF</strong><br />
::= + |<br />
- |<br />
<br />
::= * |<br />
/ |<br />
< factor ><br />
• E<strong>BNF</strong><br />
::= { (+|-) }<br />
::= { (*|/) < factor >}<br />
• L(G) where G = (S, P, NT, T)<br />
– S - start symbol S in NT<br />
– P - set of productions<br />
– NT - non-terminal symbols<br />
– T - terminal symbols<br />
• (E)<strong>BNF</strong> describes P (or graphs)<br />
• derivation (S * w)<br />
• parse (w * S)<br />
– w is a sentence in L(G)<br />
• semantics<br />
– precedence, associativity<br />
9<br />
03/04/2013 DFR - PL - <strong>Syntax</strong> & Parsing<br />
10<br />
03/04/2013 DFR - PL - <strong>Syntax</strong> & Parsing<br />
Parsing -<br />
parser: w true | false<br />
Parsing Techniques & Tools<br />
PL text<br />
(string)<br />
Scanner<br />
PL<br />
tokens<br />
Parser<br />
• simple parser recogniser for strings w in L<br />
• general parser build syntax trees (intermediate form)<br />
• top-down parser NTs expanded from S (start symbol) w<br />
• the source program is a string of symbols (from an alphabet A)<br />
• these symbols may be grouped as lexemes<br />
• the scanner converts lexemes to tokens<br />
• the parser processes the token stream according to P<br />
lexemes: index = 2 * count + 17;<br />
tokens: eql mulop plusop scolon<br />
• bottom-up parser shift/reduce symbols in w S (e.g. YACC)<br />
• parser generator YACC (Yet Another Compiler Compiler)<br />
<strong>BNF</strong> YACC Parser (Compiler)<br />
• recursive descent usually written by hand (simple rules)<br />
– NTs become procedures<br />
– Ts are matched by a token recogniser and the next token is read<br />
– may use look ahead techniques (the grammar Ps define what to expect)<br />
11<br />
03/04/2013 DFR - PL - <strong>Syntax</strong> & Parsing<br />
12<br />
03/04/2013 DFR - PL - <strong>Syntax</strong> & Parsing
Some Parsing theory<br />
Recursive Descent Predictive Parsing<br />
• For a production P X ::= ...<br />
– define the set first(X) to be the set of tokens that X may start with<br />
• first(X) for any terminal is the terminal itself - first(+) = {+}<br />
– for a recursive descent predictive parser (RDPP) and a rule in P<br />
A ::= B | C | D we require that first(B), first(C) and first(D) are<br />
disjoint. First(A) must also contain the tokens in first(B), first(C)<br />
and first(D)<br />
– if the parser uses lookahead, it can predict which rule is required<br />
by looking at the input token<br />
– e.g. ::= id | ( )<br />
first() = { id, ( }<br />
• Look at the grammar NT write a procedure<br />
T match and get next token<br />
• e.g. ::= { + }*<br />
procedure expr();<br />
{ term( ); while token == ‘+’ do { get_token( ); term( ); }}<br />
• e.g. ::= if then { else }<br />
procedure if_stat();<br />
{ if token != ‘if’ then error( ); else { get_token( ); cond( );<br />
if token != ‘then’ then error( ); else { get_token( ); stat( );<br />
if token == ‘else’ {get_token( ); stat( ); } } } }<br />
13<br />
03/04/2013 DFR - PL - <strong>Syntax</strong> & Parsing<br />
14<br />
03/04/2013 DFR - PL - <strong>Syntax</strong> & Parsing<br />
G = (S, P, NT, T) - a picture<br />
Example <strong>Grammar</strong><br />
w<br />
S<br />
a 1 a 2 a 3 a 4 a 5 a 6 a 7 a 8 a 9 a 10 a 11 a 12 a 13 $<br />
• there are two “cursors” - one in the parse tree and one in the input<br />
string (token stream) w<br />
• an error => these cursors are out of synchronisation<br />
• Original <strong>Grammar</strong><br />
line ::= expr { ‘;’ expr} ‘;’ ‘\n’<br />
expr ::= expr ‘+’ term | term<br />
term ::= term ‘*’ factor |<br />
factor<br />
factor ::= ‘(‘ expr ‘)’ | DIGIT<br />
• RD does not work for left<br />
recursive grammars !!!<br />
• Right Recursive <strong>Grammar</strong><br />
line ::= expr { ‘;’ expr} ; ‘\n’<br />
expr ::= term R1<br />
R1 ::= e | ‘+’ term R1<br />
term ::= factor R2<br />
R2 ::= e | ‘*’ factor R2<br />
factor ::= ‘(‘ expr ‘)’ | DIGIT<br />
(e = empty)<br />
15<br />
03/04/2013 DFR - PL - <strong>Syntax</strong> & Parsing<br />
16<br />
03/04/2013 DFR - PL - <strong>Syntax</strong> & Parsing
Example Code<br />
Example Code<br />
factor<br />
::= ‘(‘ expr ‘)’ | DIGIT<br />
R2<br />
::= e | ‘*’factor R2<br />
void factor()<br />
{ int value;<br />
if (lookahead == ‘(’ )<br />
{ match( ‘(’ ); expr( );<br />
if (lookahead == ‘)’ ) match( ‘)’ );<br />
else error("*** ‘)’ EXPECTED *** (in procedure factor)");<br />
return;<br />
}<br />
else if (lookahead == DIGIT) { match(lookahead); return; }<br />
else error("*** UNEXPECTED SYMBOL *** (in procedure factor)");<br />
lookahead = yylex(); /* skip over symbol - no synch !!! */<br />
};<br />
void R2( )<br />
{ if (lookahead == '*') { match('*'); factor( ); R2( ); }; };<br />
R1 ::= e | ‘+’term R1<br />
void R1( )<br />
{ if (lookahead == '+') { match('+'); term( ); R1( ); }; };<br />
17<br />
03/04/2013 DFR - PL - <strong>Syntax</strong> & Parsing<br />
18<br />
03/04/2013 DFR - PL - <strong>Syntax</strong> & Parsing<br />
Example Code<br />
Parsing - summary<br />
• Several parsing techniques exist - we will use RDPP<br />
void term( )<br />
{factor( ); R2( ); };<br />
void expr( )<br />
{term( ); R1( ); };<br />
term<br />
expr<br />
::= factor R2<br />
::= term R1<br />
• given a grammar G, P implies a (family of) parse tree(s)<br />
• the source code is a string w ( token stream)<br />
• RDPP involves matching Ts in w with the expected Ts in the parse tree (these<br />
Ts will be leaf nodes in the parse tree)<br />
• parser code: NT procedure; T match & get_token<br />
• for P A ::= B | C | D, first for B, C, D must be disjoint<br />
• the C example above was a parser - (see also interpreter code)<br />
– a certain amount of semantics is built in to the interpreter<br />
19<br />
03/04/2013 DFR - PL - <strong>Syntax</strong> & Parsing<br />
20<br />
03/04/2013 DFR - PL - <strong>Syntax</strong> & Parsing