Language Translation for Turtle Graphics
CS 355
January 7, 2007
1. “Turtle Graphics” Language
(a) Simple (ambiguous) grammar
program → stmt_list
stmt_list → stmt_list stmt | stmt
stmt → assignment | action
assignment → IDENT ASSIGN expr
expr → expr + expr | expr − expr
     → expr ∗ expr | expr / expr
     → − expr | + expr
     → ( expr )
     → IDENT | REAL
action → HOME | PENUP | PENDOWN
       → FORWARD expr | RIGHT expr | LEFT expr
       → PUSHSTATE | POPSTATE
(b) Example turtle program:
angle := 60 # exterior (turning) angle for hexagon
side := 100 # size of one side of hexagon
PENDOWN # begin drawing
FORWARD side
RIGHT angle
FORWARD side
RIGHT angle
FORWARD side
RIGHT angle
FORWARD side
RIGHT angle
FORWARD side
RIGHT angle
FORWARD side
RIGHT angle
PENUP # end drawing
(c) Lexical Elements
comments: # ignore remainder of line
reserved words: HOME, PENUP, PENDOWN, FORWARD, RIGHT, LEFT, PUSHSTATE, POPSTATE
single character tokens: +, -, *, /, (, )
real numbers: REAL, regular expression: [0-9]+([\.][0-9]*)?
identifiers: IDENT, regular expression: [a-zA-Z\_][a-zA-Z0-9\_]*
assignment operator: ASSIGN, :=
2. Scanner
enum { /* token constants; start above 255 so single-character tokens can be */
       /* returned as their own character codes and 0 can mean EOF */
  REAL = 256, IDENT, ASSIGN, HOME, PENUP, ..., POPSTATE
};
typedef union { /* holds the "value" associated with certain tokens */
  char *s; /* IDENT string */
  float f; /* REAL value */
} LVAL;
int nextToken(LVAL *lval) { /* returns next token; stores its value in *lval */
top:
  eat whitespace;
  if EOF return 0; /* no more tokens */
  if next char a digit {
    scan real number;
    lval->f = convert string to number;
    return REAL;
  }
  if next char alphabetic or '_' {
    scan alphanumeric string;
    if string is a reserved word
      return reserved word token;
    else {
      lval->s = string; /* allocate or reuse from "string pool" */
      return IDENT;
    }
  }
  if next char is ':' {
    consume ':';
    if next char not '=' error;
    consume '=';
    return ASSIGN;
  }
  if next char in {+, -, *, /, (, )}
    consume and return char;
  if next char '#' {
    eat chars to end of line;
    goto top;
  }
  error;
}
3. Symbol Table
[Figure: symtab, a table of (name, value) entries, e.g. "side" → 100 and "angle" → 60]
symbolLookup(symbol) {
  scan symtab for symbol;
  if found
    return symtab entry;
  else {
    insert (symbol, value=0) into table;
    return newly created entry;
  }
}
Note that we do not require variables to be declared before use. The first time a variable is referenced, it is inserted into the symbol table with a value of 0.
4. Create an unambiguous grammar using normal precedence and associativity rules.
expr → expr + term | expr − term | term
term → term ∗ factor | term / factor | factor
factor → − factor | + factor
       → ( expr )
       → IDENT | REAL
• Note that the grammar is left recursive.
• If the grammar had a dangling else problem, we would have to fix that too (we’ll add more to the language later).
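As a worked example of how this grammar encodes precedence, here is a leftmost derivation of the token string REAL + REAL ∗ REAL (for instance 2 + 3 ∗ 4). Because ∗ can only be introduced inside a term, the product 3 ∗ 4 ends up grouped below the +.

```latex
\begin{aligned}
\mathit{expr} &\Rightarrow \mathit{expr} + \mathit{term}
  \Rightarrow \mathit{term} + \mathit{term}
  \Rightarrow \mathit{factor} + \mathit{term}
  \Rightarrow \mathrm{REAL} + \mathit{term} \\
&\Rightarrow \mathrm{REAL} + \mathit{term} \ast \mathit{factor}
  \Rightarrow \mathrm{REAL} + \mathit{factor} \ast \mathit{factor} \\
&\Rightarrow \mathrm{REAL} + \mathrm{REAL} \ast \mathit{factor}
  \Rightarrow \mathrm{REAL} + \mathrm{REAL} \ast \mathrm{REAL}
\end{aligned}
```

Similarly, because the recursion is on the left (expr → expr − term), a − b − c can only be derived as (a − b) − c, which gives the usual left associativity.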
5. LL(k) parsing
• The goal is to produce a leftmost derivation.
• k is the number of tokens we can “look ahead” to predict which production to apply.
• Top-down approach: begin with the start symbol and try to derive the input string.
• Predictive: we predict which production to apply based on the “look-ahead” mechanism.
• LL(1) parsers may use a parse table:
– rows indexed by non-terminals
– columns indexed by terminals (i.e. tokens)
– each entry tells the parser which production to apply based on the left-most non-terminal and the look-ahead token.
• Recursive descent parser
– Each non-terminal is mapped to a subroutine.
– The right-hand side (RHS) of the corresponding production dictates the body of the subroutine.
– Non-terminals on the RHS are mapped to (possibly recursive or co-recursive) subroutine calls.
– Terminals on the RHS represent tokens that must be matched.
– For non-terminals that appear on the LHS of more than one production, the appropriate production to use is predicted via the look-ahead mechanism.
– LL(1) grammars are typically used.
– The grammar cannot be left recursive: for a production such as expr → expr + term, the subroutine expr() would call itself before consuming any input and recurse forever.
6. Unambiguous, non-left-recursive grammar for Turtle Graphics
• $ is the EOF marker
program → stmt_list $
stmt_list → stmt {stmt}
stmt → assignment | action
assignment → IDENT ASSIGN expr
expr → term {(+|−) term}
term → factor {(∗|/) factor}
factor → − factor | + factor | ( expr ) | IDENT | REAL
action → HOME | PENUP | PENDOWN
       → FORWARD expr | RIGHT expr | LEFT expr
       → PUSHSTATE | POPSTATE
• {} is EBNF shorthand for “0 or more”
7. Turtle State
x, y : coordinates of turtle (initially x = y = 0)
dir : direction of turtle (initially 90° = north)
pendown : is pen down? (initially true)
8. Recursive descent parser for Turtle Graphics
match(expected) {
  if (lookahead == expected)
    lookahead = nextToken(); /* fetch next token */
  else
    error();
}
program() {
  lookahead = nextToken(); /* prime the look-ahead with the first token */
  stmt_list();
  match(0); /* 0 is the EOF token ($) */
}
stmt_list() {
  do {
    stmt();
  } while (lookahead in {IDENT, HOME, ..., POPSTATE});
}
stmt() {
  switch(lookahead) {
    case IDENT: assignment(); break;
    case HOME:
    ...
    case POPSTATE: action(); break;
    default: error();
  }
}
assignment() {
  symbol = symbolLookup(token value); /* read IDENT's value before match() fetches the next token */
  match(IDENT);
  match(ASSIGN);
  num = expr();
  symbol->val = num;
}
float expr() {
  float n = term();
  while (true)
    switch(lookahead) {
      case '+': match('+'); n += term(); break;
      case '-': match('-'); n -= term(); break;
      default: return n;
    }
}
float term() {
  float n = factor();
  while (true)
    switch(lookahead) {
      case '*': match('*'); n *= factor(); break;
      case '/': match('/'); n /= factor(); break;
      default: return n;
    }
}
float factor() {
  float n;
  switch(lookahead) {
    case '-': match('-'); return -factor();
    case '+': match('+'); return factor();
    case '(': match('('); n = expr(); match(')'); return n;
    case IDENT: n = symbolLookup(token value)->val; match(IDENT); return n;
    case REAL: n = token value; match(REAL); return n;
    default: error();
  }
}
action() {
  switch(lookahead) {
    case HOME: match(HOME); home(); break; /* place turtle in home state */
    case PENUP: match(PENUP); turtle.pendown = false; break;
    case PENDOWN: match(PENDOWN); turtle.pendown = true; break;
    case FORWARD: match(FORWARD); moveForward(expr()); break;
    case RIGHT: match(RIGHT); turn(-expr()); break; /* right turn = clockwise = negative angle */
    case LEFT: match(LEFT); turn(expr()); break;
    case PUSHSTATE: match(PUSHSTATE); pushState(); break;
    case POPSTATE: match(POPSTATE); popState(); break;
    default: error();
  }
}
home() {
  turtle.x = turtle.y = 0;
  turtle.dir = 90;
}
moveForward(dist) {
  /* dir is in degrees, but cos() and sin() expect radians */
  turtle.x += dist*cos(turtle.dir * M_PI / 180);
  turtle.y += dist*sin(turtle.dir * M_PI / 180);
}
turn(angle) {
  turtle.dir += angle;
  while (turtle.dir >= 360) turtle.dir -= 360;
  while (turtle.dir < 0) turtle.dir += 360;
}
9. LR(k) parsing
• Read input from left to right while constructing a rightmost derivation (in reverse) of the input string using a lookahead of k symbols. The concept of LR parsing was introduced by Donald Knuth in 1965.
• Bottom-up parsers
– Transfer symbols from the input to the stack until the uppermost stack symbols match the right side of a production. These symbols are replaced with the single variable from the left-hand side of the production.
shift: transfer of a token/value from the input to the stack;
reduce: matching the uppermost stack symbols with the RHS of a production and replacing them with the corresponding symbol on the LHS.
– Strings of terminals and variables on the stack are constantly being replaced with variables “higher” in the grammar.
– Ultimately the entire stack collapses to the grammar’s start symbol.
• Parsing Turtle Graphics with YACC
– YACC (acronym for “Yet Another Compiler-Compiler”) is a tool for building LALR(1) parsers.
– YACC processes an input file that contains:
∗ a list of tokens (i.e. terminals) and variables (i.e. non-terminals) in the grammar;
∗ a grammar;
∗ and C code that specifies what to do (as a side effect) as the language is being parsed.
– The output of yacc is C source code for your parser, i.e. yacc creates a function yyparse() that does the parsing.
– You provide a scanner function called yylex(), which returns the next token and stores its value in the global variable yylval. You can use a tool called LEX for this or “roll your own.”
– Associated with each token and variable is a type. As tokens and variables are pushed onto the stack (e.g. during a shift or reduce operation), an associated value of the appropriate type is also placed on the stack. For each production in the grammar, we can specify how to combine old stack values to create a new value for the variable on the left-hand side of the production.
– turtle.y
%{
#include <stdio.h>
#include <stdlib.h>
/* other includes and prototypes go here */
%}
%union { /* type for value associated with each token and variable */
  float f;
  char *s;
}
%token ASSIGN
%token <s> IDENT /* IDENT associated with s field */
%token <f> REAL /* REAL associated with f field */
%token HOME PENUP PENDOWN FORWARD RIGHT LEFT PUSHSTATE POPSTATE
%type <f> expr /* expr associated with f field */
%left '+' '-' /* precedence and associativity defined here */
%left '*' '/'
%right UMINUS
%%
program : stmt_list {printState();}
  ;
stmt_list : stmt_list stmt
  | stmt
  ;
stmt : assignment
  | action
  ;
assignment : IDENT ASSIGN expr {symbolLookup($1)->val = $3;}
  ;
expr : '-' expr %prec UMINUS {$$ = -$2;}
  | '+' expr %prec UMINUS {$$ = $2;}
  | expr '+' expr {$$ = $1 + $3;}
  | expr '-' expr {$$ = $1 - $3;}
  | expr '*' expr {$$ = $1 * $3;}
  | expr '/' expr {$$ = $1 / $3;}
  | '(' expr ')' {$$ = $2;}
  | IDENT {$$ = symbolLookup($1)->val;}
  | REAL /* default action: $$ = $1 */
  ;
action : HOME {home();}
  | PENUP {penup();}
  | PENDOWN {pendown();}
  | FORWARD expr {forward($2);}
  | RIGHT expr {left(-$2);}
  | LEFT expr {left($2);}
  | PUSHSTATE {pushstate();}
  | POPSTATE {popstate();}
  ;
%%
int lineno;
char *filename;
void yyerror(char *msg) { /* called when a syntax error is encountered */
  fprintf(stderr, "%s [%d] : %s\n", filename, lineno, msg);
  exit(EXIT_FAILURE);
}
/* other helper C-code here */