Modern compiler design [PDF]
Modern compiler design [PDF]
Modern compiler design [PDF]
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
Once a name is defined, it can be used in a regular expression for a lex rule by enclosing it in braces { and<br />
}. For instance, if the line<br />
DIGIT [0-9]<br />
appeared in the Lex Definitions of a .lex file, then {DIGIT} would be a symbolic name for the regular<br />
expression [0-9]. Anywhere [0-9] appears in a regular expression, we could use {DIGIT} instead. Thus the<br />
rule:<br />
{DIGIT}+ { return INTEGER_LITERAL; }<br />
would be equivalent to the rule:<br />
[0-9]+ { return INTEGER_LITERAL; }<br />
An example of using named regular expressions is in Figure 2.2.<br />
2.1.3 Tokens With Values<br />
For tokens that don’t have values – ELSE, FOR, SEMICOLON, etc – we only care about which rule was<br />
matched. No other information is needed. Some tokens, such as INTEGER LITERAL and IDENTIFIER,<br />
have values. 57, for instance, is an INTEGER LITERAL token that has the value 57. foobar is an IDEN-<br />
TIFIER token that has the value “foobar”. How can we have yylex return both which token was matched,<br />
and the value of the token? We can use a global variable to communicate the extra information. yylex() can<br />
set the value of the global variable before returning the token type. The function that calls yylex() can then<br />
examine the value of this global variable to determine the value of the token. Let’s take a look at a slightly<br />
more complicated lex file, simple2.lex (Figure 2.2). In the C declarations portion of the file, we have defined<br />
a global variable yylval 1 .<br />
union {<br />
int integer_value;<br />
char *string_value;<br />
} yylval;<br />
When we wish to return the value of a token, we will set the value of this global variable. Consider the<br />
rule for INTEGER LITERAL:<br />
[0-9]+ { yylval.integer_value = atoi(yytext);<br />
return INTEGER_LITERAL; }<br />
When we match an integer in the input file, we first set the integer value field of the global variable yylval.<br />
To set the integer value field to the correct value, we need to know the actual string that was matched to the<br />
regular expression [0-9]+. The variable yytext is automatically set to the value of the last string matched<br />
by lex. Thus, all we need to do is convert that string to an integer value through the ascii-to-integer function<br />
atoi defined in , and set the integer value field of the variable yylval. We can then return an<br />
INTEGER LITERAL token.<br />
The other token in this example that has a value is the IDENTIFIER token. The rule for an IDENTIFER<br />
is:<br />
[a-zA-Z][a-zA-Z0-9]* { yylval.string_value = malloc(sizeof(char) *<br />
(strlen(yytext)+1));<br />
strcpy(yylval.string_value,yytext);<br />
return IDENTIFIER; }<br />
1 Why do we call this variable yylval instead of a more meaningful name like tokenValue? For the final project, we will use<br />
lex in conjunction with another tool, yacc. Yacc has a rather odd naming convention, and we have to follow it for the two tools<br />
to work together.<br />
7