Introduction to Lex - Faculty of Computer Science
Principles of Compilers - 2/03/2006
Pablo R. Fillottrani
Introduction to Lex
• General Description
• Input file
• Output file
• How matching is done
• Regular expressions
• Local names
• Using Lex
FREE UNIVERSITY OF BOZEN–BOLZANO
Faculty of Computer Science
General Description<br />
• Lex is a program that automatically generates code for scanners.
• Input: a description of the tokens in the form of regular expressions, together with the actions to be taken when each expression is matched.
• Output: a C source file defining a function yylex() that implements, by means of a transition table, the DFA for the regular expressions.
Input File<br />
• A Lex input file is divided into three parts, separated by lines containing %%:
/* declarations */
…
%%
/* rules */
…
%%
/* auxiliary functions */
…
Input File – Declaration<br />
• The declaration part assigns names to regular expressions, one per line, in the form:
name   regular_expression
(e.g., digit [0-9])
• It can also include C code enclosed between %{ and %}, each delimiter on a line of its own in the first column; this code is copied outside the definition of yylex().
• It is also possible to specify some options with the syntax %option followed by the option name (e.g., %option noyywrap).
Input File - Rules<br />
• The rules part specifies what to do when a regular expression is matched, one rule per line, in the form:
regular_expression   action
• Actions are ordinary C statements (several statements can be grouped into a compound statement between { and }).
• Text enclosed in %{ and %} that appears in the rules part before the first rule is copied to the beginning of yylex(), so it may be used to declare variables local to yylex().
Input File – Aux Functions<br />
• The auxiliary functions part contains only C code.
• It includes the definitions of the functions used in the rules part.
• It can also contain the main() function if the scanner is going to be used as a standalone program.
• In that case, main() must call the function yylex().
Code Generated by Lex
• The output of Lex is a C source file called lex.yy.c.
• It defines a function yylex() that returns an integer, i.e., the code of a token.
• If the file contains a definition of main(), it can be compiled on its own into a runnable program.
• Otherwise, it is compiled and linked together with other modules (e.g., a parser) that declare the external function int yylex().
How matching is done<br />
• When run, the generated scanner analyses its input looking for strings that match any of its patterns, and then executes the action of the matching rule.
• If more than one pattern matches, it selects the one matching the longest string.
• If two or more patterns match strings of the same length, the one listed first is selected.
• If no pattern matches, the default rule is executed: the next character in the input is copied to the output.
Regular expressions
• <r> ::= a            the character a
• <r> ::= "s"          the string s, even if s contains metacharacters
• <r> ::= \a           the character a when a is a metacharacter (e.g., *)
• <r> ::= .            any character except newline
• <r> ::= [<c>+]       any of the characters <c>
• <r> ::= [<c1>-<c2>]  any character from <c1> to <c2>
• <r> ::= [^<c>+]      any character except those <c>
• <r> ::= <r>*         zero or more repetitions of <r>
• <r> ::= <r>+         one or more repetitions of <r>
• <r> ::= <r>?         zero or one repetitions of <r>
• <r> ::= <r1>|<r2>    <r1> or <r2>
• <r> ::= <r1><r2>     <r1> followed by <r2>
• <r> ::= (<r>)        same as <r>
• <r> ::= {<name>}     the named regular expression in the definitions part
Internal names<br />
• Inside an action, the rules can refer to the following variables and macros:
– yytext, the matched string (lexeme)
– yyleng, the length of yytext
– yyin, the input file
– yyout, the output file
– ECHO, a macro that copies yytext to yyout (the action of the default rule)
– yylval, the global variable used to communicate the attribute of a token to the parser
Using Lex
• There are several versions of Lex. We are going to use flex.
• To maximize compatibility, use the -l option when running flex, and put %option noyywrap in the definitions part of the Lex input file.
Example 1
%{
#include <stdio.h>
%}
%option noyywrap
%%
[0-9]+   { printf("%s\n", yytext); }
.|\n     ;
%%
int main(void)
{
  yylex();
  return 0;
}
Example 2
%{
#include <stdio.h>
int c=0, w=0, l=0;
%}
%option noyywrap
word [^ \t\n]+
eol \n
%%
{word} {w++; c+=yyleng;}
{eol}  {c++; l++;}
.      {c++;}
%%
int main(void)
{
  yylex();
  printf("%d %d %d\n", l, w, c);
  return 0;
}
Example 3
%{
#include <stdio.h>
int tokenCount=0;
%}
%option noyywrap
%%
[a-zA-Z]+     { printf("%d WORD \"%s\"\n",
                  ++tokenCount, yytext); }
[0-9]+        { printf("%d NUMBER \"%s\"\n",
                  ++tokenCount, yytext); }
[^a-zA-Z0-9]+ { printf("%d OTHER \"%s\"\n",
                  ++tokenCount, yytext); }
%%
int main(void) { yylex(); return 0; }
Example 4
%{
#include <stdio.h>
int lineno =1;
%}
%option noyywrap
line .*\n
%%
{line} {printf("%5d %s", lineno++, yytext); }
%%
int main() {
yylex(); return 0; }
Example 5
%{
#include <stdio.h>
%}
%option noyywrap
comment_line \/\/.*\n
%%
{comment_line} { printf("%s", yytext); /* yytext already ends with \n */ }
.*\n ;
%%
int main() {
yylex(); return 0;
}
Example 7
%{
#include <stdio.h>
#include <stdlib.h>
%}
%option noyywrap
digit [0-9]
number {digit}+
%%
{number} { int n = atoi(yytext);
           printf("%x", n); }
.        {;}
%%
int main() {<br />
yylex(); return 0;<br />
}
Example 8 (1/4)
%{
#include "globals.h"
#include "util.h"
#include "scan.h"
int lineno=0;
FILE *listing; // used to output source code listing
FILE *code;    // used to output assembly code
FILE *source;  // used to input tiny program source code
int TraceScan = 1;
int EchoSource = 1;
/* lexeme of identifier or reserved word */
char tokenString[MAXTOKENLEN+1];
%}
digit [0-9]
number {digit}+
letter [a-zA-Z]
identifier {letter}+
newline \n
whitespace [ \t]+
Example 8 (2/4)
%%
"if" {return IF;}
"then" {return THEN;}
"else" {return ELSE;}
"end" {return END;}
"repeat" {return REPEAT;}
"until" {return UNTIL;}
"read" {return READ;}
"write" {return WRITE;}
":=" {return ASSIGN;}
"=" {return EQ;}
"
Example 8 (3/4)
{number} {return NUM;}
{identifier} {return ID;}
{newline} {lineno++;}
{whitespace} {/* skip whitespace */}
"{" { int c; /* int, so that end of input can be detected */
      do
      { c = input();
        if (c == '\n') lineno++;
      } while (c != '}' && c != EOF && c != 0); /* also stop at end of input */
    }
Example 8 (4/4)
%%
TokenType getToken(void)
{ static int firstTime = TRUE;
  TokenType currentToken;
  if (firstTime)
  { firstTime = FALSE;
    lineno++;
    yyin = source=stdin;
    yyout = listing=stdout; }
  currentToken = (TokenType)yylex();
  strncpy(tokenString,yytext,MAXTOKENLEN);
  if (TraceScan) {
    fprintf(listing,"\t%d: ",lineno);
    printToken(currentToken,tokenString);
  }
  return currentToken; }
int main(){
  TraceScan = TRUE;
  while( getToken() != ENDFILE);
  return 0; }