
Principles of Compilers - 2/03/2006
Pablo R. Fillottrani
FREE UNIVERSITY OF BOZEN–BOLZANO
Faculty of Computer Science

Introduction to Lex

• General Description
• Input file
• Output file
• How matching is done
• Regular expressions
• Internal names
• Using Lex



General Description

• Lex is a program that automatically generates code for scanners.
• Input: a description of the tokens in the form of regular expressions, together with the actions to be taken when each expression is matched.
• Output: a text file with C source code defining a procedure yylex() that is a table-driven implementation of the DFA for the regular expressions.
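To give a feeling for what "a table implementing the DFA" means, here is a minimal hand-written sketch in C for the single pattern [0-9]+. It is not the code Lex actually emits; the state names, the two character classes, and the helper match_number() are assumptions made only for this illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Character classes: the columns of the transition table. */
enum { CLASS_DIGIT, CLASS_OTHER, NUM_CLASSES };

/* States of a DFA for the single pattern [0-9]+ . */
enum { STATE_START, STATE_IN_NUMBER, STATE_DEAD, NUM_STATES };

/* next_state[s][c]: state reached from state s on a character of class c. */
static const int next_state[NUM_STATES][NUM_CLASSES] = {
    /* STATE_START     */ { STATE_IN_NUMBER, STATE_DEAD },
    /* STATE_IN_NUMBER */ { STATE_IN_NUMBER, STATE_DEAD },
    /* STATE_DEAD      */ { STATE_DEAD,      STATE_DEAD },
};

/* Only STATE_IN_NUMBER is an accepting state. */
static const int accepting[NUM_STATES] = { 0, 1, 0 };

/* Hypothetical helper: length of the longest prefix of s that matches
   [0-9]+, or 0 if no prefix matches. */
size_t match_number(const char *s)
{
    int state = STATE_START;
    size_t last_accept = 0;

    for (size_t i = 0; s[i] != '\0' && state != STATE_DEAD; i++) {
        int cls = isdigit((unsigned char)s[i]) ? CLASS_DIGIT : CLASS_OTHER;
        state = next_state[state][cls];
        if (accepting[state])
            last_accept = i + 1;   /* remember the longest match so far */
    }
    return last_accept;
}
```

For instance, match_number("123abc") walks the table through three digit transitions and stops at 'a', reporting a match of length 3. A real generated scanner combines all patterns of the input file into one, much larger, table.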


Input File

• The Lex input file is divided into three parts

/* declarations */
…
%%
/* rules */
…
%%
/* auxiliary functions */



Input File – Declaration

• The declaration part includes the assignment of names to regular expressions in the form:

name regular_expression

• It can also include C code external to the definition of yylex() within %{ and %} in the first column.
• Also, it is possible to specify some options with the syntax %option



Input File - Rules

• The rules part specifies what to do when a regular expression is matched

regular_expression action

• Actions are ordinary C statements (an action can be a compound C statement enclosed in {}).
• %{ %} enclosed text appearing before the first rule may be used to declare variables local to yylex().



Input File – Aux Functions

• The auxiliary functions part is only C code.
• It includes function definitions for every function needed in the rule part
• It can also contain the main() function if the scanner is going to be used as a standalone program.
• The main() function must call the function yylex()



Code Generated by Lex

• The output of Lex is a file called lex.yy.c
• It defines a C function, yylex(), that returns an integer, i.e., a code for a Token
• If it contains the main() function definition, it can be compiled and run as a standalone program.
• Otherwise, the code that uses the scanner needs an external function declaration for the function int yylex()



How matching is done

• When the generated scanner runs, it analyses its input looking for strings that match any of its patterns, and then executes the corresponding action.
• If more than one match is found, it selects the regular expression matching the longest string.
• If it finds two or more matches of the same length, the one listed first is selected.
• If no match is found, then the default rule is executed: i.e., the next character in the input is copied to the output.
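The selection policy above (longest match first, then the earliest rule, then the default rule) can be sketched in plain C. This is only an illustration under stated assumptions: the three rules ("if", [a-z]+, [0-9]+), the matcher functions, and select_rule() are hypothetical names, and a real generated scanner reaches the same decision with a single DFA rather than by trying each rule in turn.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Each hypothetical rule reports how many leading characters of s it
   matches (0 means no match). */
typedef size_t (*matcher_fn)(const char *s);

/* Rule 0: the literal keyword "if". */
static size_t rule_if(const char *s)
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

/* Rule 1: identifiers, [a-z]+ . */
static size_t rule_identifier(const char *s)
{
    size_t n = 0;
    while (islower((unsigned char)s[n]))
        n++;
    return n;
}

/* Rule 2: numbers, [0-9]+ . */
static size_t rule_number(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Rules in the order they would be listed in the rules part. */
static const matcher_fn rules[] = { rule_if, rule_identifier, rule_number };
enum { NUM_RULES = sizeof rules / sizeof rules[0] };

/* Returns the index of the selected rule and stores the match length in
   *len. Returns -1 with *len == 1 when no rule matches: the default rule
   then consumes (and would echo) a single character.
   s must point at a non-empty string. */
int select_rule(const char *s, size_t *len)
{
    int best = -1;
    *len = 0;
    for (int i = 0; i < NUM_RULES; i++) {
        size_t n = rules[i](s);
        if (n > *len) {   /* strictly longer wins; a tie keeps the earlier rule */
            *len = n;
            best = i;
        }
    }
    if (best < 0)
        *len = 1;
    return best;
}
```

On the input "if x" both rule 0 and rule 1 match two characters, so rule 0 is chosen because it is listed first; on "iffy" the identifier rule wins with the longer match of four characters.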



Regular expressions

• regexp ::= a                the character a
• regexp ::= "s"              the string s, even if s contains metacharacters
• regexp ::= \a               the character a when a is a metacharacter (e.g., *)
• regexp ::= .                any character except newline
• regexp ::= [char+]          any of the characters char
• regexp ::= [char-char]      any character from the first char to the second char
• regexp ::= [^char+]         any character except those char
• regexp ::= regexp*          zero or more repetitions of regexp
• regexp ::= regexp+          one or more repetitions of regexp
• regexp ::= regexp?          zero or one repetition of regexp
• regexp ::= regexp|regexp    the first regexp or the second regexp
• regexp ::= regexp regexp    the first regexp followed by the second regexp
• regexp ::= (regexp)         same as regexp
• regexp ::= {name}           the named regular expression in the definitions part



Internal names

• Inside the action definitions, the rules can refer to the following names:
  – yytext, the string being matched (lexeme)
  – yyin, the input file
  – yyout, the output file
  – ECHO, the default rule action
  – yylval, the global variable for communicating the Attribute for a Token to the Parser
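The division of labour between the token code (the return value of the scanner) and the attribute (the global variable) can be sketched as follows. Everything here is a stand-in written by hand for illustration: sketch_yylex() and sketch_yylval only imitate the roles of yylex() and yylval, and the token codes, set_input(), and sum_numbers() are hypothetical names, not part of Lex.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token codes; 0 plays the role of "end of input". */
enum { END_OF_INPUT = 0, NUM = 258 };

/* Stand-in for yylval: attribute of the token just returned. */
int sketch_yylval;

/* Stand-in for the input: current position in a string. */
static const char *cursor = "";

void set_input(const char *text)
{
    cursor = text;
}

/* Hand-written stand-in for a generated yylex(): skips everything that is
   not a digit, then returns NUM and leaves the numeric value in
   sketch_yylval. Overflow of very long numbers is not handled. */
int sketch_yylex(void)
{
    char *end;

    while (*cursor != '\0' && !isdigit((unsigned char)*cursor))
        cursor++;
    if (*cursor == '\0')
        return END_OF_INPUT;

    sketch_yylval = (int)strtol(cursor, &end, 10);
    cursor = end;
    return NUM;
}

/* A parser-like consumer: the token code says what was found,
   the global attribute says which value it carries. */
int sum_numbers(const char *text)
{
    int total = 0;

    set_input(text);
    while (sketch_yylex() == NUM)
        total += sketch_yylval;
    return total;
}
```

In a real Lex specification the assignment to the attribute variable would sit inside the action of the number rule, just before the return of the token code.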



Using Lex

• There are several lex versions. We're going to use flex.
• In order to maximize compatibility, use the -l option when running flex, and %option noyywrap in the definition part of the Lex input file.


Example 1

%{
#include <stdio.h>
%}
%option noyywrap
%%
[0-9]+  { /* print each number on its own line */ printf("%s\n", yytext); }
.|\n    ;
%%
int main(void)
{
    yylex();
    return 0;
}



Example 2

%{
int c = 0, w = 0, l = 0;   /* character, word, and line counts */
%}
%option noyywrap
word  [^ \t\n]+
eol   \n
%%
{word}  { w++; c += yyleng; }
{eol}   { c++; l++; }
.       { c++; }
%%
int main(void)
{
    yylex();
    printf("%d %d %d\n", l, w, c);
    return 0;
}



Example 3

%{
int tokenCount = 0;
%}
%option noyywrap
%%
[a-zA-Z]+      { printf("%d WORD \"%s\"\n",
                        ++tokenCount, yytext); }
[0-9]+         { printf("%d NUMBER \"%s\"\n",
                        ++tokenCount, yytext); }
[^a-zA-Z0-9]+  { printf("%d OTHER \"%s\"\n",
                        ++tokenCount, yytext); }
%%
int main(void) { yylex(); return 0; }



Example 4

%{
#include <stdio.h>
int lineno = 1;
%}
%option noyywrap
line  .*\n
%%
{line}  { printf("%5d %s", lineno++, yytext); }
%%
int main() {
    yylex(); return 0; }


Example 5

%{
#include <stdio.h>
%}
%option noyywrap
comment_line  \/\/.*\n
%%
{comment_line}  { /* yytext already ends with the newline */ printf("%s", yytext); }
.*\n            ;
%%
int main() {
    yylex(); return 0;
}


Example 7

%{
#include <stdio.h>
#include <stdlib.h>
%}
%option noyywrap
digit   [0-9]
number  {digit}+
%%
{number}  { int n = atoi(yytext);
            printf("%x", n); }
.         {;}
%%
int main() {
    yylex(); return 0;
}



Example 8 (1/4)

%{
#include "globals.h"
#include "util.h"
#include "scan.h"
int lineno=0;
FILE *listing;   // used to output source code listing
FILE *code;      // used to output assembly code
FILE *source;    // used to input tiny program source code
int TraceScan = 1;
int EchoSource = 1;
/* lexeme of identifier or reserved word */
char tokenString[MAXTOKENLEN+1];
%}
digit       [0-9]
number      {digit}+
letter      [a-zA-Z]
identifier  {letter}+
newline     \n
whitespace  [ \t]+



Example 8 (2/4)

%%
"if"      {return IF;}
"then"    {return THEN;}
"else"    {return ELSE;}
"end"     {return END;}
"repeat"  {return REPEAT;}
"until"   {return UNTIL;}
"read"    {return READ;}
"write"   {return WRITE;}
":="      {return ASSIGN;}
"="       {return EQ;}

"



Example 8 (3/4)

{number}      {return NUM;}
{identifier}  {return ID;}
{newline}     {lineno++;}
{whitespace}  {/* skip whitespace */}
"{"           { int c;  /* int, so that an end-of-input value fits */
                do
                { c = input();
                  /* stop on an unterminated comment: end of input is
                     reported as EOF or 0, depending on the lex version */
                  if (c == EOF || c == 0) break;
                  if (c == '\n') lineno++;
                } while (c != '}');
              }


Example 8 (4/4)

%%
TokenType getToken(void)
{ static int firstTime = TRUE;
  TokenType currentToken;
  if (firstTime)
  { firstTime = FALSE;
    lineno++;
    yyin = source=stdin;
    yyout = listing=stdout; }
  currentToken = (TokenType)yylex();
  strncpy(tokenString,yytext,MAXTOKENLEN);
  if (TraceScan) {
    fprintf(listing,"\t%d: ",lineno);
    printToken(currentToken,tokenString);
  }
  return currentToken; }

int main(){
  TraceScan = TRUE;
  while( getToken() != ENDFILE);
  return 0; }
