Scheme and Functional Programming, 2006
Automatic construction of parse trees for lexemes ∗

Danny Dubé, Université Laval, Quebec City, Canada. Danny.Dube@ift.ulaval.ca
Anass Kadiri, ÉPITA, Paris, France. Anass.Kadiri@gmail.com

Abstract

Recently, Dubé and Feeley presented a technique that makes lexical analyzers able to build parse trees for the lexemes that match regular expressions. While parse trees usually demonstrate how a word is generated by a context-free grammar, these parse trees demonstrate how a word is generated by a regular expression. This paper describes the adaptation and the implementation of that technique in a concrete lexical analyzer generator for Scheme. The adaptation includes extending the technique to the rich set of operators handled by the generator and reversing the direction of parse tree construction so that it corresponds to the natural right-to-left construction of lists in Scheme. The implementation includes modifications to both the generation-time and the analysis-time parts of the generator. Uses of the new addition and empirical measurements of its cost are presented. Extensions and alternatives to the technique are considered.

Keywords: Lexical analysis; Parse tree; Finite-state automaton; Lexical analyzer generator; Syntactic analysis; Compiler

1. Introduction

In the field of compilation, more precisely in the domain of syntactic analysis, we are used to associating the notion of a parse tree, or derivation tree, with the notion of context-free grammars. Indeed, a parse tree can be seen as a demonstration that a word is generated by a grammar.
It also constitutes a convenient structured representation of the word. For example, in the context of a compiler, the word is usually a program and the parse tree (or a reshaped one) is often the internal representation of the program. Since, in many applications, the word is quite long and the structure imposed by the grammar is non-trivial, it is natural to insist on building parse trees.

However, in the related field of lexical analysis, the notion of parse trees is virtually nonexistent. Typically, the theoretical tools used in lexical analysis are regular expressions and finite-state automata. Very often, the words that are manipulated are rather short, and their structure is pretty simple. Consequently, the notion of parse trees is almost never associated with lexical analysis using regular expressions.

∗ This work has been funded by the Natural Sciences and Engineering Research Council of Canada.

Proceedings of the 2006 Scheme and Functional Programming Workshop, University of Chicago Technical Report TR-2006-06.

However, we do not necessarily observe such simplicity in all applications. For instance, while numerical constants are generally considered to be simple lexical units, in a programming language such as Scheme [9], there are integer, rational, real, and complex constants; there are two notations for complex numbers (rectangular and polar); there are different bases; and there are many kinds of prefixes and suffixes.
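The tension described here can be sketched in Python (the paper's own setting is Scheme; this pattern, its group names, and the radix table are illustrative assumptions, deliberately far from covering the full Scheme number syntax). Matching is the easy part; re-inspecting every captured group by hand to interpret the constant is where the tedium and the errors creep in:

```python
import re

# A deliberately simplified sketch of a Scheme-like number pattern:
# optional radix prefix, optional sign, and an integer or rational.
SCHEME_NUM = re.compile(
    r"(?P<radix>#[bodx])?"       # optional radix prefix: #b #o #d #x
    r"(?P<sign>[+-])?"           # optional sign
    r"(?P<num>[0-9a-f]+)"        # numerator digits (radix not validated)
    r"(?:/(?P<den>[0-9a-f]+))?"  # optional denominator for rationals
)

m = SCHEME_NUM.fullmatch("#x-ff/1a")
# The match succeeds easily, but interpreting it means manually
# decoding each group, with the radix threaded through by hand:
radix = {"#b": 2, "#o": 8, "#d": 10, "#x": 16}.get(m.group("radix"), 10)
sign = -1 if m.group("sign") == "-" else 1
value = sign * int(m.group("num"), radix)
if m.group("den"):
    value /= int(m.group("den"), radix)
```

A parse tree for the lexeme would hand the interpreter exactly this decomposition for free, instead of forcing it to be reconstructed group by group after the fact.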
While writing regular expressions for these numbers is manageable and matching sequences of characters against them is straightforward, extracting and interpreting the interesting parts of a matched constant can be much more difficult and error-prone. This observation led Dubé and Feeley [4] to propose a technique to build parse trees for lexemes when they match regular expressions. Until now, this technique had remained paper work only, as there was no implementation of it. In this work, we describe the integration of the technique into a genuine lexical analyzer generator, SILex [3], which is similar to the Lex tool [12, 13] except that it is intended for the Scheme programming language [9]. In this paper, we will often refer to the article by Dubé and Feeley and the technique it describes as the "original paper" and the "original technique", respectively.

Sections 2 and 3 present summaries of the original technique and SILex, respectively. Section 4 continues with a few definitions. Section 5 presents how we adapted the original technique so that it could fit into SILex; this section is the core of the paper. Section 6 briefly describes the changes that we had to make to SILex to add the new facility. Section 7 gives a few concrete examples of interaction with the new implementation. The speed of parse tree construction is evaluated in Section 8. Section 9 is a brief discussion of related work. Section 10 mentions future work.

2. Summary of the construction of parse trees for lexemes

Let us come back to the original technique.
We present only a summary here, since all relevant (and adapted) material is presented in detail in the following sections. The original technique aims at making lexical analyzers able to build parse trees for the lexemes that they match. More precisely, the goal is to make the automatically generated lexical analyzers able to do so; there is not much point in using the technique on analyzers that are written by hand. Note that the parse trees are those for the lexemes, not those for the regular expressions that match the latter. (Parse trees for the regular expressions themselves can be obtained using conventional syntactic analysis [1].) Such a parse tree is a demonstration of how a regular expression generates a word, much in the same way as a (conventional) parse tree demonstrates how a context-free grammar generates a word. Figure 1 illustrates what the parse trees are for a word aab that is generated by both a regular expression and a context-free grammar.
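The idea can be sketched with a toy matcher that returns not just success or failure but a tree recording how the regular expression generated the word. The AST encoding and the example regex a*b are assumptions chosen for illustration, not the paper's notation or algorithm (the paper's implementation is in Scheme; Python is used here only for brevity):

```python
def match(node, word):
    """Return (tree, rest) if a prefix of word matches node, else None.

    Nodes are tuples: ("char", c), ("cat", left, right), ("star", sub).
    The star case is greedy with no backtracking, which is enough
    for this example but not for a general matcher.
    """
    kind = node[0]
    if kind == "char":
        if word and word[0] == node[1]:
            return (node[1], word[1:])
        return None
    if kind == "cat":                 # concatenation: left then right
        left = match(node[1], word)
        if left is None:
            return None
        t1, rest = left
        right = match(node[2], rest)
        if right is None:
            return None
        t2, rest = right
        return (("cat", t1, t2), rest)
    if kind == "star":                # Kleene star: collect one sub-tree
        trees = []                    # per iteration of the operand
        rest = word
        while True:
            r = match(node[1], rest)
            if r is None or r[1] == rest:   # stop on failure or empty match
                return (("star", trees), rest)
            t, rest = r
            trees.append(t)

# Regex a*b applied to the word aab: the resulting tree records
# two iterations of a* followed by the single b.
regex = ("cat", ("star", ("char", "a")), ("char", "b"))
tree, rest = match(regex, "aab")
# tree == ("cat", ("star", ["a", "a"]), "b"), rest == ""
```

The tree makes explicit which operator consumed which characters, which is precisely the information a lexer action needs in order to interpret the parts of a lexeme without re-parsing it.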