2006 Scheme and Functional Programming Papers, University of

one cannot get rid of, even when parse tree construction is almost never used.

9. Discussion

As far as we know, the original technique is the only one that makes automatically generated lexical analyzers able to build parse trees for lexemes using only finite-state tools, and this work is the only implementation of it.

Generated lexical analyzers always give access to the matched lexemes; this is essential for the production of tokens in lexical analysis. To also have access to information that is automatically extracted from the lexemes is a useful feature. However, when such a feature is provided, it is typically limited to the ability to extract sub-lexemes that correspond to tagged (e.g. using \( and \)) sub-expressions of the regular expression that matches the lexeme. Technically, for efficiency reasons, it is the position and the length of the sub-lexemes that get extracted. The IEEE standard 1003.1 describes, among other things, which sub-lexemes must be extracted [7]. Ville Laurikari presents an efficient technique to extract sub-lexemes in a way that complies with the standard [11]. In our opinion, extraction by tagging is too restrictive. The main problem is that, when a tagged sub-expression lies inside a repetition operator (or inside what is sometimes called a non-linear context) and this sub-expression matches many different parts of a given lexeme, only one of the sub-lexemes is reported. So extraction by tagging becomes ineffective exactly in the situations where the difficulty or the sophistication of the extraction would make automated extraction most interesting.
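This limitation is easy to observe in mainstream regex implementations (the example below uses Python's re module, not the finite-state tools discussed here): when a capture group sits inside a repetition, the group matches several times, yet only one sub-lexeme is reported.

```python
import re

# The group (a|b) matches four times while recognizing 'abba',
# but the engine reports only the sub-lexeme of the last iteration.
m = re.fullmatch(r'(a|b)+', 'abba')
print(m.group(0))  # whole lexeme: 'abba'
print(m.group(1))  # single reported sub-lexeme: 'a'
```

A parse tree for the lexeme, by contrast, would record all four sub-matches and the order in which they occurred.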

Since the conventional way of producing parse trees consists in using a syntactic analyzer based on context-free grammar technology, one might consider using just that to build parse trees for one's lexemes. For instance, one could identify lexemes using a DFA and then submit the lexemes to a subordinate syntactic analyzer that builds the parse trees. Alternatively, one could abandon finite-state technology completely and directly use a scanner-less syntactic analyzer. However, both options suffer from the fact that analyzers based on context-free grammars are much slower than those based on finite-state automata. Moreover, an ambiguous regular expression would be translated into an ambiguous context-free grammar. Our technique handles ambiguous expressions without problem, but most parsing technology cannot handle ambiguous grammars. Of course, there exist parsing techniques that can handle ambiguous grammars, such as Generalized LR parsing [10, 15], the Earley algorithm [5], or the CYK algorithm [8, 17, 2], but these exhibit worse than linear time complexity on most or all ambiguous grammars. Finally, it is possible to translate any regular expression into an unambiguous left- or right-linear grammar [6]. However, the resulting grammar would be completely distorted and would lead to parse trees that have no connection to the parse trees for lexemes that we introduced here.
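To make the ambiguity issue concrete, here is a minimal Python sketch (hypothetical AST constructors, exhaustive enumeration rather than the finite-state construction of the paper) of what "parse trees for lexemes" means: an ambiguous regular expression admits several distinct trees for the same lexeme.

```python
# Hypothetical regex ASTs: ('eps',), ('chr', c), ('alt', r1, r2),
# ('cat', r1, r2), ('star', r).  trees(r, w) enumerates every parse
# tree of regex r for lexeme w; ambiguity shows up as several trees.
def trees(r, w):
    if r[0] == 'eps':
        return [('eps',)] if w == '' else []
    if r[0] == 'chr':
        return [('chr', r[1])] if w == r[1] else []
    if r[0] == 'alt':
        return ([('left', t) for t in trees(r[1], w)] +
                [('right', t) for t in trees(r[2], w)])
    if r[0] == 'cat':
        return [('cat', t1, t2)
                for i in range(len(w) + 1)
                for t1 in trees(r[1], w[:i])
                for t2 in trees(r[2], w[i:])]
    if r[0] == 'star':  # each iteration consumes at least one character
        if w == '':
            return [('list', ())]
        return [('list', (t,) + rest[1])
                for i in range(1, len(w) + 1)
                for t in trees(r[1], w[:i])
                for rest in trees(r, w[i:])]

# (a|a)* is ambiguous: the lexeme 'aa' has 4 distinct parse trees.
ambiguous = ('star', ('alt', ('chr', 'a'), ('chr', 'a')))
print(len(trees(ambiguous, 'aa')))  # 4
```

An analyzer generator must therefore commit to a disambiguation policy (the original technique picks one tree deterministically) while remaining able to represent any of these trees.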

10. Future work

• We intend to complete the integration of automatic parse tree construction into SILex and to clean up the whole implementation.

• Parse tree construction could be made faster. In particular, when the DFA is represented as Scheme code, the functions that build the trees ought to be specialized code generated from the information contained in the three tables.

• The penalty that is strictly due to the additional instrumentation (i.e. when no parse trees are requested) ought to be reduced. A way to improve the situation consists in marking the deterministic states that may reach an accepting state that corresponds to a rule that requests the construction of a parse tree. Then, for states that are not marked, the instrumentation that records the path in the DFA could be omitted.
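The marking step in the last bullet amounts to a backward reachability computation over the DFA's transition graph. A minimal sketch, assuming a hypothetical `transitions` mapping (state to successor states) and a set `tree_accepting` of accepting states whose rules request parse trees:

```python
# A state is "marked" iff some tree-requesting accepting state is
# reachable from it; unmarked states never need path recording.
def marked_states(transitions, tree_accepting):
    # Invert the transition relation, then flood backwards.
    preds = {}
    for s, succs in transitions.items():
        for t in succs:
            preds.setdefault(t, set()).add(s)
    marked = set(tree_accepting)
    work = list(tree_accepting)
    while work:
        t = work.pop()
        for s in preds.get(t, ()):
            if s not in marked:
                marked.add(s)
                work.append(s)
    return marked

# 0 -> 1 -> 2 (tree-accepting); 0 -> 3 (never leads to a tree)
dfa = {0: [1, 3], 1: [2], 3: []}
print(sorted(marked_states(dfa, {2})))  # [0, 1, 2]
```

State 3 stays unmarked, so the analyzer could skip path recording whenever it is there.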

• All tables generated by SILex ought to be compacted, but the one for f, in particular, really needs it. Recall that f takes a three-dimensional input and returns a variable-length output (a pair that contains a sequence of commands).

• Some or all of the following regular operators could be added to SILex: the difference (denoted by, say, r1 − r2), the complement (¬r), and the intersection (r1 & r2). Note that, in the case of the complement operator, there would be no meaningful notion of a parse tree for a lexeme that matches ¬r. In the case of the difference r1 − r2, the parse trees for a matching lexeme w would be the demonstration that r1 generates w. Finally, in the case of the intersection, for efficiency reasons, only one of the sub-expressions should be chosen to be the one that dictates the shape of the parse trees.
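Recognition for the intersection operator fits comfortably within finite-state technology via the classical product construction; only tree building needs the choice of a dictating sub-expression. The following Python sketch (hypothetical DFA encoding, not SILex's representation) builds the product DFA:

```python
# Each DFA is (transition dict keyed by (state, char), start, finals).
# A lexeme matches r1 & r2 iff both component DFAs accept it.
def intersect(d1, d2, alphabet):
    (t1, s1, f1), (t2, s2, f2) = d1, d2
    states = {(a, b) for (a, _) in t1 for (b, _) in t2}
    delta = {((a, b), c): (t1[(a, c)], t2[(b, c)])
             for (a, b) in states for c in alphabet
             if (a, c) in t1 and (b, c) in t2}
    finals = {(a, b) for a in f1 for b in f2}
    return delta, (s1, s2), finals

def run(dfa, w):
    delta, s, finals = dfa
    for c in w:
        s = delta.get((s, c))
        if s is None:
            return False
    return s in finals

# d1: strings over {a,b} containing at least one 'a';
# d2: strings of even length.
d1 = ({(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 1, (1, 'b'): 1}, 0, {1})
d2 = ({(0, 'a'): 1, (0, 'b'): 1, (1, 'a'): 0, (1, 'b'): 0}, 0, {0})
both = intersect(d1, d2, 'ab')
print(run(both, 'ab'), run(both, 'a'), run(both, 'bb'))  # True False False
```

To obtain parse trees, one would instrument only the component chosen to dictate their shape and let the other component act as a pure filter.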

• The parse trees act as (too) detailed demonstrations. Almost always, they will either be transformed into a more convenient structure, possibly with unnecessary details dropped, or be completely consumed to become non-structural information. In other words, they typically are transient data. Consequently, only their informational contents matter, and building them as concrete data structures serves no purpose. In such a situation, deforestation techniques [16] could be used so that the consumer of a parse tree could virtually traverse it even as it is virtually built, making the actual construction unnecessary.

11. Conclusion

This paper presented the adaptation and the implementation of the automated construction of parse trees for lexemes. The technique that has been adapted was originally presented in 2000 by Dubé and Feeley. It has been implemented and integrated into SILex, a lexical analyzer generator for Scheme.

The adaptation was a simple step, as it consisted only in modifying the automaton construction rules of the original technique so that the larger set of regular operators of SILex was handled and so that the way the parse trees are built matches the right-to-left direction in which lists are built in Scheme.

The implementation was a much more complicated task. Fortunately, SILex, like the original technique, is based on the construction of non-deterministic automata that get converted into deterministic ones. Still, most parts of the generator had to be modified more or less deeply, and some extensions also had to be made to the analysis-time module of the tool. Modifications have been made to the representation of the regular expressions, to the way the non-deterministic automata are built and represented, to the conversion of the automata into deterministic ones, to the printing of SILex's tables, to the generation of Scheme code that forms parts of the lexical analyzers, to the algorithm that recognizes lexemes, and to the (previously nonexistent) construction of the parse trees.

The new version of SILex does work and experiments could be run, but the implementation is still somewhat disorganized. The construction of parse trees is a rather costly operation compared to the normal functioning of a deterministic automaton-based lexical analyzer; indeed, empirical measurements show that its intensive use roughly triples the execution time of an analyzer.

Acknowledgments

We wish to thank the anonymous referees whose comments really helped to improve this paper.

Scheme and Functional Programming, 2006, page 61
