Introduction to Computational Linguistics

12. Finite State Automata

Next, assume that $s = s_1 \cup s_2$ and let $t \in s^\dagger$. Either $t \in s_1^\dagger$ or $t \in s_2^\dagger$. In the first case, $L(t) \subseteq L(s_1)^p$ by inductive hypothesis, and so $L(t) \subseteq L(s)^p$, since $L(s) \supseteq L(s_1)$ and therefore $L(s)^p \supseteq L(s_1)^p$. The second case is analogous. Now, if $\vec x \in L(s)^p$, then either $\vec x \in L(s_1)^p$ or $\vec x \in L(s_2)^p$, and hence by inductive hypothesis $\vec x \in L(u)$ for some $u \in s_1^\dagger$ or some $u \in s_2^\dagger$; in either case $u \in s^\dagger$, as had to be shown.

Next, let $s = s_1 \cdot s_2$ and $t \in s^\dagger$. Case 1. $t \in s_1^\dagger$. Then by inductive hypothesis $L(t) \subseteq L(s_1)^p$, and since $L(s_1)^p \subseteq L(s)^p$ (can you see this?), we get the desired conclusion. Case 2. $t = s_1 \cdot u$ with $u \in s_2^\dagger$. A string in $L(t)$ is of the form $\vec x\vec y$, where $\vec x \in L(s_1)$ and $\vec y \in L(u) \subseteq L(s_2)^p$ by inductive hypothesis. So $\vec x\vec y \in L(s_1)L(s_2)^p \subseteq L(s)^p$, as had to be shown. Conversely, if a string is in $L(s)^p$, then either it is some $\vec x \in L(s_1)^p$, or it is of the form $\vec y\vec z$ with $\vec y \in L(s_1)$ and $\vec z \in L(s_2)^p$. In the first case there is a $u \in s_1^\dagger$ such that $\vec x \in L(u)$, and $u \in s^\dagger$. In the second case there is a $u \in s_2^\dagger$ such that $\vec z \in L(u)$; then $\vec y\vec z \in L(s_1 u)$ and $s_1 u \in s^\dagger$. This shows the claim.

Finally, let $s = s_1^*$. Then either $t \in s_1^\dagger$ or $t = s_1^* u$ where $u \in s_1^\dagger$. In the first case we have $L(t) \subseteq L(s_1)^p \subseteq L(s)^p$, by inductive hypothesis. In the second case we have $L(t) = L(s_1^*)L(u) \subseteq L(s_1^*)L(s_1)^p \subseteq L(s)^p$. This finishes one direction. Now suppose that $\vec x \in L(s)^p$. Then there are $\vec y_i$, $i < n$, and $\vec z$ such that $\vec x = \vec y_0\vec y_1\cdots\vec y_{n-1}\vec z$, where $\vec y_i \in L(s_1)$ for all $i < n$ and $\vec z \in L(s_1)^p$. By inductive hypothesis there is a $u \in s_1^\dagger$ such that $\vec z \in L(u)$. Also $\vec y_0\vec y_1\cdots\vec y_{n-1} \in L(s_1^*)$, so $\vec x \in L(s_1^* u)$, and the regular term $s_1^* u$ is in $s^\dagger$. □

We remark here that for all $s \neq \emptyset$: $\varepsilon \in s^\dagger$; moreover, $s \in s^\dagger$. Both are easily established by induction.

Let $s$ be a term. For $t, u \in s^\dagger$ put $t \xrightarrow{a} u$ iff $ta \subseteq u$. (The latter means $L(ta) \subseteq L(u)$.) The start state is $\varepsilon$.

$$
\begin{aligned}
A(s) &:= \{a \in A : a \in s^\dagger\} \\
\delta(t, a) &:= \{u : ta \subseteq u\} \\
\mathfrak{A}(s) &:= \langle A(s), s^\dagger, \varepsilon, \delta \rangle
\end{aligned}
\tag{122}
$$

$\mathfrak{A}(s)$ is called the canonical automaton of $s$. We shall show that the language recognized by the canonical automaton is $L(s)$. This follows immediately from the next lemma, which establishes a somewhat more general result.

Lemma 9. In $\mathfrak{A}(s)$, $\varepsilon \xrightarrow{\vec x} t$ iff $\vec x \in L(t)$. Hence, the language accepted by state $t$ is exactly $L(t)$.
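The identity $L(s)^p = \bigcup_{t \in s^\dagger} L(t)$ proved above, on which the canonical automaton rests, can be checked mechanically on small terms. The OCaml sketch below does this only up to a length bound, and all its names (`term`, `lang`, `dagger`, `check` and so on) are hypothetical. The definition of $s^\dagger$ is not part of this excerpt, so `dagger` is an assumed reconstruction from the cases used in the proof, with $s$ itself added to every clause so that $s \in s^\dagger$ holds. Terms are assumed not to contain $\emptyset$, since the step $L(s_1)^p \subseteq L(s)^p$ for $s = s_1 \cdot s_2$ needs $L(s_2) \neq \emptyset$.

```ocaml
(* Regular terms over characters. *)
type term =
  | Empty                  (* the empty language *)
  | Eps                    (* the empty string *)
  | Sym of char            (* a single letter *)
  | Union of term * term   (* s1 ∪ s2 *)
  | Cat of term * term     (* s1 · s2 *)
  | Star of term           (* s1* *)

module SS = Set.Make (String)

(* Concatenate two string sets, keeping only results of length <= n. *)
let cat_upto n xs ys =
  SS.fold
    (fun x acc ->
      SS.fold
        (fun y acc ->
          if String.length x + String.length y <= n then SS.add (x ^ y) acc
          else acc)
        ys acc)
    xs SS.empty

(* lang n t = the strings of L(t) of length at most n. *)
let rec lang n t =
  match t with
  | Empty -> SS.empty
  | Eps -> SS.singleton ""
  | Sym c -> if n >= 1 then SS.singleton (String.make 1 c) else SS.empty
  | Union (a, b) -> SS.union (lang n a) (lang n b)
  | Cat (a, b) -> cat_upto n (lang n a) (lang n b)
  | Star a ->
      let la = lang n a in
      (* Append one more factor from L(a) until no new string of length <= n appears. *)
      let rec fix cur =
        let next = SS.union cur (cat_upto n cur la) in
        if SS.equal next cur then cur else fix next
      in
      fix (SS.singleton "")

(* ASSUMPTION: hypothetical reconstruction of s†, not taken from the text.
   It follows the cases of the proof and always includes s itself. *)
let rec dagger s =
  let rest =
    match s with
    | Empty | Eps -> []
    | Sym _ -> [ Eps ]
    | Union (s1, s2) -> dagger s1 @ dagger s2
    | Cat (s1, s2) -> dagger s1 @ List.map (fun u -> Cat (s1, u)) (dagger s2)
    | Star s1 -> dagger s1 @ List.map (fun u -> Cat (s, u)) (dagger s1)
  in
  s :: rest

(* All prefixes of length <= n of the strings in xs. *)
let prefixes_upto n xs =
  SS.fold
    (fun x acc ->
      let acc = ref acc in
      for i = 0 to min n (String.length x) do
        acc := SS.add (String.sub x 0 i) !acc
      done;
      !acc)
    xs SS.empty

(* The union of L(t) over all t in dagger s, cut off at length n. *)
let union_dagger n s =
  List.fold_left (fun acc t -> SS.union acc (lang n t)) SS.empty (dagger s)

(* Bounded check of L(s)^p = union of L(t): [slack] must be large enough that
   every prefix of length <= n can be completed within n + slack letters. *)
let check n slack s =
  SS.equal (union_dagger n s) (prefixes_upto n (lang (n + slack) s))
```

For instance, with $s = (ab)^*c$, written `Cat (Star (Cat (Sym 'a', Sym 'b')), Sym 'c')`, the call `check 4 2 s` compares the strings of length at most 4 under some $t \in s^\dagger$ with the prefixes of the words of $L(s)$ of length at most 6; a slack of 2 suffices here because every prefix is completed by at most $bc$.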
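Proof. First, we show that if $\varepsilon \xrightarrow{\vec x} u$ then $\vec x \in L(u)$. This is done by induction on the length of $\vec x$. If $\vec x = \varepsilon$ the claim trivially follows. Now let $\vec x = \vec y a$ for some $a$, and assume that $\varepsilon \xrightarrow{\vec x} u$. Then there is a $t$ such that $\varepsilon \xrightarrow{\vec y} t \xrightarrow{a} u$. By inductive assumption $\vec y \in L(t)$, and by definition of the transition relation $L(t)a \subseteq L(u)$. Hence $\vec x = \vec y a \in L(u)$.

Now for the converse, the claim that if $\vec x \in L(u)$ then $\varepsilon \xrightarrow{\vec x} u$. Again we show this by induction on the length of $\vec x$. If $\vec x = \varepsilon$ we are done. Now let $\vec x = \vec y a$ for some $a \in A$. We have to show that there is a $t \in s^\dagger$ such that $\vec y \in L(t)$ and $ta \subseteq u$. For then by inductive assumption $\varepsilon \xrightarrow{\vec y} t$, and so $\varepsilon \xrightarrow{\vec y a} u$ by definition of the transition relation. So it remains to show: if $\vec y a \in L(u)$, then there is a $t \in s^\dagger$ such that $\vec y \in L(t)$ and $L(ta) \subseteq L(u)$. This time we use induction on $u$. Notice right away that $u^\dagger \subseteq s^\dagger$, a fact that will become useful.

Case 1. $u = b$ for some letter $b$. Then $\vec y = \varepsilon$ and $a = b$. Clearly $\varepsilon \in s^\dagger$, and putting $t := \varepsilon$ will do.

Case 2. $u = u_1 \cup u_2$. Then $u_1$ and $u_2$ are both in $s^\dagger$. Now suppose $\vec y a \in L(u_1)$. By inductive hypothesis there is a $t$ such that $\vec y \in L(t)$ and $L(ta) \subseteq L(u_1) \subseteq L(u)$, so the claim follows. Similarly if $\vec y a \in L(u_2)$.

Case 3. $u = u_1 u_2$. Subcase 3a. $\vec y = \vec y_1 \vec y_2$, with $\vec y_1 \in L(u_1)$ and $\vec y_2 a \in L(u_2)$. By inductive hypothesis there is a $t_2$ such that $\vec y_2 \in L(t_2)$ and $L(t_2 a) \subseteq L(u_2)$. Then $t := u_1 t_2$ is the desired term: it is in $u^\dagger$, hence also in $s^\dagger$; $\vec y \in L(u_1 t_2)$; and $L(ta) = L(u_1 t_2 a) \subseteq L(u_1 u_2) = L(u)$. Subcase 3b. $\vec y a \in L(u_1)$ and $\varepsilon \in L(u_2)$. By inductive hypothesis there is a $t_1 \in u_1^\dagger \subseteq u^\dagger$ such that $\vec y \in L(t_1)$ and $L(t_1 a) \subseteq L(u_1) \subseteq L(u_1 u_2) = L(u)$, so $t := t_1$ will do.

Case 4. $u = u_1^*$. Suppose $\vec y a \in L(u)$. Then $\vec y$ has a decomposition $\vec z_0\vec z_1\cdots\vec z_{n-1}\vec v$ such that $\vec z_i \in L(u_1)$ for all $i < n$ and $\vec v a \in L(u_1)$. By inductive hypothesis there is a $t_1$ such that $\vec v \in L(t_1)$ and $L(t_1 a) \subseteq L(u_1)$. Moreover $\vec z_0\vec z_1\cdots\vec z_{n-1} \in L(u)$. Now put $t := u t_1$. This has the desired properties: $\vec y \in L(u t_1)$ and $L(ta) = L(u)L(t_1 a) \subseteq L(u)L(u_1) \subseteq L(u)$. □

As a consequence, if $L$ is regular, so is $L^p$. Namely, take the automaton $\mathfrak{A}(s)$, where $s$ is a regular term with $L(s) = L$.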
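The corollary can also be tried on an automaton that is given directly as a transition table. The OCaml sketch below is a swapped-in variant, not the canonical-automaton construction of the text: it starts from an arbitrary deterministic automaton with a partial transition function, and the type `dfa` and the functions `step`, `run`, `accepts`, `coaccessible` and `prefix_closure` are hypothetical names. Such an automaton may contain states from which no accepting state can be reached, so instead of making every state accepting, the sketch makes accepting exactly those states from which an accepting state is still reachable.

```ocaml
(* A deterministic automaton with a partial transition function:
   a missing transition means the input is rejected. *)
type dfa = {
  start : int;
  accepting : int list;
  delta : (int * char * int) list;  (* (q, a, q') means q --a--> q' *)
}

let step d q a =
  List.find_map (fun (p, b, r) -> if p = q && b = a then Some r else None) d.delta

(* The state reached on input x, or None if the run gets stuck. *)
let run d x =
  let q = ref (Some d.start) in
  String.iter (fun a -> q := Option.bind !q (fun p -> step d p a)) x;
  !q

let accepts d x =
  match run d x with Some q -> List.mem q d.accepting | None -> false

(* States from which some accepting state is reachable (least fixpoint). *)
let coaccessible d =
  let rec grow acc =
    let acc' =
      List.fold_left
        (fun s (p, _, r) ->
          if List.mem r s && not (List.mem p s) then p :: s else s)
        acc d.delta
    in
    if List.length acc' = List.length acc then acc else grow acc'
  in
  grow d.accepting

(* Accepts exactly the prefixes of the strings accepted by d. *)
let prefix_closure d = { d with accepting = coaccessible d }

(* Example: (ab)*c, with a sink state 3 reached on some bad inputs. *)
let d =
  { start = 0;
    accepting = [ 2 ];
    delta = [ (0, 'a', 1); (1, 'b', 0); (0, 'c', 2); (0, 'b', 3); (3, 'a', 3) ] }

let p = prefix_closure d
```

Because of the sink state 3, making every state of `d` accepting would wrongly accept `b`, which is not a prefix of any word of $(ab)^*c$; `p` rejects it while accepting the empty string, `ab` and `aba`.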
Now change $\mathfrak{A}(s)$ so that every state is accepting. By Lemma 9, the new automaton accepts exactly the strings that fall under some $t \in s^\dagger$, and by the previous results these are exactly the prefixes of strings from $L$.

We discuss an application. Syllables of a given language are subject to certain conditions. One of the most famous constraints (presumed to be universal) is the sonority hierarchy. It states that the sonority of phonemes in a syllable must rise until the nucleus, and then fall. The nucleus contains the sounds of highest sonority (which do not have to be vowels). The rising part is called the onset and the falling part the rhyme. The sonority hierarchy translates into a finite state automaton as follows. It has states $\langle o, i\rangle$ and $\langle r, i\rangle$, where $i$ is a number smaller a