13.11.2014 Views

Introduction to Computational Linguistics


12. Finite State Automata

Proof. First, we show that if $\varepsilon \xrightarrow{\vec{x}} u$ then $\vec{x} \in L(u)$. This is done by induction on the length of $\vec{x}$. If $\vec{x} = \varepsilon$ the claim follows trivially. Now let $\vec{x} = \vec{y}a$ for some $a \in A$, and assume that $\varepsilon \xrightarrow{\vec{x}} u$. Then there is a $t$ such that $\varepsilon \xrightarrow{\vec{y}} t \xrightarrow{a} u$. By inductive assumption, $\vec{y} \in L(t)$, and by definition of the transition relation, $L(t)a \subseteq L(u)$. Hence $\vec{x} = \vec{y}a \in L(u)$, and the claim follows. Now for the converse, the claim that if $\vec{x} \in L(u)$ then $\varepsilon \xrightarrow{\vec{x}} u$. Again we show this by induction on the length of $\vec{x}$. If $\vec{x} = \varepsilon$ we are done. Now let $\vec{x} = \vec{y}a$ for some $a \in A$. We have to show that there is a $t \in s^\dagger$ such that $\vec{y} \in L(t)$ and $ta \subseteq u$, that is, $L(ta) \subseteq L(u)$. For then by inductive assumption, $\varepsilon \xrightarrow{\vec{y}} t$, and so $\varepsilon \xrightarrow{\vec{y}a} u$, by definition of the transition relation.

Now for the remaining claim: if $\vec{y}a \in L(u)$ then there is a $t \in s^\dagger$ such that $\vec{y} \in L(t)$ and $L(ta) \subseteq L(u)$. Again we use induction, this time on $u$. Notice right away that $u^\dagger \subseteq s^\dagger$, a fact that will become useful.

Case 1. $u = b$ for some letter $b$. Then $\vec{y} = \varepsilon$ and $a = b$. Clearly, $\varepsilon \in s^\dagger$, and putting $t := \varepsilon$ will do.

Case 2. $u = u_1 \cup u_2$. Then $u_1$ and $u_2$ are both in $s^\dagger$. Now, suppose $\vec{y}a \in L(u_1)$. By inductive hypothesis, there is a $t$ such that $\vec{y} \in L(t)$ and $L(ta) \subseteq L(u_1) \subseteq L(u)$, so the claim follows. Similarly if $\vec{y}a \in L(u_2)$.

Case 3. $u = u_1 u_2$. Subcase 3a. $\vec{y} = \vec{y}_1 \vec{y}_2$, with $\vec{y}_1 \in L(u_1)$ and $\vec{y}_2 a \in L(u_2)$. Now, by inductive hypothesis, there is a $t_2$ such that $\vec{y}_2 \in L(t_2)$ and $L(t_2 a) \subseteq L(u_2)$. Then $t := u_1 t_2$ is the desired term. Since it is in $u^\dagger$, it is also in $s^\dagger$. And $L(ta) = L(u_1 t_2 a) \subseteq L(u_1 u_2) = L(u)$.

Case 4. $u = u_1^*$. Suppose $\vec{y}a \in L(u)$. Then $\vec{y}$ has a decomposition $\vec{z}_0 \vec{z}_1 \cdots \vec{z}_{n-1} \vec{v}$ such that $\vec{z}_i \in L(u_1)$ for all $i < n$, and also $\vec{v}a$ is in $L(u_1)$. By inductive hypothesis, there is a $t_1$ such that $L(t_1 a) \subseteq L(u_1)$ and $\vec{v} \in L(t_1)$. And $\vec{z}_0 \vec{z}_1 \cdots \vec{z}_{n-1} \in L(u)$. Now put $t := u t_1$. This has the desired properties: $\vec{y} \in L(u t_1)$, and $L(ta) = L(u t_1 a) \subseteq L(u u_1) \subseteq L(u)$.

□
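The witness built in Case 3 can be sanity-checked on small finite languages. The sketch below is only an illustration: the sets standing in for $L(u_1)$, $L(u_2)$ and $L(t_2)$, the string "abcd", and the helper names `concat` and `is_witness` are all assumptions made for this example and do not come from the text. It checks the two properties demanded of $t$, namely $\vec{y} \in L(t)$ and $L(ta) \subseteq L(u)$, for $t = u_1 t_2$.

```python
def concat(left, right):
    """Concatenation of two finite languages, each given as a set of strings."""
    return {x + y for x in left for y in right}


def is_witness(lang_t, lang_u, y, a):
    """Check the two properties required of t: y in L(t) and L(t)a is a subset of L(u)."""
    return y in lang_t and concat(lang_t, {a}) <= lang_u


# Hypothetical finite stand-ins for L(u1) and L(u2); u = u1 u2.
L_U1 = {"ab", "b"}
L_U2 = {"cd", "d"}
L_U = concat(L_U1, L_U2)

# The string ya = "abcd" splits as y = y1 y2 with y1 = "ab", y2 = "c", a = "d".
Y, A = "abc", "d"
Y2 = "c"

# Assumed witness for u2, playing the role of the t2 given by the
# inductive hypothesis: y2 in L(t2) and L(t2)a is a subset of L(u2).
L_T2 = {"c"}

# The term t := u1 t2 from Case 3, again as a finite language.
L_T = concat(L_U1, L_T2)

print(is_witness(L_T2, L_U2, Y2, A))  # the inductive hypothesis holds for t2
print(is_witness(L_T, L_U, Y, A))     # so t = u1 t2 is a witness for u
```

Finite sets are used only so that the inclusions can be tested directly; the argument in the proof does not depend on finiteness.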

Now, as a consequence, if $L$ is regular, so is $L^p$. Namely, take the automaton $A(s)$, where $s$ is a regular term for $L$. Now change this automaton to make every state accepting. This defines a new automaton which accepts every string that falls under some $t \in s^\dagger$, by the previous results. Hence, it accepts all prefixes of strings from $L$.
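The step of making every state accepting can be sketched in a few lines. This is a minimal illustration, not the construction of $A(s)$ itself: the transition table `DELTA`, the state names, and the example language $ab^*c$ are hypothetical, and the sketch assumes a partial deterministic automaton in which every state lies on some path to an accepting state (without that assumption, making all states accepting could admit strings that are not prefixes of any word in the language).

```python
def accepts(delta, start, accepting, word):
    """Run a partial deterministic automaton; a missing transition rejects."""
    state = start
    for symbol in word:
        if (state, symbol) not in delta:
            return False
        state = delta[(state, symbol)]
    return state in accepting


def all_states(delta, start):
    """Collect every state mentioned in the transition table, plus the start state."""
    states = {start}
    for (source, _symbol), target in delta.items():
        states.add(source)
        states.add(target)
    return states


# Hypothetical automaton for the language a b* c.
DELTA = {
    ("q0", "a"): "q1",
    ("q1", "b"): "q1",
    ("q1", "c"): "q2",
}
START = "q0"
ACCEPTING = {"q2"}

# Same transitions, but every state is now accepting.
PREFIX_ACCEPTING = all_states(DELTA, START)

for word in ["abbc", "abb", "", "ba", "abca"]:
    print(repr(word),
          accepts(DELTA, START, ACCEPTING, word),
          accepts(DELTA, START, PREFIX_ACCEPTING, word))
```

Only the set of accepting states changes; the transitions are untouched, so a string is accepted by the new automaton exactly when the original automaton can read it without getting stuck.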

We discuss an application. Syllables of a given language are subject to certain conditions. One of the most famous constraints (presumed to be universal) is the sonoricity hierarchy. It states that the sonoricity of phonemes in a syllable must rise until the nucleus, and then fall. The nucleus contains the sounds of highest sonoricity (which do not have to be vowels). The rising part is called the onset and the falling part the rhyme. The sonoricity hierarchy translates into a finite state automaton as follows. It has states $\langle o, i \rangle$ and $\langle r, i \rangle$, where $i$ is a number smaller

a
