Introduction to Computational Linguistics
13. Complexity and Minimal Automata
It is easy to check that $[b]_M = [a]_M$. We shall see that this difference means that
there is an automaton checking membership in $M$ that can be based on fewer states than
any automaton checking membership in $L$.
Given two index sets $I$ and $J$, put $I \xrightarrow{a} J$ if and only if $J = a \backslash I$. This is
well-defined. For let $I = [\vec{x}]_L$, and suppose that $\vec{x}a$ is a prefix of an accepted
string. Then $[\vec{x}a]_L = \{\vec{y} : \vec{x}a\vec{y} \in L\} = \{\vec{y} : a\vec{y} \in I\} = a \backslash I$. This defines a
deterministic automaton with initial element $L$. Accepting sets are those which
contain $\varepsilon$. We call this the index automaton and denote it by $\mathfrak{I}(L)$. (Often it is
called the Myhill-Nerode automaton.)
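
The construction can be tried out directly when the language is finite, since every index set is then a finite set of strings. The following is only a minimal sketch under that assumption; the names `left_quotient`, `index_automaton` and `accepts`, and the example language {ac, bc}, are illustrative and not part of the text. States are index sets, a transition on $a$ leads from $I$ to $a \backslash I$, and empty quotients are skipped because the text only defines transitions for prefixes of accepted strings.

```python
def left_quotient(a, index):
    """Return the set {y : a + y is in index} for a finite set of strings."""
    return frozenset(y[1:] for y in index if y.startswith(a))


def index_automaton(language, alphabet):
    """Build the index automaton of a FINITE language (assumption of this sketch).

    Returns (start, delta, finals): the initial index set, a dict mapping
    (index set, letter) to the next index set, and the accepting index sets.
    """
    start = frozenset(language)
    states, delta, todo = {start}, {}, [start]
    while todo:
        index = todo.pop()
        for a in alphabet:
            target = left_quotient(a, index)
            if not target:
                continue  # no accepted string continues with a: no transition
            delta[(index, a)] = target
            if target not in states:
                states.add(target)
                todo.append(target)
    finals = {s for s in states if "" in s}  # accepting sets contain the empty string
    return start, delta, finals


def accepts(automaton, word):
    """Run the (partial, deterministic) automaton on word."""
    state, delta, finals = automaton
    for a in word:
        if (state, a) not in delta:
            return False
        state = delta[(state, a)]
    return state in finals


# Hypothetical example: a and b have the same index, so they share a state.
example = index_automaton({"ac", "bc"}, "abc")
```

In this example the transitions on `a` and on `b` from the initial set both lead to the index set containing only `c`, so the two prefixes are not distinguished by the automaton.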
Theorem 15 (Myhill-Nerode) $L(\mathfrak{I}(L)) = L$.
Proof. By induction on $\vec{x}$ we show that $L \xrightarrow{\vec{x}} I$ if and only if $[\vec{x}]_L = I$. If $\vec{x} = \varepsilon$
the claim reads $L = L$ if and only if $[\varepsilon]_L = L$. But $[\varepsilon]_L = L$, so the claim holds.
Next, let $\vec{x} = \vec{y}a$. By induction hypothesis, $L \xrightarrow{\vec{y}} J$ if and only if $[\vec{y}]_L = J$. Now,
$J \xrightarrow{a} a \backslash J = [\vec{y}a]_L$. So, $L \xrightarrow{\vec{x}} a \backslash J = [\vec{x}]_L$, as promised.
Now, $\vec{x}$ is accepted by $\mathfrak{I}(L)$ if and only if there is a computation from $L$ to a set
$[\vec{y}]_L$ containing $\varepsilon$. By the above this is equivalent to $\varepsilon \in [\vec{x}]_L$, which means $\vec{x} \in L$.
□
Given an automaton $\mathfrak{A}$ and a state $q$ put
(129) $[q] := \{\vec{x} : \text{there is } q' \in F \text{ such that } q \xrightarrow{\vec{x}} q'\}$
It is easy to see that for every state $q$ (reachable from $i_0$) there is a string $\vec{x}$ such that
$[q] \subseteq [\vec{x}]_L$. Namely, let $\vec{x}$ be such that $i_0 \xrightarrow{\vec{x}} q$. Then for all $\vec{y} \in [q]$,
$\vec{x}\vec{y} \in L(\mathfrak{A})$, by definition of $[q]$. Hence $[q] \subseteq [\vec{x}]_L$. Conversely, for every
nonempty $[\vec{x}]_L$ there must be a state $q$ such that $[q] \subseteq [\vec{x}]_L$. Again, $q$ is found as a
state such that $i_0 \xrightarrow{\vec{x}} q$. Suppose now that $\mathfrak{A}$ is deterministic and total. Then for
each string $\vec{x}$ there is exactly one state $q$ such that $i_0 \xrightarrow{\vec{x}} q$, and for this state
$[q] \subseteq [\vec{x}]_L$. In fact $[q] = [\vec{x}]_L$. For if $\vec{y} \in [\vec{x}]_L$ then $\vec{x}\vec{y} \in L$, whence
$i_0 \xrightarrow{\vec{x}\vec{y}} q' \in F$ for some $q'$. Since the automaton is deterministic, this computation
passes through $q$: $i_0 \xrightarrow{\vec{x}} q \xrightarrow{\vec{y}} q'$, whence $\vec{y} \in [q]$.
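
The equality $[q] = [\vec{x}]_L$ can be checked on a small case. The sketch below is only an illustration under stated assumptions: the three-state automaton (accepting strings over {a, b} that end in ab), the helper names, and the length bound are all hypothetical and not from the text. Since $[q]$ and $[\vec{x}]_L$ are infinite here, both sets are compared only on strings up to a fixed length.

```python
from itertools import product

# Hypothetical total deterministic automaton: accepts strings ending in "ab".
ALPHABET = "ab"
START = 0
FINALS = {2}
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}


def run(delta, q, word):
    """Follow the unique computation for word starting in state q."""
    for ch in word:
        q = delta[(q, ch)]
    return q


def words(alphabet, max_len):
    """Yield all strings over alphabet of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def state_language(delta, finals, q, alphabet, max_len):
    """The set [q], restricted to strings of length at most max_len."""
    return {w for w in words(alphabet, max_len) if run(delta, q, w) in finals}


def index_of(delta, start, finals, x, alphabet, max_len):
    """The index [x]_L for L = L(A), restricted to strings of bounded length."""
    return {y for y in words(alphabet, max_len)
            if run(delta, start, x + y) in finals}


# For every short x, the state reached by x has [q] equal to [x]_L (up to the bound).
all_equal = all(
    state_language(DELTA, FINALS, run(DELTA, START, x), ALPHABET, 3)
    == index_of(DELTA, START, FINALS, x, ALPHABET, 3)
    for x in words(ALPHABET, 3)
)
```

In this example the three states also have pairwise different sets $[q]$, so no two of them could be merged without changing the language.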
It follows now that the index automaton is the smallest deterministic and total
automaton that recognizes the language. The next question is: how do we make
that automaton? There are two procedures; one starts from a given automaton,