Introduction to Computational Linguistics
23. Some Metatheorems
exists an accepting run for every string of the form $\vec{u}\vec{v}^q\vec{w}$. For example,
(224) $q_0 \xrightarrow{\vec{u}} q_i = q_j \xrightarrow{\vec{w}} q_n$
(225) $q_0 \xrightarrow{\vec{u}} q_i \xrightarrow{\vec{v}} q_j = q_i \xrightarrow{\vec{w}} q_n$
(226) $q_0 \xrightarrow{\vec{u}} q_i \xrightarrow{\vec{v}} q_j = q_i \xrightarrow{\vec{v}} q_j = q_i \xrightarrow{\vec{w}} q_n$
(227) $q_0 \xrightarrow{\vec{u}} q_i \xrightarrow{\vec{v}} q_j = q_i \xrightarrow{\vec{v}} q_j = q_i \xrightarrow{\vec{v}} q_j = q_i \xrightarrow{\vec{w}} q_n$
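The runs above can be reproduced on a small machine. The following is a minimal sketch, assuming a hypothetical two-state deterministic automaton (it is not taken from the text) that accepts strings over {a, b} with an even number of a's. The names `DELTA`, `START`, `FINAL`, `accepts` and `decompose` are illustrative only. The function `decompose` cuts a word at the first state the run visits twice, which yields the pieces called $\vec{u}$, $\vec{v}$ and $\vec{w}$.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy automaton (an assumption for illustration):
# state 0 = even number of a's seen so far, state 1 = odd number.
DELTA: Dict[Tuple[int, str], int] = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 0, (1, "b"): 1,
}
START: int = 0
FINAL: Set[int] = {0}


def accepts(word: str) -> bool:
    """Run the automaton on word and report whether it ends in a final state."""
    state = START
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in FINAL


def decompose(word: str) -> Tuple[str, str, str]:
    """Split word into (u, v, w) at the first state that the run visits twice."""
    state = START
    first_seen = {state: 0}  # state -> number of symbols read when first reached
    for pos, symbol in enumerate(word, start=1):
        state = DELTA[(state, symbol)]
        if state in first_seen:
            i = first_seen[state]
            # The run is in the same state after i and after pos symbols,
            # so word[i:pos] labels a loop and can be repeated or dropped.
            return word[:i], word[i:pos], word[pos:]
        first_seen[state] = pos
    raise ValueError("the run visits no state twice")


u, v, w = decompose("abab")
pumped = [u + v * q + w for q in range(4)]  # q = 0, 1, 2, 3 as in (224)-(227)
print(u, v, w, [accepts(p) for p in pumped])
```

Any word at least as long as the number of states forces such a repetition, which is why the sketch raises an error only for shorter words.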
There are examples of this kind in natural languages. An amusing example is from
Steven Pinker. In the days of the arms race one produced not only missiles
but also anti missile missiles, to be used against missiles; and then anti
anti missile missile missiles to attack anti missile missiles. And to attack
those, one needed anti anti anti missile missile missile missiles.
And so on. The general recipe for these expressions is as follows:
(228) $(\text{anti}^\frown\Box)^n(\text{missile}^\frown\Box)^n{}^\frown\text{missile}$
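The recipe in (228) is easy to spell out as string operations. The sketch below assumes that □ stands for a single blank and that n ranges over 0, 1, 2, ...; the helper names `expression` and `in_language` are hypothetical. Note that a regular expression alone cannot enforce equal counts, so the membership test compares the lengths of the two matched blocks explicitly.

```python
import re


def expression(n: int) -> str:
    """Build n copies of 'anti ', then n copies of 'missile ', then 'missile'."""
    return "anti " * n + "missile " * n + "missile"


def in_language(s: str) -> bool:
    """Check that s is a block of 'anti ' and an equally long block of 'missile ', plus 'missile'."""
    m = re.fullmatch(r"((?:anti )*)((?:missile )*)missile", s)
    if m is None:
        return False
    # 'anti ' has 5 characters and 'missile ' has 8, so block lengths give the counts.
    return len(m.group(1)) // 5 == len(m.group(2)) // 8


for n in range(4):
    print(n, expression(n), in_language(expression(n)))
```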
We shall show that this is not a regular language. Suppose it is regular. Then there
is a k satisfying the conditions above. Now take the word $(\text{anti}^\frown\Box)^k(\text{missile}^\frown\Box)^k{}^\frown\text{missile}$.
There must be a nonempty subword that can be omitted or repeated without
leaving the language. No such subword exists: let us break the original string into a prefix
$(\text{anti}^\frown\Box)^k$ of length 5k and a suffix $(\text{missile}^\frown\Box)^k{}^\frown\text{missile}$ of length 8(k+1)−1.
The entire string has length 13k + 7. Every string of the language has a length of this
form, so the string we take out must have length 13n for some n > 0.
We assume for simplicity that n = 1; the general argument is similar.
It is easy to see that it must contain the letters of anti missile. Suppose we
decompose the original string as follows:
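(229) $\vec{u} = (\text{anti}^\frown\Box)^{k-1}$, $\vec{v} = \text{anti}^\frown\Box^\frown\text{missile}^\frown\Box$, $\vec{w} = (\text{missile}^\frown\Box)^{k-1}{}^\frown\text{missile}$
Then $\vec{u}\vec{w}$ is of the required form. Unfortunately,
(230) $\vec{u}\vec{v}^2\vec{w} = (\text{anti}^\frown\Box)^{k-1}{}^\frown\text{anti}^\frown\Box^\frown\text{missile}^\frown\Box^\frown\text{anti}^\frown\Box^\frown\text{missile}^\frown\Box^\frown(\text{missile}^\frown\Box)^{k-1}{}^\frown\text{missile} \notin L$
Similarly for any other attempt to divide the string.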
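The claim that no division of the string works can be checked by exhaustive search for small k. The following is a sketch under stated assumptions: □ is a single blank, and the helper names `in_language` and `pumpable_splits` are hypothetical. It is a finite sanity check, not a replacement for the general argument. It tries every split of the word into three parts with a nonempty middle part and keeps those for which both omitting and doubling the middle part stay in the language.

```python
import re
from typing import List, Tuple


def in_language(s: str) -> bool:
    """Check that s is a block of 'anti ' and an equally long block of 'missile ', plus 'missile'."""
    m = re.fullmatch(r"((?:anti )*)((?:missile )*)missile", s)
    return m is not None and len(m.group(1)) // 5 == len(m.group(2)) // 8


def pumpable_splits(word: str) -> List[Tuple[str, str, str]]:
    """Return all (u, v, w) with nonempty v such that u+w and u+v+v+w are both in the language."""
    found = []
    for i in range(len(word)):
        for j in range(i + 1, len(word) + 1):
            u, v, w = word[:i], word[i:j], word[j:]
            if in_language(u + w) and in_language(u + v + v + w):
                found.append((u, v, w))
    return found


for k in range(1, 5):
    word = "anti " * k + "missile " * k + "missile"
    assert len(word) == 13 * k + 7  # prefix 5k, suffix 8(k+1) - 1
    print(k, pumpable_splits(word))

# The particular split discussed in the text, for k = 3:
k = 3
u = "anti " * (k - 1)
v = "anti missile "
w = "missile " * (k - 1) + "missile"
print(in_language(u + w), in_language(u + v + v + w))
```

For the values of k tried here the search finds no admissible split, and the split from the text passes the omission test but fails the repetition test.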