13.11.2014

Introduction to Computational Linguistics


23. Some Metatheorems

exists an accepting run for every string of the form $\vec{u}\vec{v}^{\,q}\vec{w}$. For example,

(224) $q_0 \xrightarrow{\vec{u}} q_i = q_j \xrightarrow{\vec{w}} q_n$

(225) $q_0 \xrightarrow{\vec{u}} q_i \xrightarrow{\vec{v}} q_j = q_i \xrightarrow{\vec{w}} q_n$

(226) $q_0 \xrightarrow{\vec{u}} q_i \xrightarrow{\vec{v}} q_j = q_i \xrightarrow{\vec{v}} q_j \xrightarrow{\vec{w}} q_n$

(227) $q_0 \xrightarrow{\vec{u}} q_i \xrightarrow{\vec{v}} q_j = q_i \xrightarrow{\vec{v}} q_j = q_i \xrightarrow{\vec{v}} q_j \xrightarrow{\vec{w}} q_n$
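The runs (224) to (227) can be replayed mechanically. The following is a minimal sketch in Python, assuming a small example automaton that is not from the text: a hypothetical 2-state DFA over {a, b} accepting the strings with an even number of a's. The helper names `run`, `accepts` and `split_at_repeat` are likewise hypothetical. `split_at_repeat` finds the first pair $i < j$ with $q_i = q_j$ in the run and reads off $\vec{u}$, $\vec{v}$, $\vec{w}$; the loop then checks that $\vec{u}\vec{v}^{\,q}\vec{w}$ is accepted for $q = 0, \dots, 3$.

```python
# Hypothetical example automaton (not from the text): a 2-state DFA over {a, b}
# that accepts exactly the strings containing an even number of a's.
DELTA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
START, FINAL = 0, {0}


def run(x):
    """Return the states q_0, q_1, ..., q_n visited while reading x."""
    states = [START]
    for ch in x:
        states.append(DELTA[(states[-1], ch)])
    return states


def accepts(x):
    return run(x)[-1] in FINAL


def split_at_repeat(x):
    """Find the first i < j with q_i = q_j and return (u, v, w), v = x[i:j] nonempty."""
    first_seen = {}
    for pos, q in enumerate(run(x)):
        if q in first_seen:
            i, j = first_seen[q], pos
            return x[:i], x[i:j], x[j:]
        first_seen[q] = pos
    return None  # happens only if x is shorter than the number of states


x = "abab"  # accepted: it has two a's
u, v, w = split_at_repeat(x)
for q in range(4):  # q = 0, 1, 2, 3 mirror (224)-(227)
    pumped = u + v * q + w
    print(q, repr(pumped), accepts(pumped))
```

For x = "abab" the run is 0, 1, 1, 0, 0, so the first repetition is $q_1 = q_2$, giving u = "a", v = "b", w = "ab"; all four pumped strings ("aab", "abab", "abbab", "abbbab") are accepted, because returning to the same state makes the loop on v repeatable.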

There are examples of this kind in natural languages. An amusing example is from Steven Pinker. In the days of the arms race, one produced not only missiles but also anti missile missiles, to be used against missiles; and then anti anti missile missile missiles to attack anti missile missiles. And to attack those, one needed anti anti anti missile missile missile missiles.

And so on. The general recipe for these expressions is as follows:

(228) $(\mathrm{anti}^\frown\Box)^n(\mathrm{missile}^\frown\Box)^n{}^\frown\mathrm{missile}$
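The recipe (228) is easy to turn into a generator and an exact membership test. This is a sketch under the assumption that the blank □ is an ordinary space; the names `anti_missile`, `SHAPE` and `in_language` are hypothetical. The regular expression checks only the shape $\mathrm{anti}^*\,\mathrm{missile}^*\,\mathrm{missile}$; matching the two counts is left to ordinary Python code, which is in line with the claim proved next that the language itself is not regular.

```python
import re


def anti_missile(n):
    """Build (anti )^n (missile )^n missile from (228), the blank being a plain space."""
    return "anti " * n + "missile " * n + "missile"


# The regular expression checks only the shape anti* missile* missile;
# it does not enforce that the numbers of 'anti' and 'missile' agree.
SHAPE = re.compile(r"(anti )*(missile )*missile")


def in_language(s):
    """Exact membership test for {anti_missile(n) : n >= 0}."""
    if SHAPE.fullmatch(s) is None:
        return False
    return s.count("missile") == s.count("anti ") + 1


for n in range(3):
    print(anti_missile(n))
```

This prints "missile", "anti missile missile" and "anti anti missile missile missile", one per line; a string such as "anti anti missile missile" has the right shape but is rejected by the count check.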

We shall show that this is not a regular language. Suppose it is regular. Then there is a $k$ satisfying the conditions above. Now take the word $(\mathrm{anti}^\frown\Box)^k(\mathrm{missile}^\frown\Box)^k{}^\frown\mathrm{missile}$.

There must be a nonempty subword that can be omitted or repeated any number of times without leaving the language. No such word exists. Let us break the original string into a prefix $(\mathrm{anti}^\frown\Box)^k$ of length $5k$ and a suffix $(\mathrm{missile}^\frown\Box)^k{}^\frown\mathrm{missile}$ of length $8(k+1)-1$. The entire string has length $13k+7$. Since every string of the form (228) has length $13m+7$ for some $m$, the string we take out must have length $13n$ for some $n \geq 1$. We assume for simplicity that $n = 1$; the general argument is similar.
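The length bookkeeping can be checked directly. In this sketch (the blank is again assumed to be a plain space, and `check_lengths` and `removable_lengths` are hypothetical helpers) we confirm $5k$, $8(k+1)-1$ and $13k+7$, and list the lengths $d$ for which removing $d$ characters from the word of length $13k+7$ can still leave the length of some word of the form (228).

```python
def anti_missile(n):
    """(anti )^n (missile )^n missile, as in (228), with the blank written as a space."""
    return "anti " * n + "missile " * n + "missile"


def check_lengths(k):
    prefix = "anti " * k                 # (anti _)^k
    suffix = "missile " * k + "missile"  # (missile _)^k missile
    assert len(prefix) == 5 * k
    assert len(suffix) == 8 * (k + 1) - 1
    assert len(prefix + suffix) == 13 * k + 7


def removable_lengths(k):
    """Lengths d > 0 such that 13k + 7 - d is the length of some word of the form (228)."""
    total = 13 * k + 7
    member_lengths = {len(anti_missile(m)) for m in range(k + 1)}
    return [d for d in range(1, total + 1) if total - d in member_lengths]


for k in range(1, 6):
    check_lengths(k)
print(removable_lengths(3))  # [13, 26, 39]
```

For $k = 3$ the only candidates are 13, 26 and 39, that is, $13n$ with $1 \leq n \leq k$, which is the restriction used in the argument.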

It is easy to see that it must contain the letters of anti missile, since omitting it must lower the number of occurrences of anti and of missile by the same amount. Suppose we decompose the original string as follows:

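(229) $\vec{u} = (\mathrm{anti}^\frown\Box)^{k-1}, \quad \vec{v} = \mathrm{anti\ missile}^\frown\Box, \quad \vec{w} = (\mathrm{missile}^\frown\Box)^{k-1}{}^\frown\mathrm{missile}$

Then $\vec{u}\vec{w}$ is of the required form (with $k-1$ in place of $k$). Unfortunately,

(230) $\vec{u}\vec{v}^{\,2}\vec{w} = (\mathrm{anti}^\frown\Box)^{k-1}{}^\frown\mathrm{anti\ missile\ anti\ missile}^\frown\Box^\frown(\mathrm{missile}^\frown\Box)^{k-1}{}^\frown\mathrm{missile}$

which is not of the form (228), since an occurrence of anti follows an occurrence of missile. Similarly for any other attempt to divide the string.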
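The decomposition (229) and the failure of (230) can be tested for a concrete $k$, and a brute-force search can look for any other division of the string. This is only a sketch for small $k$, not a proof: the blank is assumed to be a plain space, and `in_language` and `pumpable_splits` are hypothetical helpers. `pumpable_splits` tries every nonempty $\vec{v}$ and keeps a split only if both $\vec{u}\vec{w}$ ($q = 0$) and $\vec{u}\vec{v}^{\,2}\vec{w}$ ($q = 2$) stay in the language.

```python
def anti_missile(n):
    """(anti )^n (missile )^n missile, as in (228), with the blank written as a space."""
    return "anti " * n + "missile " * n + "missile"


def in_language(s):
    """Exact membership: a member with n copies of 'anti ' must equal anti_missile(n)."""
    return s == anti_missile(s.count("anti "))


def pumpable_splits(x, qs=(0, 2)):
    """Return every (i, j), i < j, such that u v^q w stays in the language
    for all q in qs, where u, v, w = x[:i], x[i:j], x[j:]."""
    found = []
    for i in range(len(x)):
        for j in range(i + 1, len(x) + 1):
            u, v, w = x[:i], x[i:j], x[j:]
            if all(in_language(u + v * q + w) for q in qs):
                found.append((i, j))
    return found


k = 3
x = anti_missile(k)
# The decomposition (229).
u = "anti " * (k - 1)
v = "anti missile "
w = "missile " * (k - 1) + "missile"
assert u + v + w == x
print(in_language(u + w))          # True: this is anti_missile(k - 1)
print(in_language(u + v * 2 + w))  # False: 'missile anti' occurs, as in (230)
print(pumpable_splits(x))          # []: no split survives both q = 0 and q = 2
```

The empty list for $k = 3$ matches "similarly for any other attempt to divide the string", but the search only covers the values of $k$ it is run on; the general claim rests on the argument in the text.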
