13.11.2014 Views

Introduction to Computational Linguistics


16. Finite State Morphology

Then it acts as a map from surface forms to deep forms. It will translate arm into armR, and arms into both armS and armsR. The latter may be surprising, but the machine has no idea about the lexicon of English. It assumes that s can be either the sign of the plural or the last letter of the root. Both cases arise: the word bus, for example, does indeed end in s. Thus, in one direction the machine synthesizes surface forms, and in the other direction it analyses them.
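The two directions can be sketched in a few lines of Python. This is only an illustration, assuming that the simple machine copies the letters of the root and realises the marker R as nothing and the marker S as s (an assumption read off the examples above, not the machine's actual transition table). The function names `synthesize` and `analyse` are hypothetical.

```python
from typing import List


def synthesize(deep: str) -> str:
    """Map a deep form such as 'armS' to its surface form."""
    root, marker = deep[:-1], deep[-1]
    if marker == "R":      # bare root: nothing is added
        return root
    if marker == "S":      # plural: add s
        return root + "s"
    raise ValueError("deep form must end in R or S: " + deep)


def analyse(surface: str) -> List[str]:
    """Return every deep form whose surface form is `surface`."""
    results = []
    if surface.endswith("s"):
        # The final s may be the plural sign ...
        results.append(surface[:-1] + "S")
    # ... or the whole string may be a root.
    results.append(surface + "R")
    return results


print(analyse("arm"))    # ['armR']
print(analyse("arms"))   # ['armS', 'armsR']
```

Since no lexicon is consulted, `analyse("bus")` returns both `buS` and `busR`, which is the overgeneration described above.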

Now, let us make the machine more sophisticated. The regular plural is formed by adding es, not just s, when the word ends in sh: bushes, splashes. If the word ends in s, then the plural is obtained by adding ses: busses, plusses. We can account for this as follows. The machine will read the input and end in one of three different states, according to whether the word ends in s, in sh, or in something else.

(160)
〈0, a, 0, a〉, 〈0, b, 0, b〉, . . . , 〈0, z, 0, z〉,
〈0, a, 4, a〉, . . . , 〈0, r, 4, r〉, 〈0, t, 4, t〉,
. . . , 〈0, z, 4, z〉, 〈0, s, 2, s〉, 〈2, a, 4, a〉,
. . . , 〈2, g, 4, g〉, 〈2, h, 3, h〉, 〈2, i, 4, i〉,
. . . , 〈2, z, 4, z〉, 〈2, ε, 3, s〉, 〈3, ε, 4, e〉,
〈4, R, 1, ε〉, 〈4, S, 1, s〉.

This does not exhaust the actual spelling rules for English, but it should suffice. Notice that the machine, when turned around, will analyze busses correctly as busS, and also as bussesR. Once again, the mistake is due to the fact that the machine does not know that busses is not a basic word of English. Suppose we want to build that kind of knowledge into the machine. Then what we would have to do is write a machine that can distinguish an English word from a nonword. Such a machine probably requires very many states. It is probably no exaggeration to say that several hundred states will be required. This is certainly the case if we take into account that certain nouns form the plural differently: we only mention formulae (from formula), indices (from index), tableaux (from tableau), men, children, oxen, sheep, mice.
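To make the idea of running the machine in both directions concrete, here is a minimal Python sketch that encodes the transitions of (160) exactly as printed and searches all paths through them. Several things are assumptions rather than part of the text: state 0 is taken as the start state and state 1 as the only accepting state, the empty string stands in for ε, and the names `table_160`, `run`, `synthesize`, `analyse` and `analyse_known` are hypothetical. A plain Python set stands in for the word-recognising machine, and its three entries are a toy lexicon chosen for illustration.

```python
import string

EPS = ""  # the empty string stands in for ε


def table_160():
    """Transitions (state, deep symbol, next state, surface symbol) of (160)."""
    table = [(0, "s", 2, "s"), (2, EPS, 3, "s"), (3, EPS, 4, "e"),
             (4, "R", 1, EPS), (4, "S", 1, "s")]
    for c in string.ascii_lowercase:
        table.append((0, c, 0, c))                     # copy a letter, stay in 0
        if c != "s":
            table.append((0, c, 4, c))                 # guess: last letter, not s
        table.append((2, c, 3 if c == "h" else 4, c))  # letter following an s
    return table


def run(table, word, read, write, start=0, final=1):
    """Collect every string written on tape `write` while reading `word`.

    Tapes are tuple indices: 1 is the deep side, 3 the surface side.
    Assumes the table has no cycle of ε-reading transitions.
    """
    results, seen = set(), set()
    stack = [(start, 0, "")]
    while stack:
        config = stack.pop()
        if config in seen:
            continue
        seen.add(config)
        state, pos, out = config
        if state == final and pos == len(word):
            results.add(out)
        for t in table:
            if t[0] != state:
                continue
            if t[read] == EPS:                      # reads nothing on this tape
                stack.append((t[2], pos, out + t[write]))
            elif word.startswith(t[read], pos):     # reads one symbol
                stack.append((t[2], pos + 1, out + t[write]))
    return results


TABLE = table_160()
LEXICON = {"arm", "bus", "bush"}  # hypothetical toy lexicon


def synthesize(deep):
    """Deep form to the set of surface forms."""
    return run(TABLE, deep, read=1, write=3)


def analyse(surface):
    """Surface form to the set of deep forms."""
    return run(TABLE, surface, read=3, write=1)


def analyse_known(surface):
    """Keep only analyses whose root is in the toy lexicon."""
    return {deep for deep in analyse(surface) if deep[:-1] in LEXICON}
```

With this toy lexicon, `analyse_known("busses")` keeps only `busS`. The unfiltered `analyse("busses")` also proposes a root that is not an English word, which the lexicon check discards. Because the sketch encodes only the printed transitions, and none of them reads R directly after a final s, it does not return `bussesR`.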

Here is another task. In Hungarian, case suffixes come in different forms. For example, the dative is formed by adding nak or nek. The form depends on the following factors. If the root contains a back vowel (a, o, u) then the suffix is nak; otherwise it is nek. The comitative suffix is another special case: when added, it becomes a sequence of a consonant plus al or el (the choice of vowel is determined in the same way as that of nak versus nek). The consonant is v if the root ends in a vowel; otherwise it is the same as the preceding one. (So: dob (‘drum’) dobnak
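The two Hungarian rules just stated can be sketched as plain Python functions. This is a simplified illustration, not a full account of Hungarian spelling: it follows the rules exactly as given above, it treats the long vowels á, ó, ú as back vowels alongside a, o, u (an addition to the list in the text), and it ignores consonants written with two letters such as sz or cs. The function names are hypothetical.

```python
BACK_VOWELS = set("aáoóuú")          # a, o, u from the text, plus their long forms
VOWELS = set("aáeéiíoóöőuúüű")       # all Hungarian vowel letters


def has_back_vowel(root: str) -> bool:
    """True if the root contains at least one back vowel."""
    return any(ch in BACK_VOWELS for ch in root)


def dative(root: str) -> str:
    """Add nak after a root with a back vowel, nek otherwise."""
    return root + ("nak" if has_back_vowel(root) else "nek")


def comitative(root: str) -> str:
    """Add consonant + al/el; the consonant is v after a vowel,
    otherwise a copy of the root's final consonant."""
    vowel = "a" if has_back_vowel(root) else "e"
    last = root[-1]
    consonant = "v" if last in VOWELS else last
    return root + consonant + vowel + "l"


print(dative("dob"))       # dobnak
print(comitative("dob"))   # dobbal
print(dative("kert"))      # kertnek
print(comitative("kert"))  # kerttel
```

The root dob is the example from the text. The root kert (‘garden’) is added here only to show the front-vowel case.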
