Introduction to Computational Linguistics
16. Finite State Morphology
(‘for a drum’) dobbal (‘with a drum’); szeg (‘nail’) szegnek (‘for a nail’) and
szeggel (‘with a nail’).) On the other hand, there are a handful of words that end
in z (ez ‘this’, az ‘that’) where the final z assimilates to the following consonant
(ennek ‘to this’), except in the comitative where we have ezzel ‘with this’. To
write a finite state transducer, we need to record in the state two things: whether
or not the root contained a back vowel, and what consonant the root ended in.
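The two pieces of state can be made concrete with a small sketch. It is not a full transducer, only a left-to-right scan that records the same two facts and then emits the suffix. The function names, the vowel classes and the restriction to roots ending in a single-letter consonant or a plain vowel are assumptions made for illustration; digraphs such as gy are ignored here.

```python
# Assumed vowel classes for the sketch (standard Hungarian back/front split).
BACK_VOWELS = set("aáoóuú")
VOWELS = BACK_VOWELS | set("eéiíöőüű")
# The handful of z-final words mentioned in the text.
DEMONSTRATIVES = {"ez", "az"}


def scan_root(root):
    """Read the root symbol by symbol and return the state
    (has_back_vowel, last_symbol) reached at its end."""
    has_back, last = False, ""
    for ch in root:
        if ch in BACK_VOWELS:
            has_back = True
        last = ch
    return has_back, last


def dative(root):
    """Attach -nak/-nek; in ez/az the final z assimilates to the n."""
    has_back, _ = scan_root(root)
    suffix = "nak" if has_back else "nek"
    if root in DEMONSTRATIVES:
        return root[:-1] + "n" + suffix
    return root + suffix


def comitative(root):
    """Attach -val/-vel; after a consonant the v copies that consonant."""
    has_back, last = scan_root(root)
    vowel = "a" if has_back else "e"
    first = "v" if last in VOWELS else last
    return root + first + vowel + "l"
```

With these definitions, dative("dob") and comitative("dob") give dobnak and dobbal, while dative("ez") gives ennek and comitative("ez") gives ezzel.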
Plural in German is a jungle. First, there are many ways in which the plural
can be formed: suffix s, suffix en, suffix er, Umlaut, which is the (graphical) change
from a to ä, from o to ö and from u to ü of the last vowel; and combinations
thereof. Second, there is no way to predict phonologically which word will take
which plural. Hence, we have to be content with a word list. Translated
into a finite state machine, this means that we end up with a machine of several
hundred states.
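A word list of this kind can be sketched as a lookup table that stores, for each noun, whether Umlaut applies and which suffix is added. The five entries and all names below are a hypothetical toy lexicon chosen for illustration. As a simplification, the sketch umlauts the last a, o or u in the word, and it does not handle diphthongs such as au or capitalised initial vowels.

```python
# Graphical Umlaut as described in the text: a -> ä, o -> ö, u -> ü.
UMLAUT = {"a": "ä", "o": "ö", "u": "ü"}

# Hypothetical toy lexicon: noun -> (apply Umlaut?, suffix).
LEXICON = {
    "Auto": (False, "s"),
    "Frau": (False, "en"),
    "Kind": (False, "er"),
    "Vater": (True, ""),
    "Mann": (True, "er"),
}


def umlaut(word):
    """Replace the last a, o or u in the word by its umlauted form."""
    for i in range(len(word) - 1, -1, -1):
        if word[i] in UMLAUT:
            return word[:i] + UMLAUT[word[i]] + word[i + 1:]
    return word


def plural(noun):
    """Look the noun up in the word list and apply its plural class."""
    use_umlaut, suffix = LEXICON[noun]
    stem = umlaut(noun) if use_umlaut else noun
    return stem + suffix
```

Nothing in plural() inspects the sound shape of the noun to choose the class; all of that information sits in the table, which is why the corresponding machine grows with the size of the word list.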
Another area where a transducer is useful is in writing conventions. In English,
final y changes to i when a vowel is added: happy : happier, fly : flies. In
Hungarian, the palatal sound [dj] is written gy. When this sound is doubled it
is written ggy and not, as one would expect, gygy. The word hegy should by
the above rule become hegygyel, but the orthography dictates heggyel. (Actually,
the spelling gets undone in hyphenation: you write hegy-gyel.)
Thus, the following procedure suggests itself: we define a machine that regularizes
the orthography by reversing the conventions as just shown. This machine
translates heggyel into hegygyel. Actually, it is not necessary that gy is treated
as a digraph. We can define a new alphabet in which gy is written by a single
symbol. Next, we feed the output of this machine to a second machine which
produces the deep morphological representations.
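The cascade can be sketched as three small functions applied one after the other. The symbol G standing for gy, the tag +COM and the function names are assumptions made for the sketch. The last step is only a toy stand-in for the second machine: it recognises nothing but a comitative whose suffix consonant copies the final symbol of the root.

```python
def regularize(word):
    """Undo the spelling convention: ggy is expanded to gygy."""
    return word.replace("ggy", "gygy")


def to_symbols(word, digraph="gy", symbol="G"):
    """Rewrite the digraph as one symbol of a new alphabet
    (the symbol 'G' is an arbitrary choice for this sketch)."""
    return word.replace(digraph, symbol)


def analyse(symbols):
    """Toy second machine: split off a comitative of the shape
    root + copy of the final root symbol + a/e + l."""
    if (
        len(symbols) >= 4
        and symbols[-1] == "l"
        and symbols[-2] in "ae"
        and symbols[-3] == symbols[-4]
    ):
        return symbols[:-3] + "+COM"
    return symbols


def pipeline(word):
    """Compose the machines: orthography first, morphology second."""
    return analyse(to_symbols(regularize(word)))
```

Here regularize("heggyel") yields hegygyel, to_symbols turns that into heGGel, and pipeline("heggyel") returns heG+COM. Because gy is a single symbol by the time the second step runs, the doubling looks exactly like the doubling in dobbal.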
We close with an example from Egyptian Arabic. As in many Semitic languages,
roots consist only of consonants. Typically, they have three consonants,
for example ktb ‘to write’ and drs ‘to study’. Words are made by adding some
material in front (prefixation), some material after (suffixation) and some material
in between (infixation). Moreover, all these typically happen at the same time.
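The way a consonantal root combines with added material can be sketched with templates in which the digits 1, 2 and 3 mark the positions of the root consonants. The template notation and the function name are assumptions of the sketch, and the templates used here (1a2a3, yi12i3, yi12i3u) are illustrative patterns that are not taken from the text.

```python
def interdigitate(root, template):
    """Fill the root consonants into the numbered slots of the template.

    A digit n in the template is replaced by the n-th root consonant;
    every other character is copied unchanged.
    """
    return "".join(
        root[int(ch) - 1] if ch.isdigit() else ch for ch in template
    )
```

For the root ktb, the template 1a2a3 gives katab, yi12i3 gives yiktib, and yi12i3u gives yiktibu; the last one adds material in front of, inside and after the root in a single step. The same templates applied to drs give daras, yidris and yidrisu.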