13.11.2014 Views

Introduction to Computational Linguistics



16. Finite State Morphology

(‘for a drum’) dobbal (‘with a drum’); szeg (‘nail’) szegnek (‘for a nail’) and szeggel (‘with a nail’).) On the other hand, there are a handful of words that end in z (ez ‘this’, az ‘that’) where the final z assimilates to the following consonant (ennek ‘to this’), except in the comitative where we have ezzel ‘with this’. To write a finite state transducer, we need to record two things in the state: whether or not the root contained a back vowel, and what consonant the root ended in.
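The two pieces of state can be illustrated with a minimal Python sketch. It is not a full transducer: it assumes the dative is -nak/-nek and the comitative -val/-vel with v assimilating to a root-final consonant (as the examples dobbal and szeggel suggest), it treats every letter as a single sound (so digraphs are ignored), and it handles the z-final demonstratives through a hypothetical exception list `DEMONSTRATIVES`. All function names are made up for illustration.

```python
BACK_VOWELS = set("aáoóuú")
FRONT_VOWELS = set("eéiíöőüű")
DEMONSTRATIVES = {"ez", "az"}  # assumed exception list for z-final words


def scan_root(root):
    """Read the root once and return the two facts kept in the state:
    whether a back vowel was seen, and the root-final consonant (or None)."""
    has_back_vowel = False
    last_consonant = None
    for ch in root:
        if ch in BACK_VOWELS:
            has_back_vowel = True
            last_consonant = None
        elif ch in FRONT_VOWELS:
            last_consonant = None
        else:
            last_consonant = ch
    return has_back_vowel, last_consonant


def dative(root):
    back, _ = scan_root(root)
    suffix = "nak" if back else "nek"
    if root in DEMONSTRATIVES:
        # final z assimilates to the following n: ez + nek -> ennek
        return root[:-1] + "n" + suffix
    return root + suffix


def comitative(root):
    back, last = scan_root(root)
    vowel = "a" if back else "e"
    # v assimilates to the root-final consonant; it stays v after a vowel
    first = last if last is not None else "v"
    return root + first + vowel + "l"


for word in ["dob", "szeg", "ez"]:
    print(word, dative(word), comitative(word))
```

Note that the exception only applies in the dative here; the comitative of ez comes out as ezzel through the ordinary assimilation rule, which matches the behaviour described in the text.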

Plural in German is a jungle. First, there are many ways in which the plural can be formed: suffix s, suffix en, suffix er, Umlaut (the graphical change from a to ä, from o to ö and from u to ü of the last vowel), and combinations thereof. Second, there is no way to predict phonologically which word will take which plural. Hence, we have to be content with a word list. Translated into a finite state machine, this means that we end up with a machine of several hundred states.
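A word-list approach might look like the following Python sketch. The five-word lexicon and the names `PLURAL_CLASS`, `umlaut`, `plural` and `trie_states` are illustrative assumptions, and "last vowel" is interpreted here as the last a, o or u in the word. The final function counts the distinct prefixes of the word list, which is the number of states a letter-by-letter (trie-shaped) machine would need; this hints at why a realistic list leads to hundreds of states.

```python
UMLAUT = {"a": "ä", "o": "ö", "u": "ü", "A": "Ä", "O": "Ö", "U": "Ü"}

# Hypothetical toy lexicon: word -> (suffix, whether Umlaut applies).
PLURAL_CLASS = {
    "Auto": ("s", False),
    "Frau": ("en", False),
    "Kind": ("er", False),
    "Vater": ("", True),
    "Mann": ("er", True),
}


def umlaut(word):
    """Change the last a, o or u of the word to ä, ö or ü.
    Simplification: the diphthong au (as in Haus) is not handled."""
    for i in range(len(word) - 1, -1, -1):
        if word[i] in UMLAUT:
            return word[:i] + UMLAUT[word[i]] + word[i + 1:]
    return word


def plural(word):
    """Look the word up in the list; unknown words raise KeyError,
    since the plural cannot be predicted from the sound shape."""
    suffix, has_umlaut = PLURAL_CLASS[word]
    stem = umlaut(word) if has_umlaut else word
    return stem + suffix


def trie_states(words):
    """Number of distinct prefixes, i.e. states of a trie over the list."""
    return len({w[:i] for w in words for i in range(len(w) + 1)})


for w in PLURAL_CLASS:
    print(w, plural(w))
print(trie_states(PLURAL_CLASS))
```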

Another area where a transducer is useful is in writing conventions. In English, final y changes to i when a suffix beginning with a vowel is added: happy : happier, fly : flies. In Hungarian, the palatal sound [dj] is written gy. When this sound is doubled it is written ggy and not, as one would expect, gygy. The word hegy should by the above rule become hegygyel, but the orthography dictates heggyel. (Actually, the spelling gets undone in hyphenation: you write hegy-gyel.)
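The English convention can be written as a one-line rewrite at the stem boundary. The Python sketch below implements the rule exactly as stated above, under the assumption that the plural of fly is formed with the suffix es; the name `attach` is hypothetical. Because the rule is stated so simply, the sketch also overapplies it (for instance to play + ed), so it should be read as an illustration of the idea rather than a complete account of English spelling.

```python
VOWELS = set("aeiou")


def attach(stem, suffix):
    """Join stem and suffix, changing final y to i before a vowel-initial suffix."""
    if stem.endswith("y") and suffix[:1] in VOWELS:
        return stem[:-1] + "i" + suffix
    return stem + suffix


print(attach("happy", "er"))
print(attach("fly", "es"))
```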

Thus, the following procedure suggests itself: we define a machine that regularizes the orthography by reversing the conventions just shown. This machine translates heggyel into hegygyel. Actually, it is not necessary to treat gy as a digraph: we can define a new alphabet in which gy is written by a single symbol. Next, we take the output of this machine as input to a second machine which produces the deep morphological representations.
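The two-machine cascade can be sketched in Python as a composition of string functions. Everything beyond the mapping heggyel to hegygyel is an assumption made for illustration: the symbol `G` standing for gy, the tag `+COM` for the comitative, and the toy second stage, which only recognizes a doubled consonant followed by a or e and l. Input is assumed to be lowercase.

```python
GY = "G"  # hypothetical single symbol standing for the digraph gy


def regularize(surface):
    """First machine: undo the spelling convention, ggy -> gygy."""
    return surface.replace("ggy", "gygy")


def to_single_symbols(regular):
    """Re-encode the digraph gy as one symbol of a new alphabet."""
    return regular.replace("gy", GY)


def analyze(symbols):
    """Toy second machine: split off a comitative ending of the shape
    C + (a|e) + l, where C repeats the root-final consonant."""
    if (len(symbols) >= 4 and symbols[-1] == "l"
            and symbols[-2] in "ae" and symbols[-3] == symbols[-4]):
        root = symbols[:-3]
        return root.replace(GY, "gy") + "+COM"
    return symbols.replace(GY, "gy")


def cascade(surface):
    """Feed the output of the first machine into the second."""
    return analyze(to_single_symbols(regularize(surface)))


print(regularize("heggyel"))
print(cascade("heggyel"))
print(cascade("dobbal"))
```

With gy encoded as a single symbol, the second stage can test for a doubled consonant by comparing two adjacent symbols, which is the practical benefit of the new alphabet.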

We close with an example from Egyptian Arabic. As in many Semitic languages, roots consist only of consonants. Typically, they have three consonants, for example ktb ‘to write’ and drs ‘to study’. Words are made by adding some material in front (prefixation), some material after (suffixation) and some material in between (infixation). Moreover, all these typically happen at the same time.
