06.07.2014 Views

A Treebank-based Investigation of IPP-triggering Verbs in Dutch



processing. The Add word function allows the annotator to type in words that the system has missed and add them to the list of unknown words. The same functionality for adding words to the morphological component has also been implemented in the disambiguation interface, so that when an error like this is discovered after parsing, it may be added to the morphology in the same way and treated correctly in a reparse of the sentence.

3.3 Assigning lexical properties to unknown words

If an unrecognized word must be added to the lexicon, the annotator has to decide whether it is a new word, to be added to the morphology as a new paradigm, or a variant form of an existing paradigm. Entering new noninflecting words is a straightforward process involving simply choosing the correct word category, such as interjection, place name, masculine first name, feminine first name, etc.

Words belonging to the open, productive word classes (nouns, verbs, adjectives and adverbs) usually have inflectional paradigms which must be defined. The dictionary entry form of the word is entered in the interface as the Base form, and a similar word with the same word class and inflection pattern is specified, either by selecting it from a drop-down list of suggestions, or by entering it into a text box. As candidates, the system proposes a number of existing word forms that end in the same characters as the new word. When one of these candidates is chosen, the system automatically generates the paradigm for the word being entered. In figure 3, the new compound erkemikkelen ‘arch fool’ is being added as a paradigm inflecting like the existing compound dåsemikkel ‘nitwit’, and the new paradigm is shown to the right. If the annotator decides that this is the correct inflection and chooses this paradigm, all inflected forms of the unrecognized word are automatically added to the morphological analyzer.

Figure 3: Adding a paradigm
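The candidate suggestion and paradigm generation described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the function names, the feature labels, and the toy lexicon (including the inflected forms listed for each entry) are assumptions made for illustration only. Candidates are ranked by the length of the suffix they share with the new base form, and the new paradigm is derived by swapping the non-shared beginning of the model word for that of the new word.

```python
# Hypothetical toy lexicon: base form -> {feature label: inflected form}.
# The entries and labels are illustrative assumptions, not the real lexicon.
LEXICON = {
    "dåsemikkel": {
        "sg_indef": "dåsemikkel",
        "sg_def": "dåsemikkelen",
        "pl_indef": "dåsemikler",
        "pl_def": "dåsemiklene",
    },
    "sykkel": {"sg_indef": "sykkel", "sg_def": "sykkelen"},
    "bil": {"sg_indef": "bil", "sg_def": "bilen"},
}


def common_suffix_len(a: str, b: str) -> int:
    """Return the number of characters shared at the end of both strings."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def suggest_models(new_base: str, lexicon: dict, limit: int = 5) -> list:
    """Rank existing base forms by how long a suffix they share with new_base."""
    scored = [(common_suffix_len(new_base, base), base) for base in lexicon]
    scored = [(n, base) for n, base in scored if n > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [base for _, base in scored[:limit]]


def paradigm_by_analogy(new_base: str, model_base: str, model_forms: dict) -> dict:
    """Build a paradigm for new_base by analogy with the model word's forms."""
    k = common_suffix_len(new_base, model_base)
    model_prefix = model_base[: len(model_base) - k]
    new_prefix = new_base[: len(new_base) - k]
    paradigm = {}
    for label, form in model_forms.items():
        if not form.startswith(model_prefix):
            # The simple prefix swap cannot handle this form.
            raise ValueError(f"cannot derive {label!r} from {form!r}")
        paradigm[label] = new_prefix + form[len(model_prefix):]
    return paradigm


candidates = suggest_models("erkemikkel", LEXICON)
new_paradigm = paradigm_by_analogy(
    "erkemikkel", candidates[0], LEXICON[candidates[0]]
)
print(candidates[0], new_paradigm["sg_def"])  # dåsemikkel erkemikkelen
```

In this sketch the annotator's confirmation step would correspond to inspecting `new_paradigm` before its forms are added to the analyzer; a prefix swap of this kind only works when every form of the model word begins with the same non-shared prefix.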

If the new word is a verb, it is not sufficient to add an inflectional paradigm. Verbs must also be assigned subcategorization frames necessary for parsing. Sim-
