A Treebank-based Investigation of IPP-triggering Verbs in Dutch
processing. The Add word function allows the annotator to type in words that the
system has missed and add them to the list of unknown words. The same functionality
for adding words to the morphological component has also been implemented
in the disambiguation interface, so that when a missing word is discovered only after
parsing, it can be added to the morphology in the same way and treated correctly
when the sentence is reparsed.
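As an illustration of this workflow, the following minimal Python sketch checks a tokenized sentence against a morphology and lets a missed word be added before a reparse. It assumes a toy analyzer modelled as a dictionary from lowercased word forms to analysis strings; `ToyMorphology`, `unknown_words` and the analysis tags are hypothetical and do not reflect the actual system's interface.

```python
from typing import Dict, List


class ToyMorphology:
    """Hypothetical stand-in for the morphological analyzer (not the real system)."""

    def __init__(self, entries: Dict[str, List[str]]) -> None:
        # Maps a lowercased word form to its list of analyses.
        self.entries: Dict[str, List[str]] = {k.lower(): list(v) for k, v in entries.items()}

    def analyze(self, form: str) -> List[str]:
        # A form the analyzer does not know gets no analyses.
        return self.entries.get(form.lower(), [])

    def add_word(self, form: str, analysis: str) -> None:
        # The "Add word" step: record a missed form so that a reparse can use it.
        self.entries.setdefault(form.lower(), []).append(analysis)


def unknown_words(tokens: List[str], morph: ToyMorphology) -> List[str]:
    """Return the tokens the analyzer cannot analyze, in sentence order."""
    return [t for t in tokens if not morph.analyze(t)]


# Usage with hypothetical analyses:
morph = ToyMorphology({"han": ["han+pron"], "er": ["være+verb"], "en": ["en+det"]})
sentence = ["Han", "er", "en", "erkemikkel"]
print(unknown_words(sentence, morph))  # ['erkemikkel']
morph.add_word("erkemikkel", "erkemikkel+noun")
print(unknown_words(sentence, morph))  # []
```

Keeping the unknown-word check as a pure function over the analyzer means the same check can be rerun unchanged after each addition, which mirrors the add-then-reparse cycle described above.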
3.3 Assigning lexical properties to unknown words
If an unrecognized word must be added to the lexicon, the annotator has to decide
whether it is a new word, to be added to the morphology as a new paradigm,
or a variant form of an existing paradigm. Entering new noninflecting words is
straightforward: the annotator simply chooses the correct word category, such
as interjection, place name, masculine first name, feminine first name, etc.
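Because a noninflecting word needs no paradigm, its entry can be modelled as a single form plus a category label. The sketch below assumes exactly that representation; the enum lists only the categories named in the text (which ends with "etc."), and the names `Category` and `NonInflectingEntry` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Category(Enum):
    """Categories named in the text; the real inventory is larger ("etc.")."""
    INTERJECTION = "interjection"
    PLACE_NAME = "place name"
    MASCULINE_FIRST_NAME = "masculine first name"
    FEMININE_FIRST_NAME = "feminine first name"


@dataclass(frozen=True)
class NonInflectingEntry:
    """A lexicon entry that needs no paradigm: one form, one category."""
    form: str
    category: Category

    def forms(self) -> Tuple[str, ...]:
        # The base form is the only form, so nothing has to be generated.
        return (self.form,)


# Usage: the annotator only picks the category.
entry = NonInflectingEntry("Tromsø", Category.PLACE_NAME)
print(entry.forms(), entry.category.value)  # ('Tromsø',) place name
```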
Words belonging to the open, productive word classes (nouns, verbs, adjectives
and adverbs) usually have inflectional paradigms which must be defined. The dictionary
entry form of the word is entered in the interface as the Base form, and a
similar word with the same word class and inflection pattern is specified, either by
selecting it from a drop-down list of suggestions, or by entering it into a text box.
The system proposes as candidates a number of existing words that end in the same
characters as the new word. When one of these candidates is chosen, the system
automatically generates the paradigm for the word being entered. In Figure 3,
the new compound erkemikkelen ‘arch fool’ is being added as a paradigm inflecting
like the existing compound dåsemikkel ‘nitwit’, and the new paradigm is shown to
the right. If the annotator decides that this is the correct inflection and chooses this
paradigm, all inflected forms of the unrecognized word are automatically added to
the morphological analyzer.
Figure 3: Add<strong>in</strong>g a paradigm<br />
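The suggestion-and-generation step can be sketched as follows, under explicitly stated assumptions: a toy lexicon of three Norwegian Bokmål nouns, candidates ranked by the length of the ending they share with the new base form, and a paradigm generated by swapping the model word's stem (taken here as the longest prefix shared by all its forms) for the new word's stem. The function names, tags and thresholds are hypothetical; the actual system may rank candidates and generate paradigms differently.

```python
from typing import Dict, List

# Hypothetical toy lexicon: base form -> {tag: inflected form}.
# The tags are illustrative, not the system's actual tag set.
Paradigm = Dict[str, str]
LEXICON: Dict[str, Paradigm] = {
    "bil": {"sg.indef": "bil", "sg.def": "bilen", "pl.indef": "biler", "pl.def": "bilene"},
    "fil": {"sg.indef": "fil", "sg.def": "filen", "pl.indef": "filer", "pl.def": "filene"},
    "hest": {"sg.indef": "hest", "sg.def": "hesten", "pl.indef": "hester", "pl.def": "hestene"},
}


def common_suffix_len(a: str, b: str) -> int:
    """Number of characters that a and b share at their ends."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def suggest_models(base: str, lexicon: Dict[str, Paradigm],
                   min_suffix: int = 2, limit: int = 5) -> List[str]:
    """Propose existing entries that end in the same characters as `base`,
    longest shared ending first, ties broken alphabetically."""
    scored = [(common_suffix_len(base, entry), entry)
              for entry in lexicon if entry != base]
    scored = [(n, entry) for n, entry in scored if n >= min_suffix]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [entry for _, entry in scored[:limit]]


def common_prefix(words: List[str]) -> str:
    """Longest prefix shared by all words."""
    prefix = words[0]
    for word in words[1:]:
        i = 0
        while i < min(len(prefix), len(word)) and prefix[i] == word[i]:
            i += 1
        prefix = prefix[:i]
    return prefix


def generate_paradigm(base: str, model: str,
                      lexicon: Dict[str, Paradigm]) -> Paradigm:
    """Inflect `base` like `model` by replacing the model's stem with the new
    word's stem; only suffixal inflection is handled."""
    model_forms = lexicon[model]
    stem = common_prefix([model, *model_forms.values()])
    ending = model[len(stem):]  # part of the model's base form that inflection replaces
    if not base.endswith(ending):
        raise ValueError(f"{base!r} does not end like the model {model!r}")
    new_stem = base[:len(base) - len(ending)]
    return {tag: new_stem + form[len(stem):] for tag, form in model_forms.items()}


# Usage: add the compound "elbil" ('electric car'), modelled on "bil" ('car').
print(suggest_models("elbil", LEXICON))
# ['bil', 'fil']
print(generate_paradigm("elbil", "bil", LEXICON))
# {'sg.indef': 'elbil', 'sg.def': 'elbilen', 'pl.indef': 'elbiler', 'pl.def': 'elbilene'}
```

A purely suffix-based substitution like this handles regular compounds such as elbil modelled on bil, but stem-internal alternations would need a richer representation than this sketch provides.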
If the new word is a verb, it is not sufficient to add an inflectional paradigm.
Verbs must also be assigned the subcategorization frames necessary for parsing. Sim-