A Treebank-based Investigation of IPP-triggering Verbs in Dutch
viant ending, the correct base form (lemma) is entered, and a shortcut brings up the
paradigm(s) associated with the base form (the field Is a variant of in figure 4). The
paradigm is then presented as a list of all existing word forms with their morphological
features, from which the annotator selects one or more paradigm rows with
the appropriate features. If there is no such set of features, as will be the case when
a word is used with deviating gender, the features must be typed in manually in the
Features field. In the example in figure 4, the word kjelleren ‘the basement’ has
been spelled with an apostrophe rather than the vowel e, representing a common
pronunciation where the schwa in the final syllable has been dropped.

When the variation concerns the spelling of the stem, an entire paradigm is
added to the morphology. In figure 5, the misspelling kolapsa is being added to the
paradigm for kollapse ‘(to) collapse’. This error is made systematically throughout
the text and is probably intentional (not a typo), and it is also likely to be a common
mistake. The word is added to the morphology by entering the base form of the
variant, kolapse, in the Base form field, and then typing in the base form of the
standard (Add to base form). All possible paradigms appear in the box to the right
(in this particular case only one) and the appropriate paradigm is chosen.

All extracted words are stored in a database together with their assigned lexical
properties and the context they were extracted from. Here, they can be reviewed and
reclassified or edited if necessary. Before the texts from which the words were extracted
are added to the treebank and parsed, the extracted words and their paradigms have to be
added to the morphology used in the LFG grammar. Since this add-on morphology
is not technically merged with the main morphology, but compiled as a separate
transducer, the main morphological transducers do not have to be recompiled, and
updating the add-on morphology takes only a matter of seconds.
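
The separation between the main morphology and the add-on morphology can be illustrated with a minimal Python sketch. It is not the system described here: plain dictionaries stand in for compiled transducers, and the class `LayeredMorphology`, its method names, the build counters, and the feature strings are all hypothetical, introduced only for illustration. The point it shows is that rebuilding the add-on layer leaves the main layer untouched while lookups consult both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    lemma: str     # base form of the standard word
    features: str  # hypothetical feature notation, not the paper's tag set


class LayeredMorphology:
    """Main lexicon plus a separately rebuilt add-on lexicon (illustrative)."""

    def __init__(self, main):
        # Stand-in for the compiled main transducers: built once.
        self._main = {form: list(analyses) for form, analyses in main.items()}
        # Stand-in for the separately compiled add-on transducer.
        self._addon = {}
        self.main_builds = 1
        self.addon_builds = 0

    def rebuild_addon(self, entries):
        """Replace only the add-on layer; the main layer is not rebuilt."""
        self._addon = {form: list(analyses) for form, analyses in entries.items()}
        self.addon_builds += 1

    def analyze(self, form):
        """Return analyses from the main layer followed by the add-on layer."""
        return self._main.get(form, []) + self._addon.get(form, [])


morph = LayeredMorphology({"kollapse": [Analysis("kollapse", "Verb Inf")]})
before = morph.analyze("kolapsa")  # variant unknown to the main layer

# Register the variant form under the standard base form in the add-on layer.
morph.rebuild_addon({"kolapsa": [Analysis("kollapse", "Verb")]})
after = morph.analyze("kolapsa")
```

In this sketch the variant form kolapsa receives no analysis until the add-on layer is rebuilt, after which it resolves to the standard base form kollapse, and the main layer's build counter never changes.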
Figure 5: Adding stem variants