06.07.2014 Views

A Treebank-based Investigation of IPP-triggering Verbs in Dutch


viant ending, the correct base form (lemma) is entered, and a shortcut brings up the paradigm(s) associated with the base form (the field Is a variant of in figure 4). The paradigm is then presented as a list of all existing word forms with their morphological features, from which the annotator selects one or more paradigm rows with the appropriate features. If there is no such set of features, as will be the case when a word is used with deviating gender, the features must be typed in manually in the Features field. In the example in figure 4, the word kjelleren ‘the basement’ has been spelled with an apostrophe rather than the vowel e, representing a common pronunciation where the schwa in the final syllable has been dropped.

When the variation concerns the spelling of the stem, an entire paradigm is added to the morphology. In figure 5, the misspelling kolapsa is being added to the paradigm for kollapse ‘(to) collapse’. This error is made systematically throughout the text and is probably intentional (not a typo), and it is also likely to be a common mistake. The word is added to the morphology by entering the base form of the variant, kolapse, in the Base form field, and then typing in the base form of the standard (Add to base form). All possible paradigms appear in the box to the right (in this particular case only one) and the appropriate paradigm is chosen.
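The stem-variant step can be illustrated with a minimal sketch. It assumes that a variant paradigm is obtained by substituting the variant stem into every row of the chosen standard paradigm; the source does not describe the tool's internals, so this is only one plausible reading. The class `ParadigmRow`, the functions `shared_stem` and `variant_paradigm`, the three paradigm rows, and the feature labels are all hypothetical and are not taken from the actual morphology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParadigmRow:
    form: str      # inflected word form
    lemma: str     # base form the row is analysed as
    features: str  # morphological features (labels are illustrative)


# Hypothetical rows for the standard lemma; not the real paradigm.
STANDARD = [
    ParadigmRow("kollapse", "kollapse", "Verb Inf"),
    ParadigmRow("kollapser", "kollapse", "Verb Pres"),
    ParadigmRow("kollapsa", "kollapse", "Verb Past"),
]


def shared_stem(forms):
    """Return the longest prefix shared by all forms."""
    stem = forms[0]
    for form in forms[1:]:
        while not form.startswith(stem):
            stem = stem[:-1]
    return stem


def variant_paradigm(variant_lemma, standard_lemma, paradigm):
    """Copy a standard paradigm, replacing its stem with the variant stem.

    Each new row keeps the standard lemma and features, so a misspelled
    form is still analysed as a form of the standard word.
    """
    stem = shared_stem([row.form for row in paradigm])
    ending = standard_lemma[len(stem):]
    if not variant_lemma.endswith(ending):
        raise ValueError("variant lemma does not fit this paradigm")
    variant_stem = variant_lemma[: len(variant_lemma) - len(ending)]
    return [
        ParadigmRow(variant_stem + row.form[len(stem):], row.lemma, row.features)
        for row in paradigm
    ]


variant_rows = variant_paradigm("kolapse", "kollapse", STANDARD)
variant_forms = [row.form for row in variant_rows]
```

Under these assumptions, entering the variant base form kolapse against the standard kollapse yields one variant row per standard row, including the attested misspelling kolapsa.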

All extracted words are stored in a database together with their assigned lexical properties and the context they were extracted from. Here, they can be reviewed and reclassified or edited if necessary. Before the source texts are added to the treebank and parsed, the extracted words and their paradigms have to be added to the morphology used in the LFG grammar. Since this add-on morphology is not technically merged with the main morphology, but compiled as a separate transducer, the main morphological transducers do not have to be recompiled, and updating the add-on morphology takes only a matter of seconds.
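The benefit of keeping the add-on morphology separate can be sketched as follows. Plain dictionaries stand in for the compiled finite-state transducers, and the sketch assumes that analyses from the main and add-on components are simply combined at lookup time; the source does not say how the two are consulted. The class `Morphology`, its methods, and the analysis strings are hypothetical.

```python
class Morphology:
    """Main lexicon compiled once, plus a small add-on compiled separately."""

    def __init__(self, main_entries):
        self.main_compilations = 0
        self._main = self._compile_main(main_entries)
        self._addon = {}

    @staticmethod
    def _index(entries):
        # Stand-in for transducer compilation: map each form to its analyses.
        table = {}
        for form, analysis in entries:
            table.setdefault(form, []).append(analysis)
        return table

    def _compile_main(self, entries):
        # The expensive step; counted to show that it runs only once.
        self.main_compilations += 1
        return self._index(entries)

    def update_addon(self, entries):
        # Only the small add-on table is rebuilt; the main table is untouched.
        self._addon = self._index(entries)

    def analyze(self, form):
        # Assumption: analyses from both components are combined.
        return self._main.get(form, []) + self._addon.get(form, [])


morph = Morphology([("kollapsa", "kollapse+Verb+Past")])
before = morph.analyze("kolapsa")
morph.update_addon([("kolapsa", "kollapse+Verb+Past")])
morph.update_addon([("kolapsa", "kollapse+Verb+Past"),
                    ("kolapser", "kollapse+Verb+Pres")])
after = morph.analyze("kolapsa")
```

In this sketch the add-on is updated twice while the main lexicon is compiled only once, which mirrors the reason given above for the fast update.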

Figure 5: Adding stem variants
