A Treebank-based Investigation of IPP-triggering Verbs in Dutch
[Figure 4 plot. Y-axis: Unlabelled attachment (0 to 100). X-axis: Sentences (10, 20, 50, 100, 200, 500, 1k, 2k, 5k, 10k). Series: no-sv, with words; no-sv, no words; no-da, with words; no-da, no words; no skyline.]<br />
Figure 4: Learning curves for parsers using the Basic strategy<br />
3.4 Conversion<br />
As outlined in Section 2, there are a number of annotation differences<br />
between our three corpora, in particular between the DDT and Talbanken/NDT.<br />
In this section we present a set of more specific conversion procedures that we use<br />
to move our source corpora closer to the target corpus. Previous work in cross-lingual<br />
parser adaptation (Zeman and Resnik [12], Søgaard [9]) has considered<br />
only unlabelled parsing, but as we will see in Section 3.4.3, it is entirely possible<br />
to build a labelling parser.<br />
3.4.1 Part-of-speech conversion<br />
Until now, only the bare minimum of conversion and mapping has been applied<br />
to the source corpora. However, many of the differences in annotation strategy between<br />
the source and target corpora are quite simple and easy to recover, and given the<br />
close connection between the languages, it is not hard to do a more targeted PoS<br />
tag mapping than the one Interset provides.<br />
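One simple way to realise a targeted tag mapping of this kind is a look-up table from source tags to target tags. The following is only a minimal sketch under stated assumptions: the tag names (`SRC_NOUN`, `TGT_NOUN`, and so on) and the function `convert_tags` are hypothetical placeholders, not the actual Talbanken, DDT, or NDT tagsets or the code used in this work.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical source-to-target table. The tag names are placeholders,
# not real Talbanken, DDT, or NDT tags.
TAG_MAP: Dict[str, str] = {
    "SRC_NOUN": "TGT_NOUN",
    "SRC_VERB": "TGT_VERB",
    "SRC_ADJ": "TGT_ADJ",
}


def convert_tags(
    tags: List[str], table: Dict[str, str]
) -> Tuple[List[Optional[str]], List[str]]:
    """Map each source tag through the table.

    Returns the converted tags (None where the table has no entry)
    and the list of source tags that could not be converted.
    """
    converted = [table.get(tag) for tag in tags]
    unmapped = [tag for tag, new in zip(tags, converted) if new is None]
    return converted, unmapped


if __name__ == "__main__":
    converted, unmapped = convert_tags(
        ["SRC_NOUN", "SRC_UNKNOWN", "SRC_VERB"], TAG_MAP
    )
    print(converted)
    print(unmapped)
```

Returning the unmapped tags separately makes it easy to see which source tags a plain table does not cover and therefore need some other treatment.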
Specifically, we can convert both the Talbanken and DDT PoS tagsets into the<br />
tagset used by the NDT. For the most part, this is a simple matter of writing down a<br />
look-up table mapping each tag in the source tagset to the corresponding tag in the<br />
target tagset. Roughly 90% of both the Swedish and Danish tags can be converted<br />