06.07.2014

A Treebank-based Investigation of IPP-triggering Verbs in Dutch

Figure 4: Learning curves for parsers using the Basic strategy
[Plot: unlabelled attachment (y-axis, 0 to 100) against number of sentences (x-axis, 10 to 10k); series: no-sv, with words; no-sv, no words; no-da, with words; no-da, no words; no skyline.]

3.4 Conversion

As outlined in Section 2, there are a number of differences in annotation between our three corpora, in particular between the DDT and Talbanken/NDT. In this section we present a set of more specific conversion procedures that we use to bring our source corpora closer to the target corpus. Previous work in cross-lingual parser adaptation (Zeman and Resnik [12], Søgaard [9]) has considered only unlabelled parsing, but as we will see in Section 3.4.3, it is entirely possible to build a parser that also assigns dependency labels.

3.4.1 Part-of-speech conversion

Until now, only the bare minimum of conversion and mapping has been applied to the source corpora. However, many of the differences in annotation strategy between the source and target corpora are simple and easy to recover, and given the close relationship between the languages, it is not hard to produce a more targeted PoS tag mapping than the one interset provides.
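A targeted mapping of this kind amounts to a look-up table from source tags to target tags. The Python sketch below illustrates the idea under stated assumptions: the function name `convert_tags`, the table `TAG_TABLE`, and every tag pair in it are hypothetical placeholders, not the actual Talbanken/DDT-to-NDT correspondences. Tags missing from the table are passed through unchanged (or replaced by an optional fallback) so that more specific rules can deal with them, and the function also reports the fraction of input tokens the table covered.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical look-up table: source tag -> target tag.
# These pairs are illustrative placeholders, NOT the real
# Talbanken/DDT-to-NDT correspondences.
TAG_TABLE: Dict[str, str] = {
    "NN": "subst",
    "VB": "verb",
    "JJ": "adj",
    "PP": "prep",
}


def convert_tags(
    tags: List[str], table: Dict[str, str], fallback: Optional[str] = None
) -> Tuple[List[str], float]:
    """Map each source tag through the look-up table.

    Tags not in the table are replaced by ``fallback`` if one is given,
    otherwise kept unchanged so a later, more specific rule can handle them.
    Returns the converted tags and the fraction of tokens the table covered.
    """
    converted: List[str] = []
    hits = 0
    for tag in tags:
        if tag in table:
            converted.append(table[tag])
            hits += 1
        else:
            converted.append(fallback if fallback is not None else tag)
    coverage = hits / len(tags) if tags else 0.0
    return converted, coverage


if __name__ == "__main__":
    sentence_tags = ["NN", "VB", "PP", "NN", "XX"]  # "XX" is not in the table
    new_tags, cov = convert_tags(sentence_tags, TAG_TABLE)
    print(new_tags)  # ['subst', 'verb', 'prep', 'subst', 'XX']
    print(cov)       # 0.8
```

Passing unmapped tags through rather than raising an error is a choice made here for illustration; an actual conversion might instead collect them for manual inspection or hand them to dedicated rules.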

Specifically, we can convert both Talbanken's and DDT's PoS tagsets into the tagset used by the NDT. For the most part, this is simply a matter of writing down a look-up table that maps each tag in the source tagset to the corresponding tag in the target tagset. Roughly 90% of both the Swedish and Danish tags can be converted
