06.07.2014 Views

A Treebank-based Investigation of IPP-triggering Verbs in Dutch



Another part of the development was its compatibility with the TigerXML model, i.e. making sure that existing data in TigerXML can be converted to the new format without loss of information. To test this, we implemented a mapping from TigerXML to the API. This allows us to import existing TigerXML data into the API and export it to the new format, and vice versa, though the reverse direction may lead to information loss in some cases. By converting native TigerXML files back and forth, we can ensure that no information is lost.

Another interesting prospect is converting data from other graph-based formats into the new format. We therefore used the SaltNPepper framework [15], a universal importer that converts a wide range of formats into each other. Pepper is a plug-in based converter which uses the intermediate metamodel Salt to make a direct conversion between several formats. Using SaltNPepper and the API, we created a module mapping the new format to the Salt metamodel, which we plugged into the framework. This step has allowed us to benefit from all existing SaltNPepper modules and to convert data from and into, e.g., the CoNLL format (http://ilk.uvt.nl/conll/#dataformat), the PAULA XML format [16], the GrAF format [17] and more.
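The benefit of an intermediate model can be sketched in a few lines. The Python snippet below is a toy illustration and does not use the actual SaltNPepper API; the class `PivotConverter`, its methods and the two toy formats are hypothetical. It shows only the design idea: each format contributes one importer into and one exporter out of a shared in-memory model, so registering a new module makes conversion to and from every other registered format available.

```python
class PivotConverter:
    """Toy hub-and-spoke converter around one shared model (hypothetical)."""

    def __init__(self):
        self.importers = {}  # format name -> function(text) -> model
        self.exporters = {}  # format name -> function(model) -> text

    def register(self, name, importer, exporter):
        self.importers[name] = importer
        self.exporters[name] = exporter

    def convert(self, text, source, target):
        model = self.importers[source](text)   # source format -> pivot model
        return self.exporters[target](model)   # pivot model -> target format


# The pivot model in this sketch is just a list of (form, pos) pairs.
def read_slashed(text):
    return [tuple(item.rsplit("/", 1)) for item in text.split()]


def write_slashed(model):
    return " ".join(f"{form}/{pos}" for form, pos in model)


def read_columns(text):
    return [tuple(line.split("\t")) for line in text.splitlines() if line]


def write_columns(model):
    return "\n".join(f"{form}\t{pos}" for form, pos in model)


converter = PivotConverter()
converter.register("slashed", read_slashed, write_slashed)
converter.register("columns", read_columns, write_columns)

columns = converter.convert("il/CL mange/V", "slashed", "columns")
print(columns)
# Converting back and comparing with the input is a simple round-trip check.
print(converter.convert(columns, "columns", "slashed"))
```

With a pivot, n formats need n importer/exporter pairs rather than a dedicated converter for every pair of formats.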

The CoNLL format is a field-based format widely used in international parsing evaluations. It is dependency-based, usually restricted to projective dependencies, with several fields such as FORM, LEMMA, CPOSTAG, POSTAG, FEATS, etc. A conversion to the new format is rather straightforward, with no need for nt elements (see Examples 17 and 18).

1  il     il      CL     CLS    _  2  suj    _  _
2  mange  manger  V      V      _  0  root   _  _
3  une    un      D      DET    _  4  det    _  _
4  pomme  pomme   N      NC     _  2  obj    _  _
5  .      .       PONCT  PONCT  _  2  ponct  _  _

Example 17: CoNLL output for “he eats an apple”<br />
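The field layout can be made concrete with a short sketch. The Python snippet below is illustrative and not part of the tools described here. It splits the lines of Example 17 into the ten columns of the CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and derives one labelled edge per token. The names `parse_conll` and `edges` are hypothetical, and whitespace-separated columns are assumed, as in the example.

```python
# Column names of the CoNLL-X layout; Example 17 uses all ten columns.
FIELDS = ["ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
          "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL"]

EXAMPLE_17 = """\
1 il il CL CLS _ 2 suj _ _
2 mange manger V V _ 0 root _ _
3 une un D DET _ 4 det _ _
4 pomme pomme N NC _ 2 obj _ _
5 . . PONCT PONCT _ 2 ponct _ _
"""


def parse_conll(text):
    """Return one dict per token line (hypothetical helper)."""
    tokens = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate sentences
        values = line.split()
        if len(values) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns: {line!r}")
        token = dict(zip(FIELDS, values))
        token["ID"] = int(token["ID"])
        token["HEAD"] = int(token["HEAD"])
        tokens.append(token)
    return tokens


def edges(tokens):
    """Return (head form, label, dependent form) triples; head 0 is the root."""
    form_by_id = {t["ID"]: t["FORM"] for t in tokens}
    return [(form_by_id.get(t["HEAD"], "ROOT"), t["DEPREL"], t["FORM"])
            for t in tokens]


tokens = parse_conll(EXAMPLE_17)
for head, label, dependent in edges(tokens):
    print(f"{head} -{label}-> {dependent}")
```

Every token points directly to its head, so all edges connect terminals only, which is consistent with the remark that no nt elements are needed.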


Example 18: Fragment of a possible representation for CoNLL

When using the CoNLL format, some problems may arise from the fact that it does not follow the two-level segmentation model of MAF (with tokens and word forms), leading to compound POS tags such as P+D for the agglutinate des (‘de’ + ‘les’ = of the).
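The mismatch can be illustrated with a small sketch, under the assumption that a lexicon of contracted forms is available. The table `CONTRACTIONS` and the function `split_compound` below are hypothetical; they only show how one surface token carrying a compound tag such as P+D could be expanded into two word forms, which is the distinction a two-level model makes explicit.

```python
# Hypothetical lexicon: contracted surface form -> underlying word forms.
CONTRACTIONS = {
    "des": ["de", "les"],
    "du": ["de", "le"],
    "au": ["à", "le"],
    "aux": ["à", "les"],
}


def split_compound(form, pos):
    """Expand one token into (word form, POS) pairs.

    A tag without "+" yields the token unchanged. A compound tag such as
    "P+D" is split, and each part is paired with a word form looked up in
    the assumed CONTRACTIONS table.
    """
    tags = pos.split("+")
    if len(tags) == 1:
        return [(form, pos)]
    parts = CONTRACTIONS.get(form.lower())
    if parts is None or len(parts) != len(tags):
        raise ValueError(f"no expansion known for {form!r} with tag {pos!r}")
    return list(zip(parts, tags))


print(split_compound("des", "P+D"))   # one token, two word forms
print(split_compound("pomme", "N"))   # one token, one word form
```

The split is driven by the tag and not by the form alone, since des tagged as a plain determiner is left as a single word form.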
