06.07.2014

A Treebank-based Investigation of IPP-triggering Verbs in Dutch

In a second experiment, a dataset was converted from the Passage format [18] produced by the French parser FRMG [19] (footnote 4) into the target format. Although no SaltNPepper module currently supports the Passage format, a dedicated generator was constructed for this purpose. The Perl conversion script was easy to develop and, although it remains to be tested on complex cases, it demonstrates the potential of the target format to encode relatively complex syntactic annotation formats.

The Passage format was developed for the French evaluation campaign of the same name (http://atoll.inria.fr/passage/). Accordingly, the Passage format is currently produced by several French parsers and is relatively rich in information. It is a successor of the EASy format, extended to conform better to the recommendations of ISO TC37SC4, in particular to the MAF format (at the tokenization level, with T and W elements) and through the use of ISO data categories. Passage is both constituency-based, with 6 kinds of chunks (called G), and dependency-based, with 14 kinds of relations anchored by either word forms W or chunks G.

The T and W elements are straightforwardly converted into MAF elements, and a correspondence (with attribute @corresp) is established on the t elements, pointing to the MAF word forms. The chunks become nt elements and the relations become edge elements within either t or nt elements. However, strictly speaking, Passage relations are not oriented (since they entail no explicit notion of governor and governee) but role-based. Furthermore, the COORD relation (for coordinations) is ternary and has 3 roles, namely coordonnant (coordinator), coord-g (left coordinated) and coord-d (right coordinated). To fit into the metamodel, it was therefore necessary to orient the relations (by choosing a governor and orienting the edge from the governee to its governor) and to binarize the COORD relation (using the coordinator as governor). Fortunately, no information is lost in the process. This is achieved using a @label attribute on edge, which combines the name of the relation with a role name (such as the SUJ-V_sujet label for the subject in a SUJ-V relation). Constituency is represented by edges labelled comp within nt elements. Finally, the additional information carried by Passage W elements, such as form, lemma and mstag, is moved to the MAF wordForm elements, with a conversion of the mstag flat feature structure (using tags) into a deep expanded feature structure (based on the FSR standard). Example 21 shows the original Passage fragment, and Examples 19 and 20 show the corresponding representations in MAF and in the target format.
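The orientation and binarization steps described above can be illustrated with a minimal Python sketch. It is not the Perl script used in the experiment. The `Edge` class, the `relation_to_edges` function and the `GOVERNOR_ROLE` table are hypothetical names introduced for illustration, and the role name `verbe` and the choice of the verb as governor of SUJ-V are assumptions; only the coordinator as governor of COORD and the SUJ-V_sujet label come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str  # id of the governee (the element that would hold the edge)
    target: str  # id of the governor
    label: str   # relation name + "_" + role of the governee


# Hypothetical table: which role acts as governor for each relation.
# Only the COORD entry is stated in the text; the SUJ-V entry is an assumption.
GOVERNOR_ROLE = {"SUJ-V": "verbe", "COORD": "coordonnant"}


def relation_to_edges(name, roles):
    """Turn one role-based relation into oriented binary edges.

    `roles` maps each role name to the id of the word form or chunk
    filling it. Every non-governor role yields one edge pointing to
    the governor, so a ternary COORD becomes two binary edges.
    """
    governor_role = GOVERNOR_ROLE[name]
    governor = roles[governor_role]
    return [
        Edge(source=filler, target=governor, label=f"{name}_{role}")
        for role, filler in roles.items()
        if role != governor_role
    ]


# A binary relation yields a single edge from the subject to the verb.
subject_edges = relation_to_edges("SUJ-V", {"sujet": "w1", "verbe": "w2"})

# The ternary COORD relation yields two edges, both pointing to the coordinator.
coord_edges = relation_to_edges(
    "COORD", {"coordonnant": "w2", "coord-g": "w1", "coord-d": "w3"}
)
```

Because each label keeps both the relation name and the governee's role, and the governor role is fixed per relation in this sketch, the original role-based relation can be recovered from the edges.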

4 FRMG may be tried online at http://alpage.inria.fr/parserdemo with the possible production of 3 formats (DepXML, Passage, CoNLL). The target format should be added soon, thanks to the script presented in this paper.
