06.07.2014 Views

A Treebank-based Investigation of IPP-triggering Verbs in Dutch


Effectively long-distance dependencies in French: annotation and parsing evaluation

Marie Candito ⋆ and Djamé Seddah ⋆⋄

⋆ Alpage (Univ. Paris Diderot & INRIA), 175 rue du Chevaleret, 75013 Paris, France

⋄ Univ. Paris Sorbonne, 28, rue Serpente, 75006 Paris, France

marie.candito@linguist.jussieu.fr, djame.seddah@paris-sorbonne.fr

Abstract

We describe the annotation of cases of extraction in French whose previous annotations in the available French treebanks were insufficient to recover the correct predicate-argument dependency between the extracted element and its head. These cases are special cases of LDDs, which we call effectively long-distance dependencies (eLDDs), in which the extracted element is indeed separated from its head by one or more intervening heads (whereas a general LDD may have zero, one or more). We found that extraction of a dependent of a finite verb is very rarely an eLDD (one case out of 420 000 tokens), but eLDDs corresponding to extraction out of an infinitival phrase are more frequent (one third of all occurrences of the accusative relative pronoun que), and eLDDs with extraction out of NPs are quite common (2/3 of the occurrences of the relative pronoun dont). We also use the annotated data in statistical dependency parsing experiments, and compare several parsing architectures able to recover non-local governors for extracted elements.
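To make the eLDD definition concrete, the following is a minimal Python sketch of one plausible way to count intervening heads; it is not the authors' annotation procedure. It assumes a CoNLL-style encoding in which each token stores the 1-based index of its governor (0 for the root), that the extracted element is already attached to its true, possibly non-local, governor (the attachment the surface annotation lacks), and that the caller supplies the head of the clause the element is extracted from. The names `Token`, `intervening_heads`, `is_eldd` and the example analyses are hypothetical illustrations.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    form: str
    head: int  # 1-based index of the governor; 0 means the sentence root


def intervening_heads(tokens: List[Token], extracted: int, clause_head: int) -> int:
    """Count the heads separating an extracted element from its governor.

    Hypothetical helper. `extracted` is the 1-based index of the extracted
    element, whose `head` is assumed to be its true (non-local) governor.
    `clause_head` is the 1-based index of the head of the clause the element
    is extracted from (e.g. the verb of a relative clause). Each step up from
    the governor towards `clause_head` passes one intervening head (the
    clause head itself included, the governor excluded).
    """
    node = tokens[extracted - 1].head  # start at the governor
    count = 0
    while node != clause_head:
        if node == 0:
            raise ValueError("clause_head does not dominate the governor")
        node = tokens[node - 1].head  # climb one level in the tree
        count += 1
    return count


def is_eldd(tokens: List[Token], extracted: int, clause_head: int) -> bool:
    """An eLDD has at least one intervening head (a plain LDD may have zero)."""
    return intervening_heads(tokens, extracted, clause_head) >= 1


# "le livre que je veux lire": que is the object of the infinitive lire,
# which is embedded under veux (extraction out of an infinitival phrase).
WANT_TO_READ = [
    Token("le", 2), Token("livre", 0), Token("que", 6),
    Token("je", 5), Token("veux", 2), Token("lire", 5),
]

# "le livre que je lis": que is the object of the relative clause's own verb.
READ = [
    Token("le", 2), Token("livre", 0), Token("que", 5),
    Token("je", 5), Token("lis", 2),
]

# "l' homme dont je connais le fils": dont depends on the noun fils
# (extraction out of an NP).
WHOSE_SON = [
    Token("l'", 2), Token("homme", 0), Token("dont", 7), Token("je", 5),
    Token("connais", 2), Token("le", 7), Token("fils", 5),
]

if __name__ == "__main__":
    print(intervening_heads(WANT_TO_READ, 3, 5), is_eldd(WANT_TO_READ, 3, 5))
    print(intervening_heads(READ, 3, 5), is_eldd(READ, 3, 5))
    print(intervening_heads(WHOSE_SON, 3, 5), is_eldd(WHOSE_SON, 3, 5))
```

Under these assumptions, the finite-verb case yields zero intervening heads (an LDD in the general sense but not an eLDD), while the infinitival and NP cases each yield one, mirroring the three extraction types the abstract distinguishes.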

1 Introduction

While statistical parsers obtain high overall performance, they exhibit very different performance across linguistic phenomena. In particular, most statistical parsers perform poorly on long-distance dependencies (LDDs), which, though rare, are important for fully recovering predicate-argument structures, which are in turn needed for semantic applications of parsing. Poor performance on LDDs is well documented for English statistical parsers, even though the training data does contain information for resolving unbounded dependencies (the Penn Treebank, or the specific dataset evaluated by Rimell et al. [17]). For French, the situation is worse, since the usual training data, the French Treebank (Abeillé and Barrier [1]), is a surface syntagmatic treebank that does not contain indications of LDDs: extracted elements bear a grammatical function, but no annotation indicates their embedded head. Hence syntagmatic stochastic French parsers cannot capture LDDs. Concerning dependency parsing, French dependency parsers can be learnt on the DEPFTB, resulting
