A Treebank-based Investigation of IPP-triggering Verbs in Dutch
Effectively long-distance dependencies in French:<br />
annotation and parsing evaluation<br />
Marie Candito ⋆ and Djamé Seddah ⋆⋄<br />
⋆ Alpage (Univ. Paris Diderot & INRIA), 175 rue du Chevaleret, 75013 Paris, France<br />
⋄ Univ. Paris Sorbonne, 28, rue Serpente, 75006 Paris, France<br />
marie.candito@linguist.jussieu.fr, djame.seddah@paris-sorbonne.fr<br />
Abstract<br />
We describe the annotation of cases of extraction in French whose previous<br />
annotations in the available French treebanks were insufficient to recover the<br />
correct predicate-argument dependency between the extracted element and<br />
its head. These are special cases of LDDs, which we call effectively long-distance<br />
dependencies (eLDDs), in which the extracted element is indeed<br />
separated from its head by one or more intervening heads (instead of zero,<br />
one or more in the general case). We found that extraction of a dependent<br />
of a finite verb is very rarely an eLDD (one case out of 420 000 tokens),<br />
but eLDDs corresponding to extraction out of an infinitival phrase are more frequent<br />
(one third of all occurrences of the accusative relative pronoun que), and<br />
eLDDs with extraction out of NPs are quite common (2/3 of the occurrences<br />
of the relative pronoun dont). We also use the annotated data in statistical dependency<br />
parsing experiments, and compare several parsing architectures able<br />
to recover non-local governors for extracted elements.<br />
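The distinction between a plain LDD and an eLDD can be made concrete with a small sketch. It assumes one possible reading of the definition above: an extraction counts as an eLDD when at least one head must be crossed to climb from the governor of the extracted element up to the head of the clause in which that element is fronted. The function names, the head-map representation and the two French examples are illustrative assumptions, not the authors' annotation scheme or data.

```python
from typing import Dict, Optional


def intervening_heads(heads: Dict[str, Optional[str]],
                      governor: str, clause_head: str) -> int:
    """Count the heads crossed when climbing from `governor` up to `clause_head`.

    `heads` maps each word to its head (None for the root of the clause).
    This representation is a hypothetical simplification for illustration.
    """
    count = 0
    node = governor
    while node != clause_head:
        parent = heads.get(node)
        if parent is None:
            raise ValueError(f"{governor!r} is not dominated by {clause_head!r}")
        node = parent
        count += 1
    return count


def is_eldd(heads: Dict[str, Optional[str]],
            governor: str, clause_head: str) -> bool:
    """Assumed criterion: an eLDD has at least one intervening head."""
    return intervening_heads(heads, governor, clause_head) >= 1


# Made-up example "le livre que Jean lit":
# "que" is governed by the head of the relative clause itself.
local = {"lit": None, "que": "lit", "Jean": "lit"}

# Made-up example "le livre que Jean veut lire":
# "que" is governed by "lire", which is embedded under "veut".
embedded = {"veut": None, "Jean": "veut", "lire": "veut", "que": "lire"}

print(is_eldd(local, governor="lit", clause_head="lit"))        # False
print(is_eldd(embedded, governor="lire", clause_head="veut"))   # True
```

Under this reading, the first relative clause is an LDD in the general sense but not an eLDD, while the second is an eLDD with one intervening head.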
1 Introduction<br />
While statistical parsers obtain high overall performance, they exhibit very different<br />
performance across linguistic phenomena. In particular, most statistical parsers<br />
perform poorly on long-distance dependencies (LDDs), which, though rare, are<br />
important for fully recovering predicate-argument structures, which are in turn needed<br />
for semantic applications of parsing. Poor performance on LDDs is known for<br />
English statistical parsers, even though the training data does contain information<br />
for resolving unbounded dependencies (the Penn Treebank, or the specific dataset<br />
evaluated by Rimell et al. [17]). For French, the situation is worse, since the usual<br />
training data, the French Treebank (Abeillé and Barrier [1]), is a surface syntagmatic<br />
treebank that does not contain indications of LDDs: extracted elements bear<br />
a grammatical function, but no annotation indicates their embedded head. Hence<br />
syntagmatic stochastic French parsers cannot capture LDDs. Concerning dependency<br />
parsing, French dependency parsers can be learnt on the DEPFTB, resulting<br />