A Treebank-based Investigation of IPP-triggering Verbs in Dutch

More documents

Recommendations

Info

general, no dependent of the local verbal head intervenes between the subject and the extracted element (en or dont) as in (6), projectivity is preserved. 8 Number of tokens Treebank total with eLDD (%) fpl=2 fpl=3 fpl>3 non projective FTB 350931 555 (0.16 %) 466 69 20 317 SEQTB 69238 63 (0.09 %) 47 13 3 42 FTB +SEQTB 420169 618 (0.15%) 513 82 23 359 Table 1: For the FTB and the SEQTB, total number of tokens, tokens with non-local dependency, i.e. with length of functional path (fpl) > 1, tokens with fpl=2, fpl=3, fpl > 3; Number of tokens with fpl>1 that have a non projective dependency. If we focus on the top three lexical items that exhibit eLDDs, we obtain the accusative relative pronoun que, the genitive relative pronoun dont and the anaphoric clitic en. As can be seen in table 2, these three lemmas totalize 570 out of the 618 cases of annotated eLDD in the FTB +SEQTB treebanks. 9 Note that though we saw eLDDs are very rare, these particular three lemmas have a substantial proportion of eLDD occurrences, especially dont (65.7% of its occurrences). The most frequent element extracted from a verbal phrase is the relative pronoun que. A striking observation is that for all the 152 eLDD cases involving que, the embedded phrase que originates from is infinitival (as in the example Figure 3), and none is a finite clause. 10 However, though the relative pronoun dont and the clitic en can either depend on verbs, nouns or adjectives, the majority of eLDDs are cases of extraction out of NPs. More precisely, out of subject NPs for dont (251 cases of functional paths DEP.SUJ out of 329) and out of object NPs for en (51 cases of functional paths DEP.OBJ, out of 89). The other prevalent case for clitic en is the extraction out of predicative complement (24 cases of functional paths DEP.ATS). 5 Parsing experiments In this section we list and evaluate various parsing strategies able to output eLDDs. Though the overall parsing performance is unlikely to be affected by this parameter (given the very small amount of annotated eLDDs), it seems interesting to focus on attachment scores for the specific tokens that do often trigger eLDDs. We relate experiments both using the “local” dependency trees, namely trees automatically converted from constituency trees without functional paths, and us- 8 Non projectivity also arises when extracting a more deeply embedded dependent within the subject NP, as in “une entreprise dont la moyenne d’âge des salariés dépasse 40 ans” (a company of-which the average of age of the employees is over 40). 9 Other cases concern (i) extractions of prepositional phrases (pied-piping), such as in example 3, for which the token that will bear the eLDD is the head preposition of the PP (à in example 3) or (ii) rare cases of extraction of NPs with wh-determiner, for which the noun bears the eLDD. 10 We found no extraction out of an embedded finite clause in the FTB, and one in the SEQTB, in which the extracted element is a PP. 68
Number of occurrences in FTB +SEQTB Lemma POS total local gov non-local gov Top-3 most frequent fct paths que relative 616 464 152 (32,8%) OBJ.OBJ 77 pronoun OBJ.OBJ.OBJ 22 OBJ.OBJ.DE_OBJ 19 dont relative 501 172 329 (65.7%) DEP.SUJ 251 pronoun DEP.OBJ 29 DEP.ATS 27 en accusative 411 322 89 (21,7%) DEP.OBJ 51 clitic DEP.ATS 24 DEP.SUJ 11 Table 2: Statistics for the three lexical items having a non-local governor most frequently: total number of occurrences, number with and without local governor, and top-three most frequently-annotated functional paths. ing the “non local” dependency trees, which have corrected dependencies for eL- DDs (obtained via the procedure described in section 3.1). The local trees are projective, whereas the non local trees contain a few non projective links (two thirds of the eLDDs are non projective, cf. section 4.3). We use the usual split for the 2007 version of the FTB (1235, 1235 and 9881 sentences for test, development and training sets respectively). For evaluation, we use as test sets the whole SEQTB on top of the usual FTB development and test sets. 11 In all our experiments the predicted parses are evaluated against the non local (pseudo-)gold dependency trees (while for training, we sometimes use the local dependency trees). We provide unlabeled and labeled attachment scores for all tokens except punctuation, and for the three lexical items that have a non local governor most frequently (the three lemma+POS pairs of Table 2). We use MaltParser (Nivre et al. [14]), version 1.7 and MSTParser (McDonald [11]), version 0.5.0. 12 For the features, we use the best ones from a benchmarking of dependency parsers for French (Candito et al. [4]), except we removed unsupervised word clusters as features. 13 For MaltParser, we always use the arc-eager parsing algorithm, that can provide projective trees only. 14 For MSTParser, we either use order 1 factors and the Chu-Liu-Edmonds algorithm, that can output non projective trees, or order 2 factors, with the Eisner algorithm, restricted to projective trees. We also test pseudo-projective parsing (Nivre and Nilsson [15]), where non projective trees are transformed into projective ones, with label marking to allow the reverse the graph transformation: we use MaltParser to obtain projectivized versions of the non local trees, then either MaltParser or MSTParser to train a (projective) parser on these, and MaltParser again to de-projectivize the obtained 11 As we did no parameter tuning, we did not reserve a test set for final test. 12 Available at http://www.maltparser.org and http://sourceforge.net/projects/mstparser. 13 POS are predicted using MORFETTE (Chrupała et al. [6]), and lemmas and morphological features are predicted using the BONSAI toolkit. 14 Experiments with the swap algorithms showed degraded performance, supposedly due to the feeble amount of non-projective links. 69
Page 1 and 2:
A Treebank-based Investigation of I
Page 3 and 4:
participle and a (te-)infinitival c
Page 5 and 6:
Some verbs occur twice, since they
Page 7 and 8:
Profiling Feature Selection for Nam
Page 9 and 10:
prepositional objects (FOPP, OPP).
Page 11 and 12: the limited size of annotated data
Page 13 and 14: with high precision typically have
Page 15 and 16: ‘This was “not significantly”
Page 17 and 18: The preposition durch typically has
Page 19 and 20: Non-Projective Structures in Indian
Page 21 and 22: the sequential order of nodes in a
Page 23 and 24: extra-posed relative clause that ge
Page 25 and 26: Experiments on Dependency Parsing o
Page 27 and 28: for mitigating the effects of spars
Page 29 and 30: obtained with MALTParser is 76.61%
Page 31 and 32: as a standardised serialisation for
Page 33 and 34: constituency and dependency, possib
Page 35 and 36: SynAF and/or in . However, they sha
Page 37 and 38: shows how some elements that are no
Page 39 and 40: Example
Page 41 and 42: ‏ Example 8: represent
Page 43 and 44:
Page 45 and 46: Chinese) as the direct object NP.
Page 47 and 48: Example 15: Tokens and Word Forms
Page 49 and 50: In a second experiment, a dataset w
Page 51 and 52:
Page 53 and 54: [3] Leech G. N., Barnett, R. & Kahr
Page 55 and 56: Effectively long-distance dependenc
Page 57 and 58: In French, another case of eLDD is
Page 59 and 60: elativization, it-clefts or questio
Page 61: 4.2.3 Annotation methodology Becaus
Page 65 and 66: producing treebank based LFG approx
Page 67 and 68: Logical Form Representation for Lin
Page 69 and 70: gerundives for a total amount of so
Page 71 and 72: object+indirect object/object one.
Page 73 and 74: (VP (VB patch) ) ) ) (VP (VBZ is) (
Page 75 and 76: Types Adverb. Adject. Verbs Nouns T
Page 77 and 78: Eventually we may comment that ther
Page 79 and 80: DeepBank: A Dynamically Annotated T
Page 81 and 82: from another already existing one,
Page 83 and 84: to parse now does. The extra manual
Page 85 and 86: In the derivation show in Figure 1,
Page 87 and 88: 4 Patching Coverage Gaps with An Ap
Page 89 and 90: will enable further improvements in
Page 91 and 92: ParDeepBank: Multiple Parallel Deep
Page 93 and 94: 2 The ParDeepBank The PTB has emerg
Page 95 and 96: undergone extensive scientific scru
Page 97 and 98: the second combines the structures
Page 99 and 100: Sofia University). Each sentence wa
Page 101 and 102: data, and improvements in the infra
Page 103 and 104: The Effect of Treebank Annotation G
Page 105 and 106: ut without feature structures. This
Page 107 and 108: only the lexicon is fine-grained to
Page 109 and 110: Automatic Coreference Annotation in
Page 111 and 112: manually annotated Czech sentences.
Page 113 and 114:
citizens of Bilbao] are very involv
Page 115 and 116:
4.1.3 Coreference Selector Module T
Page 117 and 118:
Nominal P R F1 MUC 75.33% 81.33% 78
Page 119 and 120:
[9] G. Doddington, A. Mitchell, M.
Page 121 and 122:
Analyzing the Most Common Errors in
Page 123 and 124:
Graph 1 shows results of subsequent
Page 125 and 126:
in the semantic type). This situati
Page 127 and 128:
Will a Parser Overtake Achilles? Fi
Page 129 and 130:
annotation is also based on a depen
Page 131 and 132:
Set Name Sentences Tokens % Train/T
Page 133 and 134:
dency relations (PRED, OBJ, SBJ, AD
Page 135 and 136:
ton and Stacklazy), we trained a mo
Page 137 and 138:
Feature Function Column Name Addres
Page 139 and 140:
Using Parallel Treebanks for Machin
Page 141 and 142:
phrases are generated by a shallow
Page 143 and 144:
Figure 2: A Kybot p
Page 145 and 146:
SUJ NP SENT MOD PP NP PP PP NP NP P
Page 147 and 148:
Checkpoint Instances Google PT Our
Page 149 and 150:
6 Conclusions In this paper we have
Page 151 and 152:
An integrated web-based treebank an
Page 153 and 154:
16] and we have earlier reported fr
Page 155 and 156:
Figure 2: Screenshot of the interfa
Page 157 and 158:
ple subcategorization frames may be
Page 159 and 160:
4 Integrated interface for annotati
Page 161 and 162:
of the 5th International Conference
Page 163 and 164:
Translational divergences and their
Page 165 and 166:
posed so far, Cyrus’ work did not
Page 167 and 168:
allowed an appropriate coverage of
Page 169 and 170:
damentales [...] 10 (the universal
Page 171 and 172:
for all lex_pair(s,t) do if head an
Page 173 and 174:
the treatment of typical translatio
Page 175 and 176:
Building a treebank of noisy user-g
Page 177 and 178:
Phenomenon Attested example Std. co
Page 179 and 180:
Figure 1: French Social Media Bank
Page 181 and 182:
Impact of treebank characteristics
Page 183 and 184:
Det Adj N (a) Danish Det Adj N (b)
Page 185 and 186:
sv: Bestämmelserna i detta avtal f
Page 187 and 188:
100 90 80 Unlabelled attachment 70
Page 189 and 190:
vidner , tilhørere og tiltalte wit
Page 191 and 192:
mance is not inconsiderable, as was
Page 193 and 194:
Genitives in Hindi Treebank: An Att
Page 195 and 196:
A genitive can occur with complex p
Page 197 and 198:
quite high. This motivates us towar
show all

A Treebank-based Investigation of IPP-triggering Verbs in Dutch

Create successful ePaper yourself

Delete template?

Save as template?