A Treebank-based Investigation of IPP-triggering Verbs in Dutch

More documents

Recommendations

Info

oth parsing algorithm and feature setting. For the task of algorithm and feature selection, we used MaltOptimizer [2] as a starting point, whose output we have evaluated and further improved 8 . 1.3 The data Our data were taken from the AGDT and converted to the CoNLL format 9 . The texts taken from the AGDT were organized by author and genre, in order to evaluate how MaltParser performs with varying authors and genres. It is well known [11] that non-projectivity crucially affects the efficiency of dependency parsers. In comparison with the treebanks used in CoNLL-X [4, 155, tab. 1] and CoNLL 2007 shared tasks [9, 920, tab. 1], our data show a remarkably higher rate of non-projective arcs. The subsets that we used, along with the number and percentage of non-projective arcs, are resumed in table 2 10 . Data set Works/Authors Sentences Tokens Non-proj. arcs % Homer Il., Od. 15175 232569 62013 26.66 Tragedy Aesch., Soph. 7897 95363 21747 22.80 Sophocles Soph. 3873 47205 10456 22.15 Table 2: Data sets Each of the subsets in table 2 was randomly partitioned into 5 training and testing sets, all in a ratio of approximately 9:1, in order to perform 5 different experiments. We first focused on the test sets of Homer (table 3) and Sophocles (table 4). Then, we studied how genre and author affect MaltParser in detail. We used a model trained on the Homeric poems to evaluate 3 different subsets: (a) the whole annotated work of Hesiod (18,881 tokens: same genre as the training set, but different author), (b) a sample from Sophocles of roughly the same size as Hesiod (18,418 tokens: different genre and author), and (c) the whole available Plato (6,091 tokens: different genre, different author, prose text). 8 In all our experiments, we used LIBSVN as learning algorithm for MaltParser. 9 The CoNLL format includes the following 10 fields, although only the first 8 contain nondummy values: ID (token counter), FORM, LEMMA, CPOSTAG (coarse-grained PoS), POSTAG (fine-grained PoS), FEATS (unordered set of morphological features), HEAD (head of current token, i.e. a value of ID), DEPREL (dependency relation to the HEAD); http://nextens.uvt.nl/ depparse-wiki/DataFormat 10 Since not all of the tragedies of Sophocles were already published at the time when we started our work, we used an unfinished version of the Oedipus King: 1361 tokens from the ca. last 100 lines of the play were unannotated. 136
Set Name Sentences Tokens % Train/Test Homer_Test1 1686 25898 11.14 Homer_Test2 1562 23258 10.00 Homer_Test3 1469 23267 10.00 Homer_Test4 1500 23274 10.01 Homer_Test5 1486 23259 10.00 Table 3: Homer: Test sets Set Name Sentences Tokens %Train/Test Sophocles_Test1 430 5186 10.99 Sophocles_Test2 389 4725 10.01 Sophocles_Test3 389 4726 10.01 Sophocles_Test4 384 4731 10.02 Sophocles_Test5 386 4721 10.00 Table 4: Sophocles: Test sets 2 Results and evaluation 2.1 Algorithm and feature selection Not surprisingly, given the above reported non-projective rates in our data sets, MaltParser scores rather poorly with the default algorithm (Nivre, a linear-time algorithm limited to projective dependency structures) and model (Arceager). With this configuration, training and testing the parser on the Homeric poems (tab. 3), we attained the following results (baseline): 44.1% LAS, 60.3% UAS, 49.2% LA [4] 11 . By applying the options suggested by MaltOptimizer, we increased the accuracy of the parser considerably. Due to the high number of non-projective trees and of nodes attached to the root, MaltOptimizer recommends the adoption of the following options 12 : • adoption of a non-projective algorithm: Covington non-projective is suggested; • use of the label “AuxK” (terminal punctuation) as default for unattached 11 The metrics used are the following: Labeled Attachment Score (LAS): the percentage of tokens with correct head and relation label; Unlabeled Attachment Score (UAS): the percentage of tokens with the correct head; Label Accuracy (LA): the percentage of tokens with the correct relation label. 12 For a quick introduction to MaltParser optimization, see [8]; for a detailed explanation of each option see [1]. 137
Page 1 and 2:
A Treebank-based Investigation of I
Page 3 and 4:
participle and a (te-)infinitival c
Page 5 and 6:
Some verbs occur twice, since they
Page 7 and 8:
Profiling Feature Selection for Nam
Page 9 and 10:
prepositional objects (FOPP, OPP).
Page 11 and 12:
the limited size of annotated data
Page 13 and 14:
with high precision typically have
Page 15 and 16:
‘This was “not significantly”
Page 17 and 18:
The preposition durch typically has
Page 19 and 20:
Non-Projective Structures in Indian
Page 21 and 22:
the sequential order of nodes in a
Page 23 and 24:
extra-posed relative clause that ge
Page 25 and 26:
Experiments on Dependency Parsing o
Page 27 and 28:
for mitigating the effects of spars
Page 29 and 30:
obtained with MALTParser is 76.61%
Page 31 and 32:
as a standardised serialisation for
Page 33 and 34:
constituency and dependency, possib
Page 35 and 36:
SynAF and/or in . However, they sha
Page 37 and 38:
shows how some elements that are no
Page 39 and 40:
Example
Page 41 and 42:
‏ Example 8: represent
Page 43 and 44:

Page 45 and 46:
Chinese) as the direct object NP.
Page 47 and 48:
Example 15: Tokens and Word Forms
Page 49 and 50:
In a second experiment, a dataset w
Page 51 and 52:

Page 53 and 54:
[3] Leech G. N., Barnett, R. & Kahr
Page 55 and 56:
Effectively long-distance dependenc
Page 57 and 58:
In French, another case of eLDD is
Page 59 and 60:
elativization, it-clefts or questio
Page 61 and 62:
4.2.3 Annotation methodology Becaus
Page 63 and 64:
Number of occurrences in FTB +SEQTB
Page 65 and 66:
producing treebank based LFG approx
Page 67 and 68:
Logical Form Representation for Lin
Page 69 and 70:
gerundives for a total amount of so
Page 71 and 72:
object+indirect object/object one.
Page 73 and 74:
(VP (VB patch) ) ) ) (VP (VBZ is) (
Page 75 and 76:
Types Adverb. Adject. Verbs Nouns T
Page 77 and 78:
Eventually we may comment that ther
Page 79 and 80: DeepBank: A Dynamically Annotated T
Page 81 and 82: from another already existing one,
Page 83 and 84: to parse now does. The extra manual
Page 85 and 86: In the derivation show in Figure 1,
Page 87 and 88: 4 Patching Coverage Gaps with An Ap
Page 89 and 90: will enable further improvements in
Page 91 and 92: ParDeepBank: Multiple Parallel Deep
Page 93 and 94: 2 The ParDeepBank The PTB has emerg
Page 95 and 96: undergone extensive scientific scru
Page 97 and 98: the second combines the structures
Page 99 and 100: Sofia University). Each sentence wa
Page 101 and 102: data, and improvements in the infra
Page 103 and 104: The Effect of Treebank Annotation G
Page 105 and 106: ut without feature structures. This
Page 107 and 108: only the lexicon is fine-grained to
Page 109 and 110: Automatic Coreference Annotation in
Page 111 and 112: manually annotated Czech sentences.
Page 113 and 114: citizens of Bilbao] are very involv
Page 115 and 116: 4.1.3 Coreference Selector Module T
Page 117 and 118: Nominal P R F1 MUC 75.33% 81.33% 78
Page 119 and 120: [9] G. Doddington, A. Mitchell, M.
Page 121 and 122: Analyzing the Most Common Errors in
Page 123 and 124: Graph 1 shows results of subsequent
Page 125 and 126: in the semantic type). This situati
Page 127 and 128: Will a Parser Overtake Achilles? Fi
Page 129: annotation is also based on a depen
Page 133 and 134: dency relations (PRED, OBJ, SBJ, AD
Page 135 and 136: ton and Stacklazy), we trained a mo
Page 137 and 138: Feature Function Column Name Addres
Page 139 and 140: Using Parallel Treebanks for Machin
Page 141 and 142: phrases are generated by a shallow
Page 143 and 144: Figure 2: A Kybot p
Page 145 and 146: SUJ NP SENT MOD PP NP PP PP NP NP P
Page 147 and 148: Checkpoint Instances Google PT Our
Page 149 and 150: 6 Conclusions In this paper we have
Page 151 and 152: An integrated web-based treebank an
Page 153 and 154: 16] and we have earlier reported fr
Page 155 and 156: Figure 2: Screenshot of the interfa
Page 157 and 158: ple subcategorization frames may be
Page 159 and 160: 4 Integrated interface for annotati
Page 161 and 162: of the 5th International Conference
Page 163 and 164: Translational divergences and their
Page 165 and 166: posed so far, Cyrus’ work did not
Page 167 and 168: allowed an appropriate coverage of
Page 169 and 170: damentales [...] 10 (the universal
Page 171 and 172: for all lex_pair(s,t) do if head an
Page 173 and 174: the treatment of typical translatio
Page 175 and 176: Building a treebank of noisy user-g
Page 177 and 178: Phenomenon Attested example Std. co
Page 179 and 180: Figure 1: French Social Media Bank
Page 181 and 182:
Impact of treebank characteristics
Page 183 and 184:
Det Adj N (a) Danish Det Adj N (b)
Page 185 and 186:
sv: Bestämmelserna i detta avtal f
Page 187 and 188:
100 90 80 Unlabelled attachment 70
Page 189 and 190:
vidner , tilhørere og tiltalte wit
Page 191 and 192:
mance is not inconsiderable, as was
Page 193 and 194:
Genitives in Hindi Treebank: An Att
Page 195 and 196:
A genitive can occur with complex p
Page 197 and 198:
quite high. This motivates us towar
show all

A Treebank-based Investigation of IPP-triggering Verbs in Dutch

Create successful ePaper yourself

Delete template?

Save as template?