The Effect of Treebank Annotation Granularity on Parsing: A Comparative Study

Masood Ghayoomi
Freie Universität Berlin
E-mail: masood.ghayoomi@fu-berlin.de

Omid Moradiannasab
Iran University of Science and Technology
E-mail: omidmoradiannasab@gmail.com

Abstract

Statistical parsers need annotated data for training. Depending on the linguistic information available in the training data, the performance of the parsers varies. In this paper, we study the effect of annotation granularity on parsing from three points of view: lexicon, part-of-speech tag, and phrase structure. The results show that changing annotation granularity at each of these dimensions has a significant impact on parsing performance.

1 Introduction

Parsing is one of the main tasks in Natural Language Processing (NLP). State-of-the-art statistical parsers are trained with treebanks [4, 5], mainly developed based on Phrase Structure Grammar (PSG). The part-of-speech (POS) tags of the words in the treebanks are defined according to a tag set which contains the syntactic categories of the words, optionally with additional morpho-syntactic information. Moreover, the constituent labels in treebanks might also be enriched with syntactic functions.
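The two kinds of label enrichment mentioned above can be pictured with a small sketch. It is only an illustration under stated assumptions: the tag strings, the `_` and `-` separators, and the helper names `coarsen_pos` and `strip_function` are hypothetical and are not taken from the treebank or tag set used in this study.

```python
def coarsen_pos(tag: str, sep: str = "_") -> str:
    """Keep only the main syntactic category of a fine-grained POS tag.

    Assumes (hypothetically) that morpho-syntactic features follow the
    category and are joined to it with `sep`, e.g. "N_SING_COM" -> "N".
    """
    return tag.split(sep, 1)[0]


def strip_function(label: str, sep: str = "-") -> str:
    """Drop a syntactic-function suffix from a constituent label.

    Assumes (hypothetically) labels of the form "NP-SBJ". A label that
    starts with the separator (such as "-NONE-") is returned unchanged.
    """
    head = label.split(sep, 1)[0]
    return head or label


# Hypothetical fine-grained annotations, for illustration only.
fine_pos = ["N_SING_COM", "V_PAST", "ADJ_SUP"]
fine_labels = ["NP-SBJ", "PP-LOC", "VP"]

coarse_pos = [coarsen_pos(t) for t in fine_pos]
coarse_labels = [strip_function(l) for l in fine_labels]

print(coarse_pos)
print(coarse_labels)
```

Mapping in this direction, from fine to coarse, is lossless to implement because it only discards information; going from coarse to fine would require re-annotation.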
The annotated data developed in the framework of deeper formalisms such as HPSG [13] provides a fine-grained representation of linguistic knowledge. The performance of parsers trained with such data has not beaten the state-of-the-art results [12], which shows that a fine-grained representation of linguistic knowledge adds complexity to a parser and has an adverse effect on its performance.

In this paper, we aim to comprehensively study the effect of annotation granularity on parsing from three points of view: lexicon, POS tag, and phrase structure. This study takes a different perspective from that of Rehbein and van Genabith [14]. We selected Persian, a language from the Indo-European language family, as the target language of our study.

2 Treebank Annotation Dimensions

Lexicon: The words of a language represent fine-grained concepts, and the linguistic information added to the words plays a very important role for lexicalized,