PhD thesis - School of Informatics - University of Edinburgh

Chapter 5. Parsing English Inclusions

CNP-OA → NP-CJ KON-CD NP-CJ
(a) Original coordinated accusative NP rule.

CNP-OA → NP-OA KON-CD NP-OA
(b) Transformed coordinated accusative NP rule.

Figure 5.4: Tree transformation for a coordinated noun phrase rule.

development-test split, the parser achieves an F-score of 73.1 on labelled brackets with a coverage of 99.1% (Dubey, 2005b). Dubey (2005b) has found that, without affecting coverage, the transformations improve parsing performance by 4 points in F-score over the baseline grammatical function parser, which yields an F-score of 69.1 on the NEGRA test set. In addition to the treebank re-annotation, the parser also makes use of suffix analysis; however, neither beam search nor smoothing is employed. Both beam search and smoothing lead the model to perform better but result in a decrease in coverage and an increase in parsing time by up to 10 times, respectively (Dubey, 2005a).

Dubey's figures are derived on a test set limited to sentences containing 40 tokens or fewer. In the data sets used in the experiments presented in this chapter, however, sentence length is not limited. Moreover, the average sentence length of these test sets (28.4 tokens) is considerably higher than that of the NEGRA test set (17.24 tokens). Consequently, a slightly lower performance and/or coverage is anticipated, even though the type and domain as well as the annotation of the NEGRA and TIGER treebanks are very similar. The minor annotation differences that do exist between NEGRA and TIGER are explained by Brants et al. (2002).

5.3.2 Parser Modifications

Several variations of the parser are tested: (1) the baseline parser, (2) the perfect tagging model, (3) the word-by-word model and (4) the inclusion entity model. The baseline parser does not treat foreign inclusions in any special way, i.e. the parser attempts to guess the POS tag of each inclusion token using the same suffix analysis as for rare or unseen German words. The additional versions of the parser are inspired by the hypothesis that inclusions make parsing difficult, and that this difficulty arises primarily because the parser cannot detect inclusions. An anticipated upper bound is therefore obtained by giving the parser perfect tagging information. Two further versions interface with the English inclusion classifier and treat words marked as inclusions differently from native words. The first version does so on a word-by-word basis. In contrast, the second version, the inclusion entity approach, attempts to group inclusions even if a grouping is not posited by phrase structure rules. Each version is now described in detail.

5.3.2.1 Perfect Tagging Model

This model allows the parser to make use of perfect tagging information for all tokens given in the pre-terminal nodes. In the TIGER annotation, pre-terminals include not only POS tags but also grammatical function labels. For example, rather than a pre-terminal node having the category PRELS (relative pronoun), it is given the category PRELS-OA (accusative relative pronoun) in the gold standard annotation. When given the POS tags along with the grammatical functions, the perfect tagging parser may unfairly disambiguate more syntactic information than when simply provided with perfect POS tags alone. Therefore, to make this model more realistic, the parser is required to guess the grammatical functions itself, allowing it to, for example, mistakenly tag an accusative relative pronoun as a nominative, dative or genitive one. This setup gives the parser access to information about the gold standard POS tags of English inclusions along with those of all other words, but does not offer any additional hints about the syntactic structure of the sentence as a whole.

5.3.2.2 Word-by-word Model

The two remaining models both take advantage of information acquired from the English inclusion classifier.
To interface the classifier with the parser, each inclusion is simply marked with a special FOM (foreign material) tag. The word-by-word parser attempts to guess POS tags itself, much like the baseline. However, whenever it encounters a FOM tag, it restricts itself to the set of POS tags observed for inclusions during training (the tags listed in Table 5.1). When a FOM is detected, these and only these POS tags are guessed; all other aspects of the parser remain the same.
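
The tag restriction described above can be illustrated with a minimal Python sketch. It is not the parser's actual implementation: the function names, the toy training triples and the choice to renormalise the guesser's scores (falling back to a uniform distribution when no allowed tag was proposed) are all assumptions made for illustration. The toy tags are not the contents of Table 5.1.

```python
from typing import Dict, Iterable, Set, Tuple


def inclusion_tags(training: Iterable[Tuple[str, str, bool]]) -> Set[str]:
    """Collect the POS tags observed on inclusion tokens.

    Each training item is a hypothetical (word, pos_tag, is_inclusion) triple.
    """
    return {tag for _word, tag, is_incl in training if is_incl}


def restrict(guessed: Dict[str, float], allowed: Set[str]) -> Dict[str, float]:
    """Keep only allowed tags and renormalise their scores (an assumption)."""
    if not allowed:
        raise ValueError("no inclusion tags were observed during training")
    kept = {t: p for t, p in guessed.items() if t in allowed}
    total = sum(kept.values())
    if total == 0:
        # Assumed fallback: the guesser proposed no allowed tag,
        # so spread the probability mass evenly over the allowed set.
        return {t: 1.0 / len(allowed) for t in sorted(allowed)}
    return {t: p / total for t, p in kept.items()}


def candidate_tags(
    is_fom: bool, guessed: Dict[str, float], allowed: Set[str]
) -> Dict[str, float]:
    """Tokens marked FOM are limited to inclusion tags; others are unchanged."""
    return restrict(guessed, allowed) if is_fom else dict(guessed)


# Toy data, purely illustrative (not taken from the TIGER treebank).
training = [
    ("das", "ART", False),
    ("Internet", "NN", True),
    ("Business", "FM", True),
    ("ist", "VAFIN", False),
]
allowed = inclusion_tags(training)

# Hypothetical scores from the parser's usual guesser, e.g. suffix analysis.
guessed = {"NN": 0.2, "VVFIN": 0.6, "FM": 0.2}
fom_candidates = candidate_tags(True, guessed, allowed)
native_candidates = candidate_tags(False, guessed, allowed)
```

In this sketch the FOM-marked token loses the VVFIN candidate, while a token without the FOM mark keeps the guesser's full distribution, mirroring the statement that all other aspects of the parser remain the same.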