PhD thesis - School of Informatics - University of Edinburgh

Chapter 5. Parsing English Inclusions

5.4 Parsing Experiments with a Hand-crafted Grammar

A second set of parsing experiments involves a German parser based on a hand-crafted grammar developed at the University of Stuttgart in the Lexical Functional Grammar (LFG) formalism. The behaviour of this monolingual parser on German sentences containing English inclusions is analysed in detail. The aim is to determine whether inclusions pose as much difficulty as they do for a monolingual treebank-induced parser, and to test whether additional knowledge about this language-mixing phenomenon can be exploited to overcome the problem. Given that the treebank-induced parser sees at least some inclusions in its training data, sparse though they are, a hand-written symbolic parser is expected to have even more difficulty with English inclusions, as it generally contains no rules for handling foreign material. Before turning to the experiments, the parser is briefly introduced.

5.4.1 Parser

The Xerox Linguistic Environment (XLE) is the underlying parsing platform used in the following set of experiments (Maxwell and Kaplan, 1993). This platform functions in conjunction with a hand-written large-scale LFG of German developed by Butt et al. (2002) and improved, for example, by Dipper (2003), Rohrer and Forst (2006) and Forst and Kaplan (2006). The version of the German grammar used here contains 274 LFG-style rules compiled into an automaton with 6,584 states and 22,241 arcs. Before parsing, the input is first tokenised and normalised. Subsequently, string-based multi-word identification is carried out, followed by morphological analysis, analysis guessing for unknown words and lexically-based multi-word identification (Rohrer and Forst, 2006; Forst and Kaplan, 2006).
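The order of these pre-parsing stages can be illustrated with a minimal Python sketch. It is not the XLE implementation: the function names, the toy multi-word list and the toy lexicon are all assumptions made for illustration, each stage is reduced to a crude stand-in, and the final lexically-based multi-word identification stage is omitted.

```python
from typing import Dict, List, Tuple

# Toy resources, invented for illustration only (not taken from the grammar).
MULTI_WORDS: Dict[Tuple[str, ...], str] = {("zum", "Beispiel"): "zum_Beispiel"}
LEXICON: Dict[str, str] = {"das": "ART", "ist": "VERB", "zum_Beispiel": "ADV"}
PUNCTUATION = ".,!?"


def tokenise(text: str) -> List[str]:
    """Split on whitespace and detach a trailing punctuation mark."""
    tokens: List[str] = []
    for chunk in text.split():
        if len(chunk) > 1 and chunk[-1] in PUNCTUATION:
            tokens.extend([chunk[:-1], chunk[-1]])
        else:
            tokens.append(chunk)
    return tokens


def normalise(tokens: List[str]) -> List[str]:
    """Lower-case the sentence-initial token (a crude stand-in for normalisation)."""
    if not tokens:
        return tokens
    return [tokens[0].lower()] + tokens[1:]


def join_multi_words(tokens: List[str]) -> List[str]:
    """String-based multi-word identification: merge listed token sequences."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for pattern, joined in MULTI_WORDS.items():
            if tuple(tokens[i:i + len(pattern)]) == pattern:
                out.append(joined)
                i += len(pattern)
                break
        else:
            # No multi-word pattern starts here, so keep the single token.
            out.append(tokens[i])
            i += 1
    return out


def analyse(tokens: List[str]) -> List[Tuple[str, str]]:
    """Lexicon lookup, with a guessed analysis for unknown words."""
    analyses: List[Tuple[str, str]] = []
    for token in tokens:
        tag = LEXICON.get(token)
        if tag is None:
            tag = "PUNCT" if token in PUNCTUATION else "GUESSED"
        analyses.append((token, tag))
    return analyses


def preprocess(text: str) -> List[Tuple[str, str]]:
    """Run the stages in the order described in the text."""
    return analyse(join_multi_words(normalise(tokenise(text))))


print(preprocess("Das ist zum Beispiel cool."))
```

In this toy run the English inclusion "cool" is absent from the lexicon and therefore falls through to the guesser, which is the situation the experiments in this section are concerned with.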
Forst and Kaplan (2006) improved the parsing coverage for this grammar from 68.3% to 73.4% on sentences 8,001 to 10,000 of the TIGER corpus by revising the integrated tokeniser. The parser outputs Prolog-encoded constituent-structure (c-structure) and functional-structure (f-structure) analyses for each sentence. These two representation levels are fundamental to the linguistic theory of LFG and encode the syntactic properties of sentences. For in-depth introductions to LFG, see Falk (2001), Bresnan (2001), Dalrymple (2001) and Dalrymple et al. (1995). While c-structures represent the word order and phrasal grouping of a sentence in a tree, f-structures encode the
c-structure:

    f1:S
      f2:NP
        f3:N  Mary
      f4:VP
        f5:ADV  never
        f6:VP
          f7:V  reads
          f8:NP
            f9:N  books

f-structure:

    f1, f4, f6, f7:
      PRED     'read <(f1 SUBJ)(f1 OBJ)>'
      SUBJ     f2, f3: [ PRED 'Mary', CASE nom, NUM sg, PERS 3 ]
      OBJ      f8, f9: [ PRED 'book', CASE acc, NUM pl, PERS 3 ]
      ADJUNCT  f5: [ PRED 'never' ]
      TENSE    present

Figure 5.9: Complete c- and f-structures for an English example sentence (Dipper, 2003).
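To make the relationship between the two levels concrete, the analysis in Figure 5.9 can be written down as plain Python data. This is only an illustrative sketch: the tuple and dictionary encoding, the names `C_STRUCTURE` and `PHI`, and the helpers `leaves` and `resolve` are assumptions made here, and the layout is not the Prolog format that XLE outputs.

```python
# c-structure: each node is (node id, category, children-or-word).
C_STRUCTURE = ("f1", "S", [
    ("f2", "NP", [("f3", "N", "Mary")]),
    ("f4", "VP", [
        ("f5", "ADV", "never"),
        ("f6", "VP", [
            ("f7", "V", "reads"),
            ("f8", "NP", [("f9", "N", "books")]),
        ]),
    ]),
])

# f-structures: attribute-value matrices as nested dictionaries.
SUBJ = {"PRED": "Mary", "CASE": "nom", "NUM": "sg", "PERS": 3}
OBJ = {"PRED": "book", "CASE": "acc", "NUM": "pl", "PERS": 3}
ADJUNCT = {"PRED": "never"}
ROOT = {
    "PRED": "read<(f1 SUBJ)(f1 OBJ)>",
    "SUBJ": SUBJ,
    "OBJ": OBJ,
    "ADJUNCT": ADJUNCT,
    "TENSE": "present",
}

# Mapping from c-structure nodes to f-structures: several nodes
# (for example S, both VPs and V) share one f-structure.
PHI = {
    "f1": ROOT, "f4": ROOT, "f6": ROOT, "f7": ROOT,
    "f2": SUBJ, "f3": SUBJ,
    "f8": OBJ, "f9": OBJ,
    "f5": ADJUNCT,
}


def leaves(node):
    """Return the words of a c-structure (sub)tree in surface order."""
    _node_id, _category, rest = node
    if isinstance(rest, str):
        return [rest]
    words = []
    for child in rest:
        words.extend(leaves(child))
    return words


def resolve(fstructure, path):
    """Follow a sequence of attributes through an f-structure."""
    value = fstructure
    for attribute in path:
        value = value[attribute]
    return value


print(leaves(C_STRUCTURE))
print(resolve(PHI["f1"], ["SUBJ", "NUM"]))
print(PHI["f4"] is PHI["f7"])
```

The c-structure preserves word order, which `leaves` recovers, while the f-structure is unordered and is reached through attribute paths such as SUBJ NUM.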