PhD thesis - School of Informatics - University of Edinburgh

Appendix A. Evaluation Metrics and Notation

κ-values can range from −1 (perfect disagreement) to +1 (perfect agreement); a value of 0 corresponds to chance agreement. Although there is no agreed standard, the scale suggested by Landis and Koch (1977), shown in Table A.3, is often used to interpret κ-values.

A.3 Parsing Evaluation Metrics

The English inclusion classifier is also evaluated extrinsically in a series of parsing experiments with a statistical and a rule-based parser (Chapter 5). The performance of the different statistical parsing models, described in Section 5.3, is evaluated in terms of several metrics: labelled precision, recall and F-score, unlabelled dependency accuracy, and bracketing scores. These are explained in detail below. The output of the rule-based parser, used in the experiments discussed in Section 5.4, is evaluated only in terms of coverage and the average number of derivations per sentence.

A.3.1 Labelled Precision, Recall and F-score

Labelled precision, recall and F-score are calculated in the same way as described in Section A.1, but on labelled brackets instead of language identification tags. This means that:

• Labelled precision is the ratio of the number of correctly labelled constituents in the parse tree to the total number of constituents in the parse tree. A constituent counts as correct if it spans the same words and has the same label as a constituent in the gold tree.
• Labelled recall is the ratio of the number of correctly labelled constituents in the parse tree to the total number of constituents in the gold tree.
• F-score is the harmonic mean of precision and recall.

A.3.2 Dependency Accuracy

Dependency-based evaluation of parsing output was first introduced by Lin (1995), who pointed out that the values of the metrics described above can deteriorate considerably because of a single attachment error that may not be serious from a linguistic point of view. This is the motivation behind unlabelled dependency accuracy (Dep), the metric proposed by Lin (1995), which compares dependency tuples in the parse and gold trees instead of labelled constituents and phrase boundaries.

A sentence is represented as a dependency tree in which each word (apart from the head word of the sentence) is the modifier (M → D) of another word (its head, H) through a grammatical relationship. Therefore, each fully parsed sentence and its gold standard tree consist of N − 1 dependency tuples, where N is the number of words in the sentence. Dependency accuracy is calculated from the number of dependents in the sentence that are assigned the same head as in the gold standard, H(M → D)_correct:

$$\mathrm{Dep} = \frac{H(M \rightarrow D)_{\mathrm{correct}}}{N - 1} \qquad \text{(A.11)}$$

The metric is unlabelled because the type of relation between a modifier and its head is not considered during evaluation. To perform dependency-based evaluation, the constituency trees output by the statistical parser must first be converted into dependency trees. The conversion algorithm is described in detail in Lin (1995).

A.3.3 Bracketing Scores

Parsing performance is also evaluated in terms of average crossing brackets (AvgCB), zero crossing brackets (0CB) and two or fewer crossing brackets (≤2CB). AvgCB is the average number of constituents in a parse tree that cross the constituent boundaries of the gold tree, as in ((W1 W2) W3) versus (W1 (W2 W3)). 0CB is the percentage of sentences whose parse contains no crossing constituents, and ≤2CB is the percentage of sentences whose parse contains at most two constituents crossing those of the gold tree.
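The three families of parsing metrics in Section A.3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the evaluation software used in the thesis: constituents are assumed to be (label, start, end) triples over word positions with an exclusive end, a dependency tree is assumed to be a list of head indices with the sentence head word marked as -1, and all function names (labelled_prf, dependency_accuracy, crossing_brackets, bracketing_scores) are hypothetical.

```python
from collections import Counter
from collections.abc import Iterable, Sequence

# Assumed representation: a constituent is (label, start, end), covering word
# positions start..end-1 (end is exclusive).
Bracket = tuple[str, int, int]

# Assumed marker for the head word of the sentence, which has no head itself.
ROOT = -1


def labelled_prf(parse: Iterable[Bracket], gold: Iterable[Bracket]) -> tuple[float, float, float]:
    """Labelled precision, recall and F-score for one sentence."""
    parse_counts, gold_counts = Counter(parse), Counter(gold)
    # A parse constituent is correct if a gold constituent has the same label
    # and span; the multiset intersection prevents double matching.
    correct = sum((parse_counts & gold_counts).values())
    n_parse = sum(parse_counts.values())
    n_gold = sum(gold_counts.values())
    precision = correct / n_parse if n_parse else 0.0
    recall = correct / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score: harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)


def dependency_accuracy(parse_heads: Sequence[int], gold_heads: Sequence[int]) -> float:
    """Unlabelled dependency accuracy, Dep = correctly attached words / (N - 1).

    heads[i] is the index of the head of word i. The sentence head word (ROOT
    in the gold tree) is skipped, leaving N - 1 dependency tuples. Relation
    labels are not represented, so the score is unlabelled.
    """
    n = len(gold_heads)
    if n < 2:
        return 0.0
    correct = sum(
        1 for p, g in zip(parse_heads, gold_heads) if g != ROOT and p == g
    )
    return correct / (n - 1)


def crosses(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if two spans overlap without either one containing the other."""
    (s1, e1), (s2, e2) = a, b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


def crossing_brackets(parse: Iterable[Bracket], gold: Iterable[Bracket]) -> int:
    """Number of parse constituents crossing at least one gold constituent."""
    gold_spans = {(start, end) for _, start, end in gold}
    return sum(
        1 for _, start, end in parse
        if any(crosses((start, end), span) for span in gold_spans)
    )


def bracketing_scores(crossings: Sequence[int]) -> tuple[float, float, float]:
    """AvgCB, 0CB (%) and <=2CB (%) from per-sentence crossing counts."""
    n = len(crossings)
    avg_cb = sum(crossings) / n
    zero_cb = 100 * sum(c == 0 for c in crossings) / n
    le2_cb = 100 * sum(c <= 2 for c in crossings) / n
    return avg_cb, zero_cb, le2_cb
```

For the toy trees ((W1 W2) W3) and (W1 (W2 W3)), with the outer bracket labelled S and the inner one NP in both, labelled_prf returns (0.5, 0.5, 0.5) and crossing_brackets returns 1. Standard scorers such as evalb apply further conventions, for example in how punctuation is treated, which this sketch does not reproduce.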
A.4 Statistical Tests

When the performance of the English inclusion classifier is compared with that of another system or with the baseline, Pearson's chi-square (χ²) test is used to determine whether the difference is statistically significant.
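The text does not specify how the contingency table is built. A minimal sketch, assuming that the decisions of two systems on the same evaluation tokens are tabulated as correct versus incorrect in a 2×2 table, computes Pearson's χ² statistic and its p-value (one degree of freedom) with the standard library only; the function name pearson_chi_square_2x2 is hypothetical.

```python
import math


def pearson_chi_square_2x2(a_correct: int, a_wrong: int,
                           b_correct: int, b_wrong: int) -> tuple[float, float]:
    """Pearson's chi-square statistic and p-value for a 2x2 table.

    Assumed layout: rows are system A and system B, columns are correct and
    incorrect decisions. No continuity (Yates) correction is applied.
    """
    table = [[a_correct, a_wrong], [b_correct, b_wrong]]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the hypothesis that both systems perform equally.
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                raise ValueError("all expected counts must be non-zero")
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, chi2 = z**2, so P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With 90/10 correct/incorrect decisions for one system and 80/20 for the other, χ² = 200/51 ≈ 3.92 and p ≈ 0.048, just below the conventional 0.05 threshold. scipy.stats.chi2_contingency should give the same result when called with correction=False; with its default correction=True, SciPy applies Yates' continuity correction to 2×2 tables.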