PhD thesis - School of Informatics - University of Edinburgh
Chapter 3. Tracking English Inclusions in German
Table 3.20). In the first experiment (ID1), the tagger's standard feature set is used, which includes words, character sub-strings, word shapes, POS tags, abbreviations and NE tags (Finkel et al., 2005). The resulting F-scores are high for both the internet and space travel data (84.74 and 91.29 points) but extremely low for the EU data (13.33 points) due to the sparseness of English inclusions in that data set. ID2 uses the same setup as ID1 but eliminates all features relying on POS tags. The tagger performs as well on the internet and space travel data as in ID1, but improves by 8 points to an F-score of 21.28 on the EU data. This can be attributed to the fact that the POS tagger does not perform with perfect accuracy, particularly on data containing foreign inclusions. Training the supervised tagger on POS tag information is therefore not necessarily useful for this task, especially when the data is sparse. Despite this improvement, there remains a large discrepancy between the F-score which the ML classifier produces for the EU data and those of the other two data sets.
Table 3.20 compares the best F-scores produced with the tagger's own feature set (ID2) to the best results of the English inclusion classifier presented in this thesis and to the baseline. The best English inclusion classifier is the full system combined with consistency checking (Section 3.3.6). For the EU data, the English inclusion classifier performs significantly better than the supervised tagger (χ²: df = 1, p ≤ 0.05). However, since this data set contains only a small number of English inclusions, this result is not unexpected, and it is therefore difficult to draw any meaningful conclusions from it. For the internet and space travel data sets, which contain many English inclusions, the trained maxent tagger and the English inclusion classifier perform equally well, i.e. the difference in performance is not statistically significant (χ²: df = 1, p ≤ 1). The fact that the maxent tagger requires hand-annotated training data, however, represents a serious drawback. Conversely, the English inclusion classifier does not rely on annotated data and is therefore much more portable to new domains. Section 3.4.3 shows that it performs well on unseen data in three different domains as well as on entirely new data provided by another research group. Given the necessary lexicons, the English inclusion classifier can easily be run over new text and text in a different language or domain without further cost. The time required to port the classifier to a new language is the focus of the next chapter.
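The significance comparisons above rely on a chi-squared test with one degree of freedom. As a minimal illustration only (this is not the thesis's evaluation code, and the counts below are hypothetical), the Pearson χ² statistic for a 2×2 contingency table of correct versus incorrect decisions by two classifiers can be computed as follows:

```python
def chi_squared_2x2(table):
    """Pearson chi-squared statistic for a 2x2 contingency table
    (rows = classifiers, columns = correct / incorrect decisions)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the null hypothesis of no difference
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# HYPOTHETICAL counts, for illustration only (not the thesis's figures):
table = [[90, 10],   # e.g. classifier A: correct / incorrect
         [70, 30]]   # e.g. classifier B: correct / incorrect

chi2 = chi_squared_2x2(table)
CRITICAL_005_DF1 = 3.841  # chi-squared critical value at p = 0.05, df = 1
print(f"chi2 = {chi2:.2f}; significant at p <= 0.05: {chi2 > CRITICAL_005_DF1}")
```

For a 2×2 table the test has df = 1, so the statistic exceeds the critical value 3.841 exactly when the observed difference is significant at p ≤ 0.05.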