PhD thesis - School of Informatics - University of Edinburgh

Chapter 3. Tracking English Inclusions in German

A further interesting observation is that the ML classifier and the English inclusion classifier perform differently in terms of precision and recall. The tagger is extremely precise but is unable to track all English inclusions. Conversely, the English inclusion classifier identifies a larger proportion of English inclusions but marks some tokens as English by mistake. Therefore, a further experiment (ID3) was conducted, aiming to improve the overall score by combining the behaviour of both systems. ID3 is set up like ID2 but additionally incorporates the output of the English inclusion classifier as a gazetteer feature. As can be seen in Table 3.20, the tagger’s performance increases considerably for all three domains as a result of this additional language feature. The score for the EU data is, however, still lower than that achieved by the English inclusion classifier itself.

3.6.2 Cross-domain Experiments

In the ID experiments described above, the maxent tagger achieved surprisingly high F-scores for the internet and space travel data, considering the small training sets of around 700 sentences. These high F-scores are mainly attributed to the high precision of the maxent classifier. Although both domains contain many English inclusions, their type-token ratio amounts to 0.29 in the internet and 0.15 in the space travel data (see Table 3.1 in Section 3.2), signalling that English inclusions are frequently repeated in both domains. As a result, the likelihood of the tagger encountering an unknown inclusion in the test data is relatively small, which explains the high precision scores in the ID experiments.

In order to examine the maxent tagger’s performance on data in a new domain containing more unknown inclusions, two cross-domain (CD) experiments were carried out: CD1, training on the internet data and testing on the space travel data, and CD2, training on the space travel data and testing on the internet data.
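The type-token ratio cited above for the English inclusions in the internet and space travel data can be illustrated with a minimal sketch. It assumes the ratio is simply the number of distinct inclusion types divided by the number of inclusion tokens, with no case folding or lemmatisation; the function name and the toy token list are hypothetical and are not taken from the thesis data.

```python
def type_token_ratio(tokens):
    """Return the number of distinct types divided by the number of tokens.

    Assumption: types are compared as exact strings (no lowercasing or
    lemmatisation). Returns 0.0 for an empty input.
    """
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical toy list of English inclusion tokens (not thesis data).
inclusions = ["Internet", "Online", "Internet", "Software", "Internet", "Online"]

# 3 distinct types over 6 tokens.
ttr = type_token_ratio(inclusions)
print(f"type-token ratio: {ttr:.2f}")
```

A low ratio, such as the 0.15 reported for the space travel data, means that the same inclusions recur often, so a tagger trained on one part of the data is likely to have seen most inclusions in the other part.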
These two domain pairs were chosen to ensure that both the training and test data contain a relatively large number of English inclusions. Otherwise, the experiments were set up in the same way as experiment ID2 (see Section 3.6.1), using the standard feature set of the maxent tagger minus the POS tag feature. Table 3.21 presents the scores of both CD experiments as well as the percentage of unknown target types (UTTs), i.e. the percentage of English types that occur in the test data but not in the training data.

            Accuracy   Precision   Recall    F-score   UTT
CD1         98.43%     91.23%      53.61%    67.53     81.9%
EIC         99.45%     89.19%      93.61%    91.35     -
Baseline    96.99%     -           -         -         -
CD2         94.77%     97.10%      13.97%    24.43     93.9%
EIC         98.25%     92.75%      77.37%    84.37     -
Baseline    93.85%     -           -         -         -

Table 3.21: Evaluation scores and percentages of unknown target types (UTT) for two cross-domain (CD) experiments using a maxent tagger compared to the performance of the EIC and the baseline.

The F-scores for both CD experiments are much lower than those obtained when training and testing the tagger on documents from the same domain. In experiment CD1, the F-score only amounts to 67.53 points, while the percentage of unknown target types in the space travel test data is 81.9%. The F-score is even lower in the second experiment, at 24.43 points, which can be attributed to the percentage of unknown target types in the internet test data being higher still at 93.9%. These results indicate that the tagger’s high performance in the ID experiments is largely due to the fact that the English inclusions in the test data are known, i.e. the tagger learns a lexicon. It is therefore more complex to train an ML classifier to perform well on new data, as more and more new anglicisms enter German over time. The number of unknown tokens will increase constantly unless new annotated training data is added. It can be concluded that the annotation-free English inclusion classifier has a real advantage over any solution that relies on a static set of annotated training data.

3.6.3 Learning Curve

As seen in the previous in-domain and cross-domain experiments, the statistical ML classifier performs very differently depending on the number of annotations present in the training data and the domain of that data. In order to get an idea of how this classifier performs compared to the English inclusion classifier on a much larger data set, the entire German evaluation data (development and test data for all three domains) was pooled into a large data set containing 145 newspaper articles.
As the English inclu-
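The two derived quantities in Table 3.21 can be sketched as follows. The first function follows the definition given in Section 3.6.2 (English types in the test data that do not occur in the training data, as a percentage of all English test types). The second assumes the F-score is the balanced harmonic mean of precision and recall, which is an assumption here rather than something stated in this section. Function names and the toy word lists are hypothetical.

```python
def unknown_target_type_pct(train_inclusions, test_inclusions):
    """Percentage of English types in the test data unseen in training.

    Assumption: types are compared as exact strings.
    Returns 0.0 if the test data contains no English types.
    """
    train_types = set(train_inclusions)
    test_types = set(test_inclusions)
    if not test_types:
        return 0.0
    unknown = test_types - train_types
    return 100.0 * len(unknown) / len(test_types)


def f_score(precision, recall):
    """Balanced F-score (harmonic mean), on the same scale as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data (not thesis data): 2 of the 3 test types are unseen.
train = ["Internet", "Online", "Software"]
test = ["Shuttle", "Online", "Crew", "Shuttle"]
print(f"UTT: {unknown_target_type_pct(train, test):.1f}%")

# Precision and recall for CD1 and CD2 as reported in Table 3.21.
print(f"CD1 F-score: {f_score(91.23, 53.61):.2f}")
print(f"CD2 F-score: {f_score(97.10, 13.97):.2f}")
```

Under the harmonic-mean assumption, the reported precision and recall values for CD1 and CD2 reproduce the tabulated F-scores to within rounding, which shows how the low recall dominates the combined score in both cross-domain experiments.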