PhD thesis - School of Informatics - University of Edinburgh


Chapter 5. Parsing English Inclusions

POS-tag   NE     FM    NN   KON   CARD   ADJD   APPR
Count     1185   512   44   8     8      1      1

Table 5.1: POS tags of English inclusions.

5.3.2.3 Inclusion Entity Model

The word-by-word parser fails to take advantage of one important trend in the data: that foreign inclusion tokens tend to be adjacent and these adjacent words usually refer to the same entity. There is nothing stopping the word-by-word parser from positing a constituent boundary between two adjacent foreign inclusions. The inclusion entity model is designed to restrict such spurious bracketing. It does so by way of another tree transformation.

The new category FP (foreign phrase) is added below any node dominating at least one token marked FOM during training. For example, when encountering a FOM sequence dominated by PN as in Figure 5.5(a), the tree is modified so that it is the FP rule which generates the FOM tokens. Figure 5.5(b) shows the modified tree. In all cases, a unary rule PN→FP is introduced. As this extra rule decreases the probability of the entire tree, the parser has a bias to introduce as few of these rules as possible, thus limiting the number of categories which expand to FOMs. Once a candidate parse is created during testing, the inverse operation is applied, removing the FP node.

5.3.3 Method

For all experiments reported here, the different versions of the parser are trained on the TIGER treebank. As the inclusion and random sets are drawn from the whole treebank, it is necessary to ensure that the data used to train the parser does not overlap with these test sentences. The experiments are therefore designed as multi-fold cross-validation tests. Using 5 folds, each model is trained on 80% of the data while the remaining 20% is held out. The held-out set is then intersected with the inclusion set (or, respectively, the random set).
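The fold construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thesis does not say how sentences are assigned to folds, so the round-robin assignment here is a guess, and the names `make_folds` and `cross_validation_splits` are hypothetical.

```python
from typing import Iterator, List, Set, Tuple


def make_folds(sentence_ids: List[int], n_folds: int = 5) -> List[List[int]]:
    """Split sentence ids into n_folds disjoint held-out sets.

    Round-robin assignment is an assumption; the source does not
    specify how the folds were drawn.
    """
    return [sentence_ids[i::n_folds] for i in range(n_folds)]


def cross_validation_splits(
    sentence_ids: List[int], test_set: Set[int], n_folds: int = 5
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_ids, test_ids) for each fold.

    The parser would be trained on the other folds (80% of the data
    with 5 folds). The held-out fold is intersected with the inclusion
    set (or the random set) to obtain the sentences actually evaluated.
    """
    for held_out in make_folds(sentence_ids, n_folds):
        held = set(held_out)
        train = [s for s in sentence_ids if s not in held]
        test = [s for s in held_out if s in test_set]
        yield train, test
```

Because the folds are disjoint and cover the whole treebank, every sentence of the inclusion set (or random set) is evaluated exactly once, by a model that never saw it during training.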
The evaluation metrics are calculated on this sub-set of the inclusion set (or random set), using the parser trained on the corresponding training data. This process ensures that the test sentences are not contained in the training data. The overall performance metrics of the parser are calculated on aggregated totals of the five held-out test sets.

For each experiment, parsing performance is reported in terms of the standard PARSEVAL scores (Black et al., 1991), including coverage (Cov), labelled precision (P) and recall (R) and F-score, the average number of crossing brackets (AvgCB), and the percentage of sentences parsed with zero and with two or fewer crossing brackets (0CB and ≤2CB). In addition, dependency accuracy (Dep) is also reported. Dependency accuracy is calculated by means of the approach described in Lin (1995), using the head-picking method employed by Dubey (2005a). The labelled bracketing figures (P, R and F) and the dependency score are calculated on all sentences, with those which are out-of-coverage getting zero counts. The crossing bracket scores are calculated only on those sentences which are successfully parsed.

Stratified shuffling is used to determine statistical difference between precision and recall values of different runs.[4] In particular, statistical difference is determined over the baseline and the perfect tagging model runs for both the inclusion and the random test sets. In order to differentiate between the different tests, Table 5.2 lists a set of diacritics used to indicate a given (in)significance.

[4] This approach to statistical testing is described in detail at: http://www.cis.upenn.edu/~dbikel/software.html

Figure 5.5: Tree transformation employed in the inclusion entity parser. (a) Whenever a FOM is encountered... [PN directly dominating the FOM tokens] (b) ...a new FP category is created. [PN dominating FP, which dominates the FOM tokens]
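The tree transformation of Section 5.3.2.3 (Figure 5.5) and its inverse can be illustrated with a minimal Python sketch. Several things here are assumptions rather than the thesis's implementation: trees are represented as nested lists `[label, child, ...]` with leaves of the form `[tag, word]`; each contiguous run of FOM leaves under a node is wrapped in a single FP (the source only shows the case where PN dominates nothing but FOM tokens); and the function names `add_fp` and `remove_fp` and the example tokens are hypothetical.

```python
from itertools import groupby


def is_leaf(node):
    """A leaf is [tag, word] where word is a string."""
    return len(node) == 2 and isinstance(node[1], str)


def add_fp(node):
    """Training-time transformation: group adjacent FOM leaves under FP.

    Assumption: every contiguous run of FOM children is wrapped in its
    own FP node, whatever the other children of the parent are.
    """
    if is_leaf(node):
        return node
    children = [add_fp(child) for child in node[1:]]
    new_children = []
    for is_fom, run in groupby(children, key=lambda c: is_leaf(c) and c[0] == "FOM"):
        run = list(run)
        new_children.extend([["FP"] + run] if is_fom else run)
    return [node[0]] + new_children


def remove_fp(node):
    """Test-time inverse: splice the children of every FP into its parent."""
    if is_leaf(node):
        return node
    children = []
    for child in node[1:]:
        child = remove_fp(child)
        if not is_leaf(child) and child[0] == "FP":
            children.extend(child[1:])
        else:
            children.append(child)
    return [node[0]] + children
```

Applied to a hypothetical tree `["S", ["PN", ["FOM", "New"], ["FOM", "York"]], ["VVFIN", "boomt"]]`, `add_fp` yields a PN node with a single FP child, corresponding to the unary rule PN→FP, and `remove_fp` restores the original bracketing.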