PhD thesis - School of Informatics - University of Edinburgh
Chapter 3. Tracking English Inclusions in German
Table 3.20). In the first experiment (ID1), the tagger's standard feature set is used, which includes words, character sub-strings, word shapes, POS tags, abbreviations and NE tags (Finkel et al., 2005). The resulting F-scores are high for both the internet and space travel data (84.74 and 91.29 points) but extremely low for the EU data (13.33 points) due to the sparseness of English inclusions in that data set. ID2 uses the same setup as ID1 but eliminates all features relying on POS tags. The tagger performs as well on the internet and space travel data as in ID1, but improves by 8 points to an F-score of 21.28 on the EU data. This can be attributed to the fact that the POS tagger does not perform with perfect accuracy, particularly on data containing foreign inclusions. Training the supervised tagger on POS tag information is therefore not necessarily useful for this task, especially when the data is sparse. Despite this improvement, there remains a large discrepancy between the F-score which the ML classifier produces for the EU data and those of the other two data sets.
Table 3.20 compares the best F-scores produced with the tagger's own feature set (ID2) to the best results of the English inclusion classifier presented in this thesis and to the baseline. The best English inclusion classifier is the full system combined with consistency checking (Section 3.3.6). For the EU data, the English inclusion classifier performs significantly better than the supervised tagger (χ²: df = 1, p ≤ 0.05). However, since this data set contains only a small number of English inclusions, this result is not unexpected, and it is therefore difficult to draw any meaningful conclusions from it. For the internet and space travel data sets, which contain many English inclusions, the trained maxent tagger and the English inclusion classifier perform equally well, i.e. the difference in performance is not statistically significant (χ²: df = 1, p ≤ 1). The fact that the maxent tagger requires hand-annotated training data, however, represents a serious drawback. Conversely, the English inclusion classifier does not rely on annotated data and is therefore much more portable to new domains. Section 3.4.3 shows that it performs well on unseen data in three different domains as well as on entirely new data provided by another research group. Given the necessary lexicons, the English inclusion classifier can easily be run over new text and text in a different language or domain without further cost. The time required to port the classifier to a new language is the focus of the next chapter.
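The significance comparisons above rely on a chi-squared test with one degree of freedom. As a minimal illustration only (this is not the thesis's evaluation code, and the counts below are hypothetical), the Pearson χ² statistic for a 2×2 contingency table of correct versus incorrect decisions by two classifiers can be computed as follows:

```python
def chi_squared_2x2(table):
    """Pearson chi-squared statistic for a 2x2 contingency table
    (rows = classifiers, columns = correct / incorrect decisions)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the null hypothesis of no difference
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# HYPOTHETICAL counts, for illustration only (not the thesis's figures):
table = [[90, 10],   # e.g. classifier A: correct / incorrect
         [70, 30]]   # e.g. classifier B: correct / incorrect

chi2 = chi_squared_2x2(table)
CRITICAL_005_DF1 = 3.841  # chi-squared critical value at p = 0.05, df = 1
print(f"chi2 = {chi2:.2f}; significant at p <= 0.05: {chi2 > CRITICAL_005_DF1}")
```

For a 2×2 table the test has df = 1, so the statistic exceeds the critical value 3.841 exactly when the observed difference is significant at p ≤ 0.05.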