PhD thesis - School of Informatics - University of Edinburgh

Chapter 3. Tracking English Inclusions in German

… Table 3.20). In the first experiment (ID1), the tagger's standard feature set is used, which includes words, character substrings, word shapes, POS tags, abbreviations and NE tags (Finkel et al., 2005). The resulting F-scores are high for both the internet and the space travel data (84.74 and 91.29 points, respectively) but extremely low for the EU data (13.33 points), owing to the sparseness of English inclusions in that data set. ID2 uses the same setup as ID1 but removes all features that rely on POS tags. On the internet and space travel data the tagger performs about as well as in ID1, but on the EU data it improves by 8 points to an F-score of 21.28. This can be attributed to the fact that the POS tagger is not perfectly accurate, particularly on data containing foreign inclusions. Training the supervised tagger on POS tag information is therefore not necessarily useful for this task, especially when the data is sparse. Despite the improvement, there is still a large discrepancy between the F-score that the ML classifier produces for the EU data and those it produces for the other two data sets.
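The difference between ID1 and ID2 amounts to switching off one group of features. The sketch below is a hypothetical, simplified token-feature extractor, not the feature templates of the tagger used in these experiments: the function names `word_shape` and `token_features`, the `use_pos` flag and the use of prefixes/suffixes to approximate character substrings are all assumptions, and the abbreviation and NE-tag features are omitted.

```python
from typing import Dict, List, Optional


def word_shape(token: str) -> str:
    """Map upper-case letters to X, lower-case letters to x, digits to d and
    keep other characters, collapsing runs of the same class ("ISS-2" -> "X-d")."""
    shape: List[str] = []
    for ch in token:
        if ch.isupper():
            cls = "X"
        elif ch.islower():
            cls = "x"
        elif ch.isdigit():
            cls = "d"
        else:
            cls = ch
        if not shape or shape[-1] != cls:
            shape.append(cls)
    return "".join(shape)


def token_features(
    tokens: List[str],
    pos_tags: Optional[List[str]],
    i: int,
    use_pos: bool = True,
    max_affix: int = 3,
) -> Dict[str, str]:
    """Hypothetical feature set for token i; use_pos=False mimics the ID2 setup."""
    tok = tokens[i]
    feats = {"word": tok.lower(), "shape": word_shape(tok)}
    # Prefixes and suffixes stand in for the character substring features.
    for n in range(1, max_affix + 1):
        if len(tok) >= n:
            feats[f"prefix{n}"] = tok[:n]
            feats[f"suffix{n}"] = tok[-n:]
    # Neighbouring words give the classifier some local context.
    feats["prev_word"] = tokens[i - 1].lower() if i > 0 else "<S>"
    feats["next_word"] = tokens[i + 1].lower() if i + 1 < len(tokens) else "</S>"
    # POS-based features are the only group that differs between ID1 and ID2.
    if use_pos:
        if pos_tags is None:
            raise ValueError("use_pos=True requires POS tags")
        feats["pos"] = pos_tags[i]
        feats["prev_pos"] = pos_tags[i - 1] if i > 0 else "<S>"
    return feats


if __name__ == "__main__":
    tokens = ["Das", "Update", "ist", "online"]
    tags = ["ART", "NN", "VAFIN", "ADJD"]  # illustrative POS tags
    id1 = token_features(tokens, tags, 1, use_pos=True)
    id2 = token_features(tokens, None, 1, use_pos=False)
    print(sorted(set(id1) - set(id2)))  # ['pos', 'prev_pos']
```

Because ID2 needs no POS tags at all, errors made by the POS tagger on sentences containing English inclusions can no longer propagate into the tagger's input.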

Table 3.20 compares the best F-scores obtained with the tagger's own feature set (ID2) with the best results of the English inclusion classifier presented in this thesis and with the baseline. The best English inclusion classifier is the full system combined with consistency checking (Section 3.3.6). For the EU data, the English inclusion classifier performs significantly better than the supervised tagger (χ²: df = 1, p ≤ 0.05). However, since this data set contains only a small number of English inclusions, leaving the supervised tagger few positive examples to learn from, this result is not unexpected, and it is difficult to draw any meaningful conclusions from it. For the internet and space travel data sets, which contain many English inclusions, the trained maxent tagger and the English inclusion classifier perform equally well, i.e. the difference in their performance is not statistically significant (χ²: df = 1, p ≤ 1). The fact that the maxent tagger requires hand-annotated training data, however, represents a serious drawback. The English inclusion classifier, by contrast, does not rely on annotated data and is therefore much more portable to new domains. Section 3.4.3 shows that it performs well on unseen data in three different domains as well as on entirely new data provided by another research group. Given the necessary lexicons, the English inclusion classifier can easily be run over new text, including text in a different language or domain, without further cost. The time required to port the classifier to a new language is the focus of the next chapter.
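The significance tests quoted above are χ² tests with one degree of freedom. As an illustration only, the sketch below runs a Pearson χ² test on a 2×2 contingency table, assuming (this is not stated in the text) that the table cross-tabulates each system against its correct and incorrect decisions. The function names and the counts in the usage example are hypothetical. With one degree of freedom the p-value can be computed with the standard library's `math.erfc`, so no statistics package is needed.

```python
import math
from typing import Tuple


def chi2_sf_df1(stat: float) -> float:
    """Upper-tail probability of a chi-square statistic with one degree of freedom.
    With df = 1 the statistic is the square of a standard normal variable Z,
    so P(X >= stat) = P(|Z| >= sqrt(stat)) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the table
        [[a, b],    # system 1: correct, incorrect
         [c, d]]    # system 2: correct, incorrect
    Returns (statistic, p-value) with df = 1."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, chi2_sf_df1(stat)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the thesis.
    stat, p = chi_square_2x2(90, 10, 80, 20)
    print(f"chi2 = {stat:.3f}, p = {p:.3f}")  # chi2 = 3.922, p = 0.048
```

The example statistic lies just above the df = 1 critical value of about 3.84, so the difference would count as significant at p ≤ 0.05. Applying Yates' continuity correction would replace `(a * d - b * c) ** 2` with `(abs(a * d - b * c) - n / 2) ** 2`.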
