PhD thesis - School of Informatics - University of Edinburgh

Chapter 3. Tracking English Inclusions in German

occurring in advertising are conspicuous for nearly all of the respondents (98%). A closer look at the data reveals that Marcadet et al. (2005) do not consistently annotate abbreviations and acronyms expanding to English definitions as English. Conversely, the English inclusion classifier presented in this thesis is designed to recognise them as well. Moreover, person names like Ted Saskins are annotated as English and not distinguished from real English inclusions as advocated in this thesis.

On the reconciled gold standard, the English inclusion classifier performs with an F-score of 96.35 points (an accuracy of 98.95%, a precision of 97.78% and a recall of 94.96%). The accuracy is slightly better than that reported by Marcadet et al. (2005) on this data set (98.67%). However, it is not entirely straightforward to compare these scores as the gold standard annotation is reconciled. The few classification errors are mainly due to English words like Team or Management being already listed in the German lexicon. These anglicisms are strongly integrated into the German language and have well-established pronunciations. Therefore, such classification errors are unlikely to cause pronunciation problems during TTS synthesis.
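As a quick consistency check on the scores reported above, the F-score can be recomputed from the precision and recall. The sketch below assumes the balanced F-score (the harmonic mean of precision and recall), which the text does not state explicitly; the function name `f_score` and its `beta` parameter are illustrative and not taken from the thesis.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the harmonic mean of P and R."""
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero, so the score is defined as zero.
        return 0.0
    return (1 + b2) * precision * recall / denominator


# Precision and recall as reported on the reconciled gold standard (in %).
reported_precision = 97.78
reported_recall = 94.96

f1 = f_score(reported_precision, reported_recall)
print(f"F-score: {f1:.2f}")  # prints "F-score: 96.35"
```

Under this assumption the recomputed value agrees with the reported F-score of 96.35 points.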

Given the results of both sets of evaluations, it can be concluded that the English inclusion classifier performs well on randomly selected unseen mixed-lingual data in different domains and compares well to an existing mixed-lingual LID approach.

3.5 Parameter Tuning Experiments<br />

This section discusses a series of parameter tuning experiments carried out to optimise the English inclusion classifier. These were the basis for the final design of the full system which was evaluated in the previous section. The experiments include a task-based evaluation of three different POS taggers and a task-based evaluation of two search engines. All experiments use the German development data for evaluation.

3.5.1 Task-based Evaluation of Different POS Taggers

Throughout the entire process of error analysis, it was noticed that the performance of the English inclusion classifier depends to some extent on the performance of the POS tagger. Initially, the system made use of the POS tagger TnT (Brants, 2000b) trained on the NEGRA corpus (Skut et al., 1997). Some classification errors result from errors
