05.03.2013 Views

PhD thesis - School of Informatics - University of Edinburgh


Chapter 3. Tracking English Inclusions in German

                  Development set                     Test set
Domain  Data      Tokens    %   Types    %   TTR      Tokens    %   Types    %   TTR
IT      Total      15919    -    4152    -   0.26      16219    -    4404    -   0.27
        English      963  6.0     283  6.8   0.29       1034  6.4     258  5.9   0.25
SP      Total      16066    -    3938    -   0.25      16171    -    4315    -   0.27
        English      485  3.0      73  1.9   0.15        456  2.8     151  3.5   0.33
EU      Total      16028    -    4048    -   0.25      16296    -    4128    -   0.25
        English       49  0.3      30  0.7   0.61        173  1.1      86  2.1   0.50

Table 3.1: English token and type statistics and type-token ratios (TTR) in the German development and test data sets.

expected. The strong presence of English inclusions in the articles from the other two domains was anticipated, as English is the dominant language in science & technology.

While the proportion of English inclusions is similar in the development and test sets on internet & telecoms (6.0% versus 6.4%) and space travel (3.0% versus 2.8%), the test set on the EU contains considerably more English inclusions (1.1%) than the EU development set (0.3%). In the development data, the type-token ratios (TTRs) indicate that the English inclusions in the space travel data are the least diverse (0.15). In the test data, however, the internet-related articles contain the most repetitive English inclusions (0.25). Even though the articles are a random selection, it is difficult to draw definite conclusions from these numbers as the data sets are small.
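The percentages and type-token ratios discussed above can be recomputed from the raw counts in Table 3.1. The following is a minimal sketch, assuming that TTR is the number of distinct types divided by the number of tokens and that each percentage is the English count relative to the corresponding total; the dictionary layout and the function names are illustrative and do not come from the thesis.

```python
# Counts copied from Table 3.1 as (tokens, types).
# Keys are (domain, data set); the layout is an illustrative assumption.
TOTAL = {
    ("IT", "dev"): (15919, 4152), ("IT", "test"): (16219, 4404),
    ("SP", "dev"): (16066, 3938), ("SP", "test"): (16171, 4315),
    ("EU", "dev"): (16028, 4048), ("EU", "test"): (16296, 4128),
}
ENGLISH = {
    ("IT", "dev"): (963, 283), ("IT", "test"): (1034, 258),
    ("SP", "dev"): (485, 73), ("SP", "test"): (456, 151),
    ("EU", "dev"): (49, 30), ("EU", "test"): (173, 86),
}


def type_token_ratio(types, tokens):
    """Return distinct types per token; lower values mean more repetition."""
    return types / tokens


def percentage(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


for key in sorted(ENGLISH):
    eng_tokens, eng_types = ENGLISH[key]
    all_tokens, all_types = TOTAL[key]
    print(
        key,
        f"tokens={percentage(eng_tokens, all_tokens):.1f}%",
        f"types={percentage(eng_types, all_types):.1f}%",
        f"TTR={type_token_ratio(eng_types, eng_tokens):.2f}",
    )
```

With so few English tokens in some subsets (49 in the EU development set), a single repeated word shifts the ratio noticeably, which is consistent with the caution expressed about drawing conclusions from these numbers.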

Table 3.2 lists the five most frequent English inclusions in each development set, covering various types of anglicisms that have entered the German language. All examples demonstrate the increasing influence that English has on German. First, there are English terms such as Internet whose German equivalents, in this case Netz, are rarely used in comparison. This is reflected in the low frequency of the German equivalents in the corpus. For example, Netz appeared only 25 times in the 25 IT articles in the development set, whereas Internet appeared 106 times in the same articles. The German term was thus used only 19% of the time. This result corresponds to the findings of Corr (2003), which show that Germans tend to favour anglicisms for specific computer vocabulary over their German translations.
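The 19% figure can be checked with a short calculation. This sketch assumes that the share is the count of Netz divided by the combined count of Netz and Internet, which matches the reported numbers; the function and variable names are illustrative.

```python
# Occurrences in the 25 IT development articles, as reported in the text.
netz_count = 25
internet_count = 106


def relative_share(count, competitor_count):
    """Return the share of one term among all uses of both terms."""
    return count / (count + competitor_count)


german_share = relative_share(netz_count, internet_count)
print(f"Netz is used {100 * german_share:.0f}% of the time")
```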
