05.03.2013 Views

PhD thesis - School of Informatics - University of Edinburgh


Chapter 6. Other Potential Applications

GERMAN: Mit diesem Tool können Sie parallel in sämtlichen News suchen.

ENGLISH(BABELFISH): With this Tool you can search parallel in all news.

FRENCH(BABELFISH): Avec ce Tool, vous pouvez chercher parallèlement tous les News dans.

While Sections 2.1.3.2 and 4.2 established that French contains a large number of anglicisms, at least in the domain of IT, such anglicisms may not always be the preferred choice of a human translator (HT). A human translator might instead produce the following French translation of the German sentence:

FRENCH(HT): Avec cet outil, vous pouvez chercher tous les actualités en parallèle.

Mixed-lingual compounds or interlingual homographs are even more of a challenge to MT systems. One very interesting example occurs in the following German query: 10

GERMAN: Nenne einen Grund für Selbstmord bei Teenagern.

ENGLISH(BABELFISH): Call a reason for suicide with dte rodents.

FRENCH(BABELFISH): cite une raison de suicide avec des Teenagern.

The English inclusion Teenager appears in the dative plural and consequently receives the German inflection -n. Instead of treating this noun as an English inclusion when translating the sentence into English, Babelfish processes the token as the German compound Tee+Nagern (tea + rodents) and translates its subparts into the token dte 11 and the noun rodents. When translating into French, the MT system treats the English inclusion as unseen and inserts it directly into the translation without further processing, such as inflection removal. Combined with a named entity recogniser, the English inclusion classifier could signal to the MT engine which items need translating and which need transferring, depending on the target language.
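The kind of signal described above can be illustrated with a minimal sketch. It is not the classifier developed in this thesis: the toy lexicon `ENGLISH_LEXICON`, the suffix list `GERMAN_SUFFIXES`, the `[[EN:...]]` marker format and both function names are assumptions made purely for illustration. The sketch flags English inclusions in a German sentence, removes the German dative plural -n, and marks the result so that a downstream MT engine could decide whether to translate or transfer the item.

```python
from typing import Optional

# Hypothetical toy lexicon standing in for the output of an English
# inclusion classifier; a real system would not rely on a fixed word list.
ENGLISH_LEXICON = {"teenager", "tool", "news"}

# Only the German dative plural -n is handled in this sketch.
GERMAN_SUFFIXES = ("n",)

PUNCTUATION = ".,;:!?"


def english_base(word: str) -> Optional[str]:
    """Return the uninflected form if `word` is an English inclusion, else None."""
    lower = word.lower()
    if lower in ENGLISH_LEXICON:
        return word
    for suffix in GERMAN_SUFFIXES:
        # Accept the token if stripping the German suffix yields a known inclusion.
        if lower.endswith(suffix) and lower[: -len(suffix)] in ENGLISH_LEXICON:
            return word[: -len(suffix)]
    return None


def mark_inclusions(sentence: str) -> str:
    """Wrap each detected inclusion in a (hypothetical) [[EN:...]] marker."""
    marked = []
    for raw in sentence.split():
        core = raw.rstrip(PUNCTUATION)   # token without trailing punctuation
        trailing = raw[len(core):]       # punctuation to re-attach afterwards
        base = english_base(core)
        if base is None:
            marked.append(raw)
        else:
            marked.append(f"[[EN:{base}]]{trailing}")
    return " ".join(marked)


if __name__ == "__main__":
    print(mark_inclusions("Nenne einen Grund für Selbstmord bei Teenagern."))
```

Under these assumptions, Teenagern is marked as the English noun Teenager before the MT engine sees it, so it is no longer a candidate for decomposition into Tee+Nagern, and the German inflection does not carry over into the French output.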

Multi-word English inclusions also pose difficulties for most MT systems. If the system has not encountered a particular expression in its training data or its lexicon, it is likely to treat the entire expression as unseen. However, MT systems are not necessarily aware of the boundaries of such multi-word expressions, as illustrated in

10 This query appeared in CLEF 2004, where one of the tasks was to find answers to German questions in English documents (Ahn et al., 2004).

11 Note that dte is not a typo but an error in the MT output.
