Machine Translation and Its Role in Professional Translation - Lugos
+<br />
Machine Translation<br />
and Its Role in<br />
Professional Translation<br />
Špela Vintar, Dept. of Translation Studies<br />
University of Ljubljana<br />
Spela.vintar@ff.uni-lj.si
+ Outline<br />
- Myths about MT<br />
- How does MT work<br />
- Overview of approaches<br />
- Understanding data-driven MT<br />
- MT in practice<br />
- MT and professional translation<br />
- Post-editing MT<br />
- Further reading
+ Myths about MT<br />
1. MT does not work – machines will never be able to<br />
translate literary texts.<br />
According to estimates, 90% of translated texts involve<br />
documentation projects, technical specifications, user<br />
manuals and similar.<br />
2. MT is not for professional translators, only for<br />
amateurs.<br />
Most CAT tools now integrate MT as a standard component,<br />
used when no fuzzy match is found.<br />
MT is an integral part of most localization projects.
+ Myths about MT<br />
3. MT is cheap, a translation is produced at the click of a<br />
button.<br />
The development of rule-based systems is extremely<br />
costly, and the development of data-driven systems needs<br />
massive amounts of data.<br />
4. MT will never be available for small language pairs.<br />
With the growing availability of digital resources it is now<br />
possible to train SMT systems for many marginal language<br />
pairs.
+ Myths about MT<br />
5. Only an MT system can produce a translation as<br />
ridiculous as ...<br />
Think of bad human translations ...<br />
6. In the future there will be no need for human translators.<br />
The translator’s job description might change, but there<br />
will always be a need for tech-savvy language professionals,<br />
and the global demand for language services is still<br />
growing.
+ Knowing your enemy...<br />
Historically, several approaches to Machine Translation<br />
have evolved:<br />
Rule-based approaches:<br />
- Direct<br />
- Transfer<br />
- Interlingua<br />
Data-driven approaches:<br />
- Statistical<br />
- Example-based
+ Rule-based approaches and level of<br />
abstraction<br />
[Figure: The Vauquois Pyramid. Labels: source text, ANALYSIS, TRANSFER, GENERATION, target text; direct translation at the base; language-neutral interlingua at the apex.]
+ Interlingua vs. transfer<br />
[Figure: two diagrams over six languages (L1 to L6), one of them with a central interlingua node.]
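The practical difference between the two architectures is often illustrated by counting components. The sketch below is a standard textbook argument, not something stated on the slide: it assumes one transfer module per ordered language pair, and one analysis plus one generation module per language for the interlingua design. The function names are hypothetical.

```python
def transfer_modules(n_languages: int) -> int:
    """Transfer modules needed if every ordered language pair gets its own."""
    return n_languages * (n_languages - 1)


def interlingua_modules(n_languages: int) -> int:
    """One analysis and one generation module per language."""
    return 2 * n_languages


if __name__ == "__main__":
    # Six languages, as in the L1..L6 diagram.
    for n in (3, 6):
        print(n, transfer_modules(n), interlingua_modules(n))
```

Under these assumptions the pairwise count grows quadratically with the number of languages, while the interlingua count grows linearly.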
+ Sources of translation errors<br />
- Lexicon<br />
  - Several possible equivalents for a source word, lexical gaps etc.<br />
- Syntax<br />
  - Large syntactic transformations and changes in word order<br />
- Syntactic-semantic<br />
  - No similar structure available in target language for the proposition to be translated<br />
- Idiomaticity, Phraseology, Collocations
+ Lexicon<br />
- Ambiguous source word with different equivalents for each sense<br />
  - financial bank vs. river bank<br />
  - In the course of things… (9 senses)<br />
  - Time flies like an arrow.<br />
- Lexical gaps: source word denotes a sense which is not lexicalized in the target language<br />
  - e.g. cozy<br />
- Target language has a more fine-grained structure of senses than source language<br />
  - e.g. EN wall -> DE Wand (internal), Mauer (external)
+ Syntactic-semantic transformations<br />
- Change of structure:<br />
  I like swimming<br />
  “Ich schwimme gern”<br />
  (literally: I swim gladly)<br />
- Verb complements:<br />
  Jones likes the film.<br />
  “Le film plaît à Jones.”<br />
- Passive constructions:<br />
  - e.g. French -> English<br />
    Ces livres se lisent facilement<br />
    *“These books read themselves easily”<br />
    These books are easily read
+ Idiomaticity<br />
- Non-compositional meaning<br />
- Examples:<br />
  - George is a bull in a china shop.<br />
  - He kicked the bucket.<br />
SI: George je bik v trgovini s porcelanom. Je brcnil vedro.<br />
HR: George je bik u staklarni. On je udario kantu.<br />
FI: George on norsu posliinikaupassa. Hän potkaisi ämpäri.<br />
ES: George es un toro en una cacharrería. Dio una patada al cubo.<br />
GR: Ο Γιώργος είναι ένας ταύρος σε υαλοπωλείο. Κλώτσησε τον κάδο.
+ Do we need rules?<br />
- Children learn language without rules...<br />
- ...so why not machines?<br />
- From the 1990s, increasing amounts of digital text became available on the web, including parallel texts.<br />
- Basic idea of SMT:<br />
  - Take a text and its translation, and align them sentence by sentence.<br />
  - For each source word in a source sentence, the system can then check whether there is a target word that consistently occurs in the corresponding translated sentences.
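The basic idea above can be sketched in a few lines. This is a minimal illustration only: it scores word pairs with the Dice coefficient over sentence-level co-occurrence, which is a swapped-in simplification, whereas toolkits such as Giza estimate word alignments with iterative statistical models. The toy English-German corpus and the function names are invented for the example.

```python
from collections import Counter
from itertools import product


def dice_scores(aligned_pairs):
    """Score every (source word, target word) pair by the Dice coefficient.

    aligned_pairs is a list of (source sentence, target sentence) tuples.
    Counts are per sentence pair: a word is counted once per sentence.
    """
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src, tgt in aligned_pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        pair_freq.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * count / (src_freq[s] + tgt_freq[t])
        for (s, t), count in pair_freq.items()
    }


def best_equivalent(scores, source_word):
    """Return the target word with the highest score, or None if unseen."""
    candidates = {t: v for (s, t), v in scores.items() if s == source_word}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    # Invented toy corpus, already sentence-aligned.
    corpus = [
        ("the house", "das Haus"),
        ("the book", "das Buch"),
        ("a house", "ein Haus"),
        ("a book", "ein Buch"),
    ]
    scores = dice_scores(corpus)
    for word in ("the", "a", "house", "book"):
        print(word, "->", best_equivalent(scores, word))
```

Normalising by word frequency matters here: raw co-occurrence counts alone would favour very frequent target words regardless of the source word.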
Statistical Machine Translation (SMT)<br />
[Diagram components: Bilingual and Monolingual Data*, Translation Model, Language Model, Decoder]<br />
*Assumed: large quantities of high-quality bilingual data aligned at sentence level<br />
Decoder: choose t such that<br />
argmax P(t|s) = argmax P(t) · P(s|t)<br />
Source: Andy Way, 2011, SMT Walkthrough
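The decoder's objective follows from Bayes' rule. A short derivation in the slide's notation, with s the source sentence and t a candidate target sentence (the symbol \hat{t} for the chosen translation is introduced here for convenience):

```latex
\begin{align*}
\hat{t} &= \operatorname*{arg\,max}_{t} P(t \mid s) \\
        &= \operatorname*{arg\,max}_{t} \frac{P(t)\, P(s \mid t)}{P(s)} \\
        &= \operatorname*{arg\,max}_{t} P(t)\, P(s \mid t)
\end{align*}
```

The denominator P(s) can be dropped in the last step because it is the same for every candidate t and therefore does not affect which t wins.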
+ Language Modeling<br />
- A language model assigns a probability to every string in that language.<br />
- In practice, we gather a huge database of utterances and then calculate the relative frequencies of each.<br />
- Problems?<br />
  - many (nearly all) strings will receive no probability as we haven’t seen them …<br />
  - all unseen good and bad strings are deemed equally unlikely …<br />
- Solution: How do we know if a new utterance is valid or not? By breaking it down into substrings -> n-gram models
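To make the n-gram idea concrete, here is a minimal bigram model. It is a sketch under stated assumptions: whitespace tokenisation, add-one smoothing (chosen for brevity; the slides do not name a smoothing method), and an invented two-sentence training corpus. The class name is hypothetical.

```python
from collections import Counter


class BigramModel:
    """Bigram language model with add-one smoothing."""

    def __init__(self, sentences):
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        vocab = {"</s>"}
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            vocab.update(tokens[1:])
            # Every token except the last serves as a context once.
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab)

    def prob(self, prev, word):
        """Smoothed estimate of P(word | prev); never zero."""
        return (self.bigram_counts[(prev, word)] + 1) / (
            self.context_counts[prev] + self.vocab_size
        )

    def sentence_prob(self, sentence):
        """Probability of a whole sentence as a product of bigram estimates."""
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        p = 1.0
        for prev, word in zip(tokens, tokens[1:]):
            p *= self.prob(prev, word)
        return p


if __name__ == "__main__":
    model = BigramModel([
        "the dog walked across the road",
        "the cat walked across the street",
    ])
    # Neither test sentence occurs in the training data.
    print(model.sentence_prob("the dog walked across the street"))
    print(model.sentence_prob("street the across walked dog the"))
```

Neither test sentence was seen as a whole, yet the fluent one scores higher because all of its bigrams were seen, which is exactly the distinction that whole-string relative frequencies cannot make.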
The Translation Model<br />
SMT: argmax P(e|f) = argmax P(e) · P(f|e)<br />
Here P(e) is the language model and P(f|e) is the translation model.<br />
Remember:<br />
If we carry out, for example, French-to-English translation, then<br />
we will have:<br />
- an English Language Model, and<br />
- an English-to-French Translation Model.<br />
When we see a French string f, we want to reason backwards …<br />
What English string e is:<br />
- likely to be uttered?<br />
- likely to then translate to f?<br />
We are looking for the English string e that maximises<br />
P(e) · P(f|e).
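The search for the best e can be illustrated with a toy decoder. Everything numeric below is invented for the example, and the sketch assumes a tiny, explicitly enumerated candidate list; a real decoder searches a very large space of candidates instead. The function and variable names are hypothetical.

```python
def decode(french, candidates, language_model, translation_model):
    """Return the English candidate e that maximises P(e) * P(f|e).

    language_model maps e -> P(e).
    translation_model maps (f, e) -> P(f|e).
    Missing entries are treated as probability 0.
    """
    def score(e):
        return language_model.get(e, 0.0) * translation_model.get((french, e), 0.0)

    return max(candidates, key=score)


if __name__ == "__main__":
    f = "ces livres se lisent facilement"
    literal = "these books read themselves easily"
    fluent = "these books are easily read"

    # Invented probabilities: the literal rendering matches the French
    # more closely, but is far less likely as an English sentence.
    language_model = {literal: 1e-9, fluent: 1e-6}
    translation_model = {(f, literal): 0.4, (f, fluent): 0.2}

    print(decode(f, [literal, fluent], language_model, translation_model))
```

With these made-up numbers the language model outweighs the closer word-for-word match, so the fluent candidate is selected.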
+ Statistical Machine Translation: Milestones<br />
- In 2000, Egypt / Giza SMT toolkits freely available for research purposes<br />
- Google’s first MT attempts in 2002.<br />
  - In 2004, Franz Josef Och joins Google and soon starts heading their language research unit.<br />
  - Today, 57 supported languages.<br />
- In 2009, Microsoft launches Bing, their own web-based free translation service.<br />
- Moses: an EC-funded free SMT engine.
+ SMT and RBMT in practice<br />
- The question is often not “HT or MT”, but “either MT or no translation at all”.<br />
- MT@EC: the EU is gradually switching to MT and retraining its staff to post-edit MT output.<br />
  - The EuroMatrix and EuroMatrix Plus projects have developed SMT for all 23 languages (using English as pivot).<br />
  - To translate 6.8 million pages, 8500 full-time translators would be needed.<br />
- Large corporations use MT (Adobe, Microsoft, Volkswagen, SAP ...).<br />
- Contrary to common belief, the contest between SMT and RBMT is not over yet...
+ [Figure: DGT's SMT maturity check outcome as a (useful / useless) sentences ratio + morphology. Vertical axis: 0% to 100%. Languages by family: Romance (ES, FR, IT, PT, RO), Germanic (DE, DA, NL, SV), Slavic (BG, CS, PL, SK, SL), Hellenic (EL), Semitic (MT), Baltic (LT, LV), Finno-Ugric (ET, FI, HU). Morphology labels: analytic, inflected, composita, strong agglutination, highly inflected languages.]
+ MT evaluation<br />
- What is a good human translation?<br />
  - Subjectivity<br />
  - Purpose of the translation<br />
- MT evaluation:<br />
  - Back-translation<br />
    HR: Pas je išao preko ceste.<br />
    EN: The dog walked across the road.<br />
    HR: Pas hodao preko ceste.<br />
  - Human<br />
  - Automatic
+ Human evaluation<br />
- Human:<br />
  - Various methods proposed...<br />
  - The most widely accepted method is the one by ARPA:<br />
    - Fluency: rating how good the target language is.<br />
    - Adequacy: rating how much information is transferred from the original to the translation.
+ Automatic evaluation<br />
- A good metric will correlate closely with human judgement<br />
- Metrics usually measure the similarity between MT output and a human (reference) translation<br />
- Several metrics proposed:<br />
  - BLEU<br />
  - NIST<br />
  - METEOR<br />
  - WER<br />
  - …
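Of the metrics listed, WER (word error rate) is the simplest to show in full: it is the word-level edit distance between the MT output and the reference, divided by the reference length. The sketch below assumes whitespace tokenisation and a single reference; the example hypothesis sentence is invented.

```python
def word_error_rate(hypothesis: str, reference: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the dog walked across the road"
    hypothesis = "the dog went over the road"  # invented MT output
    print(word_error_rate(hypothesis, reference))
```

Lower is better, and a perfectly acceptable paraphrase can still be penalised, which is one reason such metrics are judged by how well they correlate with human ratings.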
+ MT and professional translation<br />
What translators say:<br />
- Using MT is worse and slower than having to translate from scratch (D).<br />
- Translating with the help of MT is of course faster (D).<br />
- Post-editing MT output is more difficult and less predictable (D).<br />
- Post-editing MT output is easier and more predictable (RU).<br />
- Never had to do it and hope I never will (EN).<br />
- It can be useful in many situations (NO).<br />
- It hinders the natural flow of typing – more clicking, copy-pasting. It’s physically tiring (D, F).<br />
- It’s the only way of matching clients’ expectations regarding turnaround (F, RU).<br />
(Source: Weiss, I. (2011) MT Oslo)
+ MT and professional translation<br />
What translation vendors say:<br />
- Deploy and customize MT to start translating more for less!<br />
- Saving Time, Lowering Costs, Improving Service...<br />
- In 2010, PEMT represented 200k out of 20M words (1%). In 2012, we expect a total volume of over 30M words, with at least 7M post-edited (23.3%).<br />
What clients say:<br />
- Does the translation industry have a “cost disease”? (No gains in productivity when considering translation proper)<br />
- Quality is what each customer says it is.<br />
- How much quality is enough?
+ The long tail of languages on the web
+ New model of user-defined quality<br />
Traditional: translation 60%, copyediting 25%, proof 15%<br />
Scalable: MT, post-editing, proof (X %, Y %)
+ Post-editing MT<br />
- ...yes, there is an abbreviation: PEMT<br />
- MT is regularly used in localization projects where no TM match is found.<br />
- 2006: O’Brien establishes a correlation between MT segments and TM segments in the 80-90% fuzzy match range.<br />
  - Dangerous! Many IPMs still use this rate to pay for PE.<br />
- Translators generally resent doing it...<br />
- ...but do it more and more often.
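The workflow described above, where MT fills in when the translation memory has no usable match, can be sketched as follows. This is an illustration only: commercial CAT tools use their own match algorithms, whereas here the fuzzy score is approximated with Python's `difflib.SequenceMatcher` over word tokens. The 75% threshold, the `machine_translate` callable and the sample TM entry are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def fuzzy_score(segment: str, tm_source: str) -> float:
    """Approximate fuzzy match percentage between two segments."""
    matcher = SequenceMatcher(None, segment.split(), tm_source.split())
    return 100.0 * matcher.ratio()


def suggest(segment, translation_memory, machine_translate, threshold=75.0):
    """Return (origin, suggestion, best TM score) for one source segment.

    translation_memory maps stored source segments to their translations.
    machine_translate is any callable taking a source string (hypothetical).
    """
    best_source, best_score = None, 0.0
    for source in translation_memory:
        score = fuzzy_score(segment, source)
        if score > best_score:
            best_source, best_score = source, score

    if best_source is not None and best_score >= threshold:
        return ("TM", translation_memory[best_source], best_score)
    # No sufficiently close TM match: fall back to raw MT for post-editing.
    return ("MT", machine_translate(segment), best_score)


if __name__ == "__main__":
    tm = {"Click the Save button.": "Klicken Sie auf Speichern."}
    stub_mt = lambda text: "[MT] " + text  # stand-in for a real MT engine

    print(suggest("Click the Save button.", tm, stub_mt))
    print(suggest("Restart the computer now.", tm, stub_mt))
```

Returning the origin alongside the suggestion matters in practice, since TM matches and MT output may be reviewed and paid for differently.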
+ Post-editing MT: Productivity gain<br />
- In most studies, PEMT outperforms translating from scratch (productivity gains from 42 to 131%).<br />
- For some language pairs, PEMT outperforms fuzzy matches.<br />
Source: MT at Autodesk, http://translate.autodesk.com
+ Post-editing MT: TM vs MT<br />
- Interesting study by Guerberof, A. (2009)...<br />
  - Compares productivity and quality for a group of 8 professional translators who were given New, TM and MT segments to translate<br />
- Results:<br />
  - Translators have higher productivity and quality when using machine-translated output than when processing fuzzy matches from TMs.<br />
  - -> Possible explanation: if the text of a fuzzy match reads smoothly, translators overlook terminology inconsistencies etc. and do not consult the original as often as they should.
+ MT and Professional Translation<br />
- You are likely to be required to post-edit MT output and produce translations of various quality levels.<br />
  - “gisting” vs. “raw translation” vs. “without grammatical errors” vs. “indistinguishable from HT”<br />
- MT output takes getting used to...<br />
- Post-editing MT is not necessarily done by translators.<br />
- Post-editing effort must be measured and evaluated for each individual project!<br />
- You are likely to work with hybrid CAT/MT tools.<br />
  - Editing a fuzzy match is not very different from editing MT output. -> Typology of errors / corrections?<br />
- Domain-adapted MT systems perform much better than Google Translate.
+ MT and Professional Translation<br />
- Your students most likely already use Google Translate and/or other web services.<br />
- “If you can’t beat it, eat it!” Practice using MT as an instant memory aid, discuss MT errors with students, play around with Google’s alternative translations and “shift-and-drag” reordering.
- Discuss the lexical choices of MT systems and offer explanations for them.<br />
  - Most MT systems are black boxes, but some insight into their strategies is useful.
- Compare the outputs of different MT systems, for different text types.<br />
  - If feasible, compare language pairs, translating the same content from different source languages.<br />
- Play around with back-translation.
+ MT and Professional Translation<br />
- You are not likely to lose your job because of MT.<br />
- ...but knowing, understanding and participating in the development of MT systems is already a competitive advantage.
+ References<br />
- Weiss, Isabelle, 2011. Machine Translation. Myths, Anecdotes and Realities.<br />
  http://www.translatorportalen.com/.../TD-2011-Machine_Translation_1.ppt<br />
- Guerberof, Ana, 2009. Productivity and quality in MT post-editing.<br />
  http://www.mt-archive.info/MTS-2009-Guerberof.pdf<br />
- Eisele, Andreas, 2010. Machine Translation at the European Commission.<br />
  Presentation at the conference “10 Jahre Verbmobil”.<br />
- The Future of Localization, 2011. Lionbridge White Paper.<br />
  http://en-us.lionbridge.com/translation-localization/knowledge-center/future-localization.htm<br />
- MT Archive, a comprehensive bibliography of MT and related disciplines,<br />
  http://www.mt-archive.info/<br />
- Statistical Machine Translation, http://www.statmt.org/<br />
- EAMT, http://www.eamt.org/index.php<br />
- Hutchins, J. W. and Somers, H., 1992. An introduction to machine translation.<br />
  London: Academic Press, http://www.hutchinsweb.me.uk/IntroMT-TOC.htm