Training Phrase Translation Models with Leaving-One-Out

show a slightly lower performance. This illustrates that a higher number of features results in a less reliable optimization of the log-linear parameters.

5 Experimental Evaluation

5.1 Experimental Setup

We conducted our experiments on the German-English data published for the ACL 2008 Workshop on Statistical Machine Translation (WMT08). Statistics for the Europarl data are given in Table 2.

Table 2: Statistics for the Europarl German-English data

                           German       English
  TRAIN  Sentences             1 311 815
         Run. Words        34 398 651   36 090 085
         Vocabulary           336 347      118 112
         Singletons           168 686       47 507
  DEV    Sentences                 2 000
         Run. Words            55 118       58 761
         Vocabulary             9 211        6 549
         OOVs                     284           77
  TEST   Sentences                 2 000
         Run. Words            56 635       60 188
         Vocabulary             9 254        6 497
         OOVs                     266           89

We are given the three data sets TRAIN, DEV and TEST. For the heuristic phrase model, we first use GIZA++ (Och and Ney, 2003) to compute the word alignment on TRAIN. Next we obtain a phrase table by extraction of phrases from the word alignment. The scaling factors of the translation models have been optimized for BLEU on the DEV data.

The phrase table obtained by heuristic extraction is also used to initialize the training. The forced alignment is run on the training data TRAIN, from which we obtain the phrase alignments. Those are used to build a phrase table according to the proposed generative phrase models. Afterwards, the scaling factors are trained on DEV for the new phrase table. By feeding the new phrase table back into forced alignment, we can reiterate the training procedure. When training is finished, the resulting phrase model is evaluated on DEV and TEST. Additionally, we can apply smoothing by interpolating the new phrase table with the original one estimated heuristically, retrain the scaling factors and evaluate afterwards (a schematic sketch of this interpolation is given at the end of this section).

Table 3: Comparison of different training setups for the count model on DEV.

  leaving-one-out   max. phr. len.   BLEU   TER
  baseline                6          25.7   61.1
  none                    2          25.2   61.3
                          3          25.7   61.3
                          4          25.5   61.4
                          5          25.5   61.4
                          6          25.4   61.7
  standard                6          26.4   60.9
  length-based            6          26.5   60.6

The baseline system is a standard phrase-based SMT system with eight features: phrase translation and word lexicon probabilities in both translation directions, phrase penalty, word penalty, language model score and a simple distance-based reordering model. The features are combined in a log-linear way (the standard model equation is restated at the end of this section). To investigate the generative models, we replace the two phrase translation probabilities and keep the other features identical to the baseline. For the feature-wise combination the two generative phrase probabilities are added to the features, resulting in a total of 10 features. We used a 4-gram language model with modified Kneser-Ney discounting for all experiments. The metrics used for evaluation are the case-sensitive BLEU (Papineni et al., 2002) score and the translation edit rate (TER) (Snover et al., 2006) with one reference translation.

5.2 Results

In this section, we investigate the different aspects of the models and methods presented before. We will focus on the proposed leaving-one-out technique (a schematic sketch is also given at the end of this section) and show that it helps in finding good phrasal alignments on the training data that lead to improved translation models. Our final results show an improvement of 1.4 BLEU over the heuristically extracted phrase model on the test data set.

In Section 3.2 we have discussed several methods which aim to overcome the over-fitting prob-
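The smoothing step mentioned in Section 5.1 can be pictured as a linear interpolation of the two phrase tables. The Python sketch below is illustrative only: the dictionary-based table layout, the function name interpolate_phrase_tables, and the fixed weight lam are assumptions on our part, not the paper's implementation; the text only states that the new table is interpolated with the heuristic one and the scaling factors retrained.

    # Illustrative sketch: linear interpolation of the generatively
    # trained phrase table with the heuristically extracted one.
    # Table layout and the weight `lam` are assumptions, not the paper's code.

    def interpolate_phrase_tables(generative, heuristic, lam=0.5):
        """Each table maps (source_phrase, target_phrase) -> probability.
        Returns the smoothed table lam * p_gen + (1 - lam) * p_heur;
        pairs missing from one table contribute probability 0 there."""
        smoothed = {}
        for pair in set(generative) | set(heuristic):
            p_gen = generative.get(pair, 0.0)
            p_heur = heuristic.get(pair, 0.0)
            smoothed[pair] = lam * p_gen + (1.0 - lam) * p_heur
        return smoothed

    # Tiny usage example with made-up probabilities:
    generative = {("das haus", "the house"): 0.7, ("haus", "house"): 0.9}
    heuristic = {("das haus", "the house"): 0.5, ("das", "the"): 0.8}
    print(interpolate_phrase_tables(generative, heuristic, lam=0.5))

In practice the relative trust placed in the two tables would itself be tuned on DEV rather than fixed, which is consistent with the paper's retraining of the scaling factors after interpolation.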
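For reference, the log-linear combination used by the baseline is the standard formulation in phrase-based SMT. With feature functions h_m (M = 8 for the baseline, M = 10 for the feature-wise combination) and scaling factors lambda_m optimized for BLEU on DEV, the model and the decoder's search objective are:

    p(e \mid f) \;=\; \frac{\exp\left( \sum_{m=1}^{M} \lambda_m\, h_m(f, e) \right)}
                           {\sum_{e'} \exp\left( \sum_{m=1}^{M} \lambda_m\, h_m(f, e') \right)},
    \qquad
    \hat{e} \;=\; \operatorname*{argmax}_{e} \, \sum_{m=1}^{M} \lambda_m\, h_m(f, e).

Because the denominator does not depend on e, the decoder can drop it and maximize the weighted feature sum directly.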
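Section 3.2 itself lies outside this excerpt, so the following Python sketch is only a hedged reconstruction of the standard leaving-one-out idea for the count model: when training sentence n is force-aligned, the phrase counts contributed by sentence n are first subtracted, so that rare phrase pairs extracted only from that sentence cannot justify themselves. The count containers, the source-marginal normalization, and the floor value are all our assumptions for illustration.

    from collections import Counter

    def leave_one_out_prob(pair, corpus_counts, sentence_counts, floor=1e-6):
        """Phrase translation probability with the current sentence's counts
        removed.  `corpus_counts` is a Counter over phrase pairs from the
        full training data; `sentence_counts` a Counter over pairs extracted
        from the sentence currently being force-aligned."""
        count = corpus_counts[pair] - sentence_counts[pair]
        source = pair[0]
        # Source-phrase marginal, also with this sentence left out
        # (normalization direction is an assumption for illustration).
        marginal = sum(c - sentence_counts[p]
                       for p, c in corpus_counts.items() if p[0] == source)
        if count <= 0 or marginal <= 0:
            # Pair (or its source phrase) occurs only in this sentence:
            # back off to a small floor probability instead of a zero count.
            return floor
        return count / marginal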
