
                  German–English                       English–Swedish
          Type             Sentences  Documents   Type               Sentences  Documents
Training  Europarl         1.9M       –           Europarl           1.5M       –
          News Commentary  178K       –           –                  –          –
Tuning    News2009         2525       111         Europarl (Moses)   2000       –
          News2008-2010    7567       345         Europarl (Docent)  1338       100
Test      News2012         3003       99          Europarl           690        20

Table 1: Domain and number of sentences and documents for the corpora

As seen in Figure 1, there are some additional parameters in our procedure: the sample start iteration and the sample interval. We also need to set the number of decoder iterations to run. In Section 5 we empirically investigate the effect of these parameters.
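For concreteness, here is a minimal sketch of how these parameters could control which hypotheses are collected from a decoder run; the names sample_start and sample_interval and the generator interface are our own illustration, not Docent's actual API.

```python
def sample_hypotheses(decoder_run, sample_start, sample_interval):
    """Collect hypotheses from a single hill-climbing run.

    decoder_run: iterable of (iteration, document_hypothesis) pairs.
    Hypotheses are kept from iteration sample_start onwards, one per
    sample_interval steps, forming the k-best list used for tuning.
    """
    samples = []
    for iteration, hypothesis in decoder_run:
        if iteration >= sample_start and \
           (iteration - sample_start) % sample_interval == 0:
            samples.append(hypothesis)
    return samples
```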

Compared to sentence-level optimization, we also have a smaller number of units to get scores from, since we use documents as units, and not sentences. The importance of this depends on the optimization algorithm. MERT calculates metric scores over the full tuning set, not for individual sentences, and should not be affected too much by the change in granularity. Many other optimization algorithms, like PRO, work on the sentence level, and will likely be more affected by the reduction of units. In this work we focus on MERT, which is the most commonly used optimization procedure in the SMT community, and which tends to work quite well with relatively few features. However, we also show contrastive results for PRO (Hopkins and May, 2011). A further issue is that Docent is non-deterministic, i.e., it can give different results with the same parameter weights. Since the optimization process is already somewhat unstable, this is a potential issue that needs to be explored further, which we do in Section 5.
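To make the granularity point concrete, consider PRO: it samples pairs of candidate translations within each unit, so with documents as units it draws from 345 units rather than 7567 on our largest news tuning set (Table 1). The sketch below shows the pair-sampling step in outline only, assuming one k-best list and one vector of metric scores per unit; it omits PRO's filtering of pairs by score differential (Hopkins and May, 2011).

```python
import random

def sample_pairs(kbest_per_unit, scores_per_unit, pairs_per_unit=50):
    """Sample candidate pairs per unit for PRO-style tuning.

    With document-level decoding, each unit is a whole document, so
    kbest_per_unit has far fewer entries than in sentence-level tuning.
    """
    pairs = []
    for kbest, scores in zip(kbest_per_unit, scores_per_unit):
        for _ in range(pairs_per_unit):
            i = random.randrange(len(kbest))
            j = random.randrange(len(kbest))
            if scores[i] != scores[j]:  # keep only pairs the metric can rank
                pairs.append((kbest[i], kbest[j], scores[i] - scores[j]))
    return pairs
```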

Implementation-wise, we adapted Docent to output k-best lists and adapted the infrastructure available for tuning in the Moses decoder (Koehn et al., 2007) to work with document-level scores. This setup allows us to use the variety of optimization procedures implemented there.
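As an illustration of what this adaptation involves, document-level k-best lists can be serialized in the |||-separated n-best format that the Moses tuning tools consume, with a document index where the sentence index would normally go. This is a sketch under that assumption; the feature-name format shown is a placeholder.

```python
def write_nbest(path, kbest_per_document):
    """Write document-level k-best lists in Moses n-best list format:
    <unit id> ||| <hypothesis> ||| <feature scores> ||| <total score>
    where a unit is a whole document rather than a single sentence."""
    with open(path, "w", encoding="utf-8") as out:
        for doc_id, kbest in enumerate(kbest_per_document):
            for text, features, total in kbest:
                feats = " ".join(f"{name}= {value}"
                                 for name, value in features.items())
                out.write(f"{doc_id} ||| {text} ||| {feats} ||| {total}\n")
```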

5 Experiments<br />

In this section we report experimental results where we investigate several issues in connection with document-level feature weight optimization for SMT. We first describe the experimental setup, followed by baseline results using sentence-level optimization. We then present validation experiments with standard sentence-level features, which can be compared to standard optimization. Finally, we report results with a set of document-level features that have been proposed for joint translation and text simplification (Stymne et al., 2013).

5.1 Experimental Setup<br />

Most of our experiments are for German-to-English news translation using data from the WMT13 workshop.¹ We also show results with document-level features for English-to-Swedish Europarl (Koehn, 2005). The sizes of the training, tuning, and test sets are shown in Table 1. First of all, we need to extract documents for tuning and testing with Docent. Fortunately, the news data already contain document markup, corresponding to individual news articles. For Europarl we define a document as a consecutive sequence of utterances from a single speaker. To investigate the effect of the size of the tuning set, we used different subsets of the available tuning data.
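A minimal sketch of the speaker-based document definition above, assuming the corpus is available as an ordered list of (speaker, sentence) pairs; that input representation is our simplification of the Europarl markup.

```python
from itertools import groupby

def extract_documents(sentences):
    """Split a Europarl-style transcript into documents, where each
    document is a maximal run of consecutive sentences spoken by the
    same speaker.

    sentences: list of (speaker, sentence) pairs in corpus order.
    """
    return [
        [sentence for _, sentence in run]
        for _, run in groupby(sentences, key=lambda pair: pair[0])
    ]
```

For example, extract_documents([("A", "s1"), ("A", "s2"), ("B", "s3")]) yields two documents, one with the first two sentences and one with the third.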

All our document-level experiments are carried out with Docent, but we also contrast with the Moses decoder (Koehn et al., 2007). For the purpose of comparison, we use a standard set of sentence-level features used in Moses in most of our experiments: five translation model features, one language model feature, a distance-based reordering penalty, and a word count feature. For feature weight optimization we also apply the standard settings in the Moses toolkit. We optimize towards the Bleu metric, and optimization ends either when no weights are changed by more than 0.00001, or after 25 iterations. MERT is used unless otherwise noted.
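This stopping criterion amounts to a simple check between tuning iterations, sketched below. Weights are assumed to be stored as a name-to-value mapping, and run_tuning_iteration is a hypothetical stand-in for one decode-and-optimize step, not a Moses function.

```python
def converged(old, new, threshold=0.00001):
    """True when no weight has moved by more than the threshold."""
    return all(abs(new[name] - old[name]) <= threshold for name in new)

def tune(weights, run_tuning_iteration, max_iterations=25):
    """Repeat decode-and-optimize steps until the weights settle or
    the iteration budget is spent."""
    for _ in range(max_iterations):
        new_weights = run_tuning_iteration(weights)
        if converged(weights, new_weights):
            return new_weights
        weights = new_weights
    return weights
```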

Except for one of our baselines, we always run Docent with random initialization. For testing we run the document decoder for a maximum of 2^27 iterations with a rejection limit of 100,000. In our experiments, the decoder always stopped when reaching the rejection limit, usually between 1–5 million iterations.
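For illustration, the sketch below shows one way a rejection limit can terminate a hill-climbing search. It assumes, as our reading rather than Docent's documented semantics, that the limit counts consecutive rejected proposals.

```python
def hill_climb(state, propose, score,
               max_iterations=2**27, rejection_limit=100_000):
    """Greedy document-level search: accept a proposed change only if
    it improves the score; stop after rejection_limit consecutive
    rejections or max_iterations steps, whichever comes first."""
    best = score(state)
    rejections = 0
    for _ in range(max_iterations):
        candidate = propose(state)
        candidate_score = score(candidate)
        if candidate_score > best:
            state, best, rejections = candidate, candidate_score, 0
        else:
            rejections += 1
            if rejections >= rejection_limit:
                break
    return state
```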

¹ http://www.statmt.org/wmt13/translation-task.html
