Proceedings of the Workshop on Discourse in Machine Translation

Introduction

It is a truism that texts have properties that go beyond those of their individual sentences, including:

• document-wide properties, such as topic mix, style, register, reading level and genre, all of which are manifest in the frequency and distribution of words, word senses, referential forms and syntactic structures;

• patterns of topical or functional sub-structure that show up in localized differences in the frequency and distribution of these elements within documents;

• patterns of discourse coherence, manifest in explicit and implicit relations between sentences (clauses), or between sentences (clauses) and referring forms, or between referring forms themselves;

• common use of reduced expressions that rely on context to convey a lot of information in very few words.

These properties stimulated a good deal of Machine Translation research in the 1990s, aimed at endowing machine-translated target texts with the same document and discourse properties as their source texts, albeit realized differently in source and target languages. This included work on stylistics for Machine Translation (DiMarco & Mah 1994), target language realization of source-language discourse relations (Mitkov 1993) and of referring forms (Bond & Ogura 1998; More et al. 1999; Wada 1990), anaphora resolution for generating appropriate target-language pronouns (Chan and T’sou 1999; Ferrández et al. 1999; Nakaiwa & Ikehara 1992; Nakaiwa 1999), and ellipsis resolution for generating appropriate target-language forms from ellipsed verb-phrases (Balkan 1998). Pointers to much of this work can be found in the Machine Translation Archive of conference and workshop papers from the 1990s (see www.mtarchive.info/srch/ling-90.htm).

This early period essentially ended with the 1999 publication of a special issue of the journal Machine Translation, edited by Ruslan Mitkov, devoted to anaphora resolution in Machine Translation and multilingual NLP. Only in the past 3–4 years has there been renewed interest in these topics, now from the perspectives of Statistical Machine Translation and Hybrid Machine Translation (Chung & Gildea 2010; Eidelman et al. 2012; Foster et al. 2012; Gong et al. 2011; Guillou 2012; Hardmeier & Federico 2010; Hardmeier et al. 2012; Le Nagard & Koehn 2010; Meyer 2012; Meyer et al. 2012; Voigt & Jurafsky 2012).

With this renewed interest, this ACL Workshop on Discourse in Machine Translation provides a timely forum for the presentation of new approaches to enabling modern systems to produce texts that are not merely sequences of isolated sentences.

Eight submissions have been accepted for the Workshop, on topics that range from multilingual modeling of discourse for machine translation, to actual use of discourse-level features to improve machine translation. From the modeling perspective, the papers presented at the Workshop discuss discourse phenomena such as lexical consistency (Guillou, this volume), lexical cohesion (Beigman Klebanov & Flor, this volume) and implicit connectives (Meyer & Webber, this volume), and “meaning units” with cognitive relevance (Williams et al., this volume). From the perspective of the application to MT, several papers present encouraging results showing that discourse-related features bring measurable improvements to the quality of machine-translated texts. One study uses oracle features, namely connective labels (Meyer & Poláková, this volume), while others use automatically-assigned ones. For instance, the translation of tensed verbs is improved by recognizing whether or not they are conveying
