22.11.2014 Views

Designing a Machine Translation System for Canadian Weather ...



Designing a Weather Warning Translation System

transcripts proved not to be appropriate. Instead, we had to harvest parallel text from sometimes poorly maintained archives. To further complicate the matter, their bulk was in a form that could not be readily exploited, since properly-cased letters and diacritics were missing. In contrast, our goal was to produce translations as close as possible to a properly formatted text. Although the exact processes we describe are specific to our application, we are convinced that any production-quality SMT system will have to go through similar unglamorous but nevertheless essential procedures.

This paper emphasizes the need for the integration of many sources of information in order to produce a high-quality translated text. In this project, we made good use of our long experience in the area of processing weather information and we benefited greatly from our close association with the people at Environment Canada. They kindly agreed to offer rich sources of meteorological information and they provided continuous feedback on the state of our prototypes. Indeed, too often MT projects rely solely on automatic evaluation methods such as BLEU scores, but the real test is human evaluation. It is costly but, as we show, the statistics it yields can bring much better insight into the development of a production-quality system.

Any “real” SMT system is much more than a mere translation engine; it features modules to gather source text, to preprocess it, to keep the output in a translation memory and to produce formatted text appropriate for dissemination. This paper describes in detail the intricacies of such a system, which are not often described but are nevertheless essential in a production setting. At each step, we will present the choices that have been made through a patient examination of the source texts, the translation models and the outputs. Even though we are well aware that we did not always consider all alternatives and sometimes had to opt for simplicity because of time and resource constraints, we are confident that the resulting system is a good language engineering use case, given our experience with these corpora.
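To make the module layout described above concrete, the following is a minimal sketch of how preprocessing, a translation memory, a translation engine and an output formatter might be chained. It is not the system described in this paper: the names `preprocess`, `postprocess`, `TranslationPipeline`, `toy_engine` and `TOY_LEXICON` are hypothetical, and the word-for-word engine merely stands in for a real SMT decoder.

```python
from typing import Callable, Dict

# Hypothetical toy lexicon standing in for a real translation model.
TOY_LEXICON: Dict[str, str] = {"snow": "neige", "tonight": "ce soir"}


def toy_engine(text: str) -> str:
    """Placeholder engine: word-for-word lookup, unknown words pass through."""
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())


def preprocess(text: str) -> str:
    """Collapse whitespace and lowercase so equivalent inputs share one memory key."""
    return " ".join(text.split()).lower()


def postprocess(text: str) -> str:
    """Toy formatting step: capitalize the first letter and ensure a final period."""
    if not text:
        return text
    text = text[0].upper() + text[1:]
    return text if text.endswith(".") else text + "."


class TranslationPipeline:
    """Chains preprocessing, translation memory, engine and formatting."""

    def __init__(self, engine: Callable[[str], str]) -> None:
        self.engine = engine
        self.memory: Dict[str, str] = {}  # translation memory: source -> target
        self.engine_calls = 0             # counts how often the engine was needed

    def translate(self, source: str) -> str:
        key = preprocess(source)
        if key not in self.memory:
            # Only call the engine for sentences not already in memory.
            self.engine_calls += 1
            self.memory[key] = self.engine(key)
        return postprocess(self.memory[key])


pipeline = TranslationPipeline(toy_engine)
first = pipeline.translate("  SNOW   tonight ")
second = pipeline.translate("snow tonight")
```

In this sketch the second call is served from the translation memory, so the engine runs only once. Keying the memory on the preprocessed text is one possible design choice; it suits highly repetitive input where superficially different sentences should map to the same stored translation.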

2 Context of the Application

In the early 1970s, a research group at the Université de Montréal called TAUM completed a machine translation project called TAUM-MÉTÉO, which has been called the most successful case for this type of technology (Mitkov 2005, p. 439). This system was developed for translating weather forecasts issued by Environment Canada from English to French, a problem that lent itself well to automation, given the highly repetitive nature of the text and the tediousness of this specific translation task for humans. TAUM-MÉTÉO relied on dictionary lookup and syntactic analysis, followed by simple syntactic and morphological generation rules. An overview of the system can be found in Isabelle (1987).

TAUM-MÉTÉO and a number of successors (see Chandioux (1988)) were deployed at Environment Canada. From 1984 to 2004, they continually translated English weather forecasts. Translation professionals from the Canadian Translation Bureau supervised the process and made sure that the occasional spelling error or other difficulty found in the source text did not prevent the production of a French forecast. The quality of TAUM-MÉTÉO’s output is considered very
