Designing a Machine Translation System for Canadian Weather ...
Designing a Weather Warning Translation System
transcripts proved not to be appropriate. Instead, we had to harvest parallel text
from sometimes poorly maintained archives. To complicate matters further, the bulk
of this text was in a form that could not be readily exploited, since proper casing
and diacritics were missing. In contrast, our goal was to produce translations as
close as possible to properly formatted text. Although the exact processes we
describe are specific to our application, we are convinced that any production-quality
SMT system will have to go through similar unglamorous but nevertheless
essential procedures.
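
To make the difficulty concrete, the following Python sketch simulates the kind of degradation described above. It is an illustration under stated assumptions, not the procedure used in this project: the function name `strip_case_and_diacritics` is hypothetical, and we assume for the example that the degraded text is entirely in upper case.

```python
import unicodedata


def strip_case_and_diacritics(text: str) -> str:
    """Simulate archive-style text: no diacritics, a single letter case."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base letters.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption for this sketch: the degraded text is all upper case.
    return base.upper()


clean = "Vents forts près de Québec"
degraded = strip_case_and_diacritics(clean)

# Distinct French words collapse to a single form, so the degradation
# cannot be undone without looking at the context.
collapsed = {strip_case_and_diacritics(w) for w in ["côte", "côté", "cote"]}
```

Here `degraded` no longer carries accents or casing, and `collapsed` contains a single string for three different words, which is why restoring properly formatted text requires more than a character-level mapping.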

This paper emphasizes the need to integrate many sources of information
in order to produce high-quality translated text. In this project, we made
good use of our long experience in processing weather information, and
we benefited greatly from our close association with the people at Environment
Canada. They kindly agreed to provide rich sources of meteorological information and
gave continuous feedback on the state of our prototypes. Too often,
MT projects rely solely on automatic evaluation metrics such as BLEU, but
the real test is human evaluation. It is costly but, as we show, the statistics it yields
provide much better insight into the development of a production-quality system.

Any “real” SMT system is much more than a mere translation engine: it features
modules to gather source text, to preprocess it, to keep the output in a translation
memory and to produce formatted text appropriate for dissemination. This paper
describes in detail the intricacies of such a system, which are seldom described but
nevertheless essential in a production setting. At each step, we present the choices
that were made through a patient examination of the source texts, the translation
models and the outputs. Even though we are well aware that we did not
always consider all alternatives and sometimes had to opt for simplicity because of
time and resource constraints, we are confident that the resulting system is a good
language engineering use case, given our experience with these corpora.
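
The modules listed above (preprocessing, translation, translation memory, formatting) can be pictured as a small pipeline around the engine. The Python sketch below is only an illustration under assumptions: the class `TranslationPipeline`, its method names, the toy engine and the reuse of stored translations for repeated input are all hypothetical, and gathering the source text is left out. It does not describe the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TranslationPipeline:
    # The MT engine is treated as a black box: source string -> target string.
    translate: Callable[[str], str]
    # Translation memory: preprocessed source -> stored translation.
    memory: Dict[str, str] = field(default_factory=dict)

    def preprocess(self, text: str) -> str:
        # Normalize whitespace and case before translation (assumed steps).
        return " ".join(text.split()).lower()

    def postprocess(self, text: str) -> str:
        # Produce formatted text: here, simply capitalize the first letter.
        return text[:1].upper() + text[1:]

    def run(self, source: str) -> str:
        key = self.preprocess(source)
        if key not in self.memory:
            # Only call the engine for text not already in the memory.
            self.memory[key] = self.translate(key)
        return self.postprocess(self.memory[key])


calls: List[str] = []


def toy_engine(text: str) -> str:
    # Stand-in for a real SMT engine: a one-entry dictionary lookup.
    calls.append(text)
    return {"strong winds": "vents forts"}.get(text, text)


pipeline = TranslationPipeline(translate=toy_engine)
first = pipeline.run("  Strong   WINDS ")
second = pipeline.run("strong winds")
```

In this sketch both calls return the same formatted translation, and the engine is invoked only once because the second request is served from the memory. The point is the division of labour between modules, not the toy logic inside each one.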

2 Context of the Application

In the early 1970s, a research group at the Université de Montréal called TAUM
completed a machine translation project named TAUM-MÉTÉO, which has been
described as the most successful case for this type of technology (Mitkov 2005, p. 439).
This system was developed to translate weather forecasts issued by Environment
Canada from English to French, a problem that lent itself well to automation,
given the highly repetitive nature of the text and the tediousness of this specific
translation task for humans. TAUM-MÉTÉO relied on dictionary lookup and syntactic
analysis, followed by simple syntactic and morphological generation rules. An
overview of the system can be found in Isabelle (1987).

TAUM-MÉTÉO and a number of successors (see Chandioux 1988) were deployed
at Environment Canada. From 1984 to 2004, they continuously translated
English weather forecasts. Translation professionals from the Canadian Translation
Bureau supervised the process and made sure that the occasional spelling
error or other difficulty found in the source text did not prevent the production
of a French forecast. The quality of TAUM-MÉTÉO’s output is considered very