12.07.2015 Views

Topics in Language Resources for Translation ... - ymerleksi - home


Chapter 10. Standardising multilingual data

systems, and translation memories. However, the EAGLES group does not aim to produce international standards but rather to present the needs and requirements of operational applications and to accelerate the process of standardisation in this area.

There are no specific standards for automatic machine translation either. However, CAT systems have been subjected to many evaluations, which has made it possible to gradually improve the methodologies used for these evaluations. For this reason, some of these evaluations can be considered de facto standards for the future evaluation of CAT technologies.

For its part, the LISA organisation proposes a recommendation called Translation Memory Exchange (TMX), which aims to facilitate the exchange of translation memory data between tools and CAT systems. Although TM tools are based on the same basic idea, each tool implements the formatting information required for the same sentence in a rather different way: in some cases, formatting applied to the source and target texts of a translation unit is not exported to the corresponding TMX file; in others, the formatting is exported to the TMX file.
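To make the exchange format concrete, the following is a minimal sketch of reading translation units from a TMX document with Python's standard library. The sample file content, the tool name `example-tool`, and the helper names `plain_text` and `read_units` are assumptions made for illustration; the RTF-like code inside `<bpt>`/`<ept>` is not the actual export of any particular tool. The sketch assumes TMX 1.4 element names (`tu`, `tuv`, `seg`, and the inline elements `bpt`, `ept`, `ph`, `it`, `ut`).

```python
import xml.etree.ElementTree as ET

# Hypothetical TMX 1.4 sample: one translation unit whose English segment
# carries inline formatting wrapped in <bpt>/<ept> (begin/end paired tags).
TMX_SAMPLE = r"""<tmx version="1.4">
  <header creationtool="example-tool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en"
          srclang="en" datatype="rtf"/>
  <body>
    <tu>
      <tuv xml:lang="en">
        <seg>This <bpt i="1">{\b </bpt>sentence<ept i="1">}</ept> contains formatting.</seg>
      </tuv>
      <tuv xml:lang="fr">
        <seg>Cette phrase contient du formatage.</seg>
      </tuv>
    </tu>
  </body>
</tmx>
"""

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Inline elements whose content is native formatting code, not translatable text.
INLINE_CODE_TAGS = {"bpt", "ept", "ph", "it", "ut"}


def plain_text(element):
    """Return the text of a segment with inline formatting codes dropped."""
    parts = [element.text or ""]
    for child in element:
        if child.tag not in INLINE_CODE_TAGS:
            # Other children (e.g. <hi>) wrap translatable text, so keep it.
            parts.append(plain_text(child))
        # Text following the child belongs to the segment either way.
        parts.append(child.tail or "")
    return "".join(parts)


def read_units(tmx_text):
    """Parse TMX text into a list of {language: plain segment text} dicts."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            unit[tuv.get(XML_LANG)] = plain_text(tuv.find("seg"))
        units.append(unit)
    return units


units = read_units(TMX_SAMPLE)
for unit in units:
    print(unit)
```

Dropping the content of the inline elements mirrors the first case described above, where formatting is not carried over; a tool that exports formatting would instead have to interpret the codes inside those elements, which is where the tool-specific differences arise.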
In the following table (see Table 1), the sample sentence “the sentence contains different formatting information” is represented in TMX by using several tools (Zerfaß 2005). Some of these tools use external files to store formatting information (i.e., Déjà Vu and SDLX), but all of them encode that information in different ways.

Table 1. Comparison of formatting across tools

TRADOS 6.5: This {\b sentence} contains {\i different} {\ul formatting information}.
DÉJÀ VU: {1}This {2}sentence{3} contains {4}different{5} {6}formatting information{7}.
SDLX: This <1>sentence</1> contains <2>different </2><3>formatting information</3>.

In addition, the segmentation rules used by TM tools are not compatible: each tool applies its own rules to split the text into segments. Within the same sentence, some tools recognise various separators. For example the semi-colon is
