Topics in Language Resources for Translation ... - ymerleksi - home


Chapter 9. BEYTrans

... translators to check the evolution of a translation and avoid losing content. Translators can easily restore old translations deleted erroneously or by vandals.

Within this context, it was a natural decision for us to use Wiki as the basis of BEYTrans. After analysing several Wiki implementations and Translationwiki, and experimenting with some of the available Wikis, we chose XWiki for developing our system (XWiki 2007). It is a Java-based environment which allows easy integration of existing Java tools for NLP processing.

5. Language resources management

In our environment, the language resources and TMs are pre-processed and managed differently. What we understand as language resources (dictionaries, glossaries, technical terminologies, etc.) is imported as raw textual data and transformed into a structured format. In our work to date, we have imported more than 1.7 million entries in Arabic, English, French, and Japanese in XML format.

To be useful, a TM (a set of textual translation units extracted from existing translated documents or during translation) has to match the typology and domain of the documents to be translated. Hence, each translation community has to use its own TM. As we could not find any ready-to-use TMs in such communities, we created a small TM from existing documents translated by the W3C community.

In the following sub-sections, the language resources will be described and the XLD (XML Language Data) format will be introduced. Our management of TM in BEYTrans will also be explained with reference to TMX-C, which has been adapted from the TMX standard (LISA 2007). The description in this section is slightly technical, because at this level the actual treatment of resources and the technical aspects that support them are inseparable.

5.1 Language resources

Existing language resources that we have imported into the structure and that are in actual use include “Eijiro” and “Grand Concise”, two high-quality English-Japanese unidirectional dictionaries widely used by many translators; “Nichigai”, which covers proper names; “Medical Scientific Terms”, a medical dictionary included to allow us to check the structure of terminological dictionaries (Bey et al. 2006a); and “Edict”, a free Japanese-English dictionary (Table 1).

As our environment is open to all languages, it allows for the importation of other dictionaries, subject to two restrictions: (i) the dictionary must be structured in XLD; (ii) its data must be encoded in UTF-8. For example, the Arabic Mozilla (Arabic Mozilla 2007) translation community has a free dictionary created
