12.07.2015 Views

Topics in Language Resources for Translation ... - ymerleksi - home


28 Silvia Hansen-Schirra

currently annotated manually with the help of MMAX II (Müller & Strube 2003), a tool allowing assignment of self-defined categories and linking units.

Concerning the alignment of the texts, we do not only align sentences (which is state of the art in Translation Memories; e.g., Johansson et al. 1996) and words (which is state of the art in Machine Translation; cf. Och & Ney 2003) but also clauses. Word alignment is realised with GIZA++ (Och & Ney 2003), a statistical alignment tool. Clauses are aligned manually with the help of MMAX II (see above). Sentences are aligned using Win-Align, an alignment tool within the Translator’s Workbench by Trados (Heyn 1996). Additionally, phrase alignment can be derived from word alignment in combination with the phrase chunking and syntactic functions can be mapped automatically across the parallel corpus. Each annotation and alignment layer is stored separately in a multi-layer stand-off XML representation format keeping the annotation and alignment of overlapping and/or discontinuous units in separate files. The mark-up builds on the XCES Standard.

The architecture of the CroCo Corpus allows us to view the annotation in aligned segments and to pose queries combining different layers (Hansen-Schirra et al. 2006).
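To make the idea of stand-off layers concrete, the following is a minimal sketch of how a query might combine an annotation layer with an alignment layer. The element names (`tok`, `unit`, `link`), the attribute names, and the toy sentence pair are assumptions made for illustration only; they are not the actual CroCo or XCES schema, and the function `parallel_units` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-off layers. In a real corpus each layer would live in
# its own file; the element and attribute names here are illustrative only.
TOKENS_EN = """<tokens lang="en">
  <tok id="e1">The</tok><tok id="e2">report</tok>
  <tok id="e3">was</tok><tok id="e4">printed</tok>
</tokens>"""
TOKENS_DE = """<tokens lang="de">
  <tok id="d1">Der</tok><tok id="d2">Bericht</tok>
  <tok id="d3">wurde</tok><tok id="d4">gedruckt</tok>
</tokens>"""
# Syntactic-function layers point at tokens by id instead of wrapping them.
FUNCS_EN = """<functions lang="en">
  <unit func="subject" from="e1" to="e2"/>
  <unit func="predicate" from="e3" to="e4"/>
</functions>"""
FUNCS_DE = """<functions lang="de">
  <unit func="subject" from="d1" to="d2"/>
  <unit func="predicate" from="d3" to="d4"/>
</functions>"""
# Word-alignment layer linking token ids across the two languages.
ALIGN = """<alignment>
  <link en="e1" de="d1"/><link en="e2" de="d2"/>
  <link en="e3" de="d3"/><link en="e4" de="d4"/>
</alignment>"""


def load_tokens(xml_text):
    """Return token ids in document order and an id -> word-form mapping."""
    toks = list(ET.fromstring(xml_text).iter("tok"))
    return [t.get("id") for t in toks], {t.get("id"): t.text for t in toks}


def load_units(xml_text, token_ids):
    """Resolve each stand-off unit to (function label, list of token ids)."""
    units = []
    for unit in ET.fromstring(xml_text).iter("unit"):
        start = token_ids.index(unit.get("from"))
        end = token_ids.index(unit.get("to"))
        units.append((unit.get("func"), token_ids[start:end + 1]))
    return units


def parallel_units(func):
    """Find English units with the given function and the German units
    that cover at least one word-aligned token."""
    en_ids, en_forms = load_tokens(TOKENS_EN)
    de_ids, de_forms = load_tokens(TOKENS_DE)
    links = [(l.get("en"), l.get("de")) for l in ET.fromstring(ALIGN).iter("link")]
    results = []
    for en_func, en_span in load_units(FUNCS_EN, en_ids):
        if en_func != func:
            continue
        # German tokens reachable from this English span via word alignment.
        targets = {de for en, de in links if en in en_span}
        for de_func, de_span in load_units(FUNCS_DE, de_ids):
            if targets & set(de_span):
                results.append((
                    " ".join(en_forms[i] for i in en_span),
                    " ".join(de_forms[i] for i in de_span),
                    de_func,
                ))
    return results


if __name__ == "__main__":
    for en_text, de_text, de_func in parallel_units("subject"):
        print(f"{en_text} <-> {de_text} ({de_func})")
```

Because every layer refers to the base tokens only by id, a further layer (for example clause alignment) could be added as another file without touching the existing ones, which is the property that makes overlapping or discontinuous units manageable.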
The resource thus permits the analysis of a wealth of linguistic information on each level helping us to understand the interplay of the different levels and the relationship of lower level features to more abstract concepts. For parallel concordancing, query tools such as the IMS Corpus Workbench (Christ 1994) can be employed. Its corpus query processor (CQP) allows queries for words and/or annotation tags on the basis of regular expressions. For more complex queries, the annotated data is converted into a MySQL database. On this basis, an effective exploitation of different annotation and alignment layers is guaranteed. In the following, we will demonstrate how the bilingual CroCo Corpus is used to extract parallel grammatical structures helping translators to decide on typical English-German translation problems.

3. Solving translation problems with treebanks

In many cases, typological differences between languages can be translated straightforwardly without any problems. Different grammatical morphologies are, for instance, not considered as major translation problems. There are, however, typological differences that are problematic for the translation process. Typically, these are constructions which exist in one language but do not exist or are rarely used in the other. For the translation of such constructions, the translator has to compensate for them in the target language. It is, however, not always easy to find an adequate translation equivalent. For this reason, a language resource including grammatical descriptions of translation pairs can help to solve translation
