12.07.2015 Views

Topics in Language Resources for Translation ... - ymerleksi - home


Chapter 5. The real use of corpora in teaching and research contexts 77

[...] languages is increasingly being solved by the creation of uniform interfaces, especially in translation training institutions. Some examples of these are the Leeds interface (at the Centre for Translation Studies of the University of Leeds), the UFR Eila platform (at the Université Diderot, Paris 7), the BancTrad and CUCWeb corpus interfaces (at the Department of Translation and Philology of Pompeu Fabra University and Barcelona Media) and the OPUS interface (Tiedemann & Nygaard 2004). All these interfaces allow access to more than one corpus and have been developed with the aim of serving as a uniform platform to cover different user needs.

At the Centre for Translation Studies at the University of Leeds, a corpus interface [13] has been developed that allows access to Internet corpora for Chinese, French, German, Italian, Spanish, Polish and Russian. This interface allows queries with search expressions that can contain exact word forms, lemmata, POS, substrings or unknown words. The interface offers an option of simple queries akin to Google that translate into a Corpus Workbench query, e.g., a simple query term corresponds to a lemma, while a term in double quotes corresponds to a word form.
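The mapping just described (bare term to lemma, quoted term to word form) can be illustrated with a short Python sketch. It is not the Leeds implementation: the function name `simple_to_cqp` is hypothetical, the attribute names `word` and `lemma` are assumed (they depend on how a given corpus was indexed), and treating each word inside a quoted phrase as a separate word form is also an assumption.

```python
import re

def simple_to_cqp(query: str) -> str:
    """Translate a Google-like simple query into a CQP token sequence.

    Hypothetical helper for illustration only. Assumes the corpus exposes
    the positional attributes `word` and `lemma`.
    """
    parts = []
    # Each match is either a double-quoted chunk or a bare whitespace-free term.
    for quoted, bare in re.findall(r'"([^"]+)"|(\S+)', query):
        if quoted:
            # Quoted term: match the exact word form(s).
            for form in quoted.split():
                parts.append(f'[word="{form}"]')
        else:
            # Bare term: match any inflected form via the lemma.
            parts.append(f'[lemma="{bare}"]')
    # Note: CQP attribute values are regular expressions; escaping of
    # metacharacters is omitted in this sketch.
    return " ".join(parts)

print(simple_to_cqp('house'))
print(simple_to_cqp('"went" home'))
```

In this sketch, `house` becomes a lemma constraint that would also match forms such as "houses", whereas `"went"` stays restricted to that exact form.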
However, queries combining POS and lemma restrictions must be written according to the Corpus Query Processor (CQP) syntax, which requires the user to be familiar with it. The UFR Eila and OPUS interfaces impose a similar requirement. In the former, several comparable corpora (EN-FR) from specific subdomains (water, volcanoes, mountains, etc.) can be consulted using Perl regular expression syntax. On the OPUS page, a multilingual concordancer using the CQP is available for most of the subcorpora. Currently, searches by word, lemma and POS can be made in the "source" language.

The two interfaces developed at UPF host several corpora. On the one hand, BancTrad [14] currently accommodates two monolingual corpora as well as the one multilingual parallel corpus referred to in Section 2 above. The two monolingual corpora are the BNC for English and the ECI corpus Frankfurter Rundschau for German (about 34 million words of newspaper texts). On the other hand, CUCWeb [15] hosts the Catalan corpora CUCWeb and CTILC mentioned in Section 2 above. What basically distinguishes the UPF interfaces is that they are user-friendly: they can be used by both non-trained and more experienced users, as no knowledge of any query syntax is required. The simple mode query allows searches for words, lemmata or word strings and can be used by any untrained user.
Expert mode allows queries of strings of up to 5 word units, where each unit can be a word form, lemma, part-of-speech, syntactic function or combination of any of those. The fact

13. http://corpus.leeds.ac.uk/internet.html
14. http://mutis.upf.es/bt/english/index.htm
15. http://ramsesii.upf.es/cgi-bin/CUCWeb/search-form.pl
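The expert-mode description above (up to 5 word units, each a word form, lemma, part of speech, syntactic function, or a combination of these) can be modelled as a small data structure. The following Python sketch is only an illustration: the text does not say how the UPF interfaces represent or execute such queries, so the class `Unit`, the function `build_query`, the attribute names (`word`, `lemma`, `pos`, `func`) and the choice to render the result in CQP-style notation are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_UNITS = 5  # limit stated in the text for expert-mode queries

@dataclass
class Unit:
    """One word unit; any combination of the four fields may be set."""
    word: Optional[str] = None   # exact word form
    lemma: Optional[str] = None  # lemma
    pos: Optional[str] = None    # part-of-speech tag (tagset is an assumption)
    func: Optional[str] = None   # syntactic function (attribute name is hypothetical)

    def to_cqp(self) -> str:
        fields = (("word", self.word), ("lemma", self.lemma),
                  ("pos", self.pos), ("func", self.func))
        constraints = [f'{name}="{value}"' for name, value in fields
                       if value is not None]
        # In CQP, "[]" matches any token and "&" combines constraints.
        return "[" + " & ".join(constraints) + "]"

def build_query(units: List[Unit]) -> str:
    """Render a sequence of 1 to MAX_UNITS units as a CQP-style query."""
    if not 1 <= len(units) <= MAX_UNITS:
        raise ValueError(f"a query must contain 1 to {MAX_UNITS} units")
    return " ".join(unit.to_cqp() for unit in units)

query = build_query([Unit(lemma="make", pos="V.*"), Unit(pos="DT"), Unit(pos="NN")])
print(query)
```

The first unit also shows the kind of combined lemma and POS restriction that, according to the text, has to be written by hand in CQP syntax on the Leeds interface.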
