12.07.2015 Views

Topics in Language Resources for Translation ... - ymerleksi - home


Carme Colominas and Toni Badia

layout, but in the types of queries they allow for, and this even affects the exploitation possibilities, especially those that involve comparing the results obtained from several corpora. For instance, it is quite difficult to compare the usage of e.g., ES/CA molar as a verb (for ‘to be great’, ‘amazing’, ‘cool’, etc.) in the jargon of young Catalan and Spanish speakers, as the available corpus in one language (CUCWeb for Catalan searches) allows for searches by lemma, whereas the one available in the other (CREA) does not. A similar problem arises when we try to compare patterns of use of a verb like like in the BNC and mögen in the German IDS corpus. Despite being one of the best available reference corpora, the BNC is not lemmatised, which considerably restricts its potential use and the possibilities of performing this kind of comparison with other languages for which a lemmatised corpus is available. In other words, the range of functionality for automated retrieval from corpora depends greatly on annotation, and differences between corpora in this respect limit their potential usage considerably. Besides annotation, corpora differ from each other in the query language used. Compare, for example, the different query syntaxes of Xaira (used to access the BNC) and Corpus Workbench.
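
The practical difference between searching by form and searching by lemma can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy sentence, the `Token` structure and the two search functions are invented here and do not reproduce the interface of CUCWeb, CREA, the BNC or any other corpus mentioned in the text.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    form: str   # the word as it appears in the text
    lemma: str  # the dictionary headword (only present in a lemmatised corpus)
    pos: str    # part-of-speech tag


# Hypothetical toy corpus, invented for this illustration.
CORPUS = [
    Token("Esta", "este", "DET"),
    Token("canción", "canción", "NOUN"),
    Token("mola", "molar", "VERB"),
    Token("y", "y", "CONJ"),
    Token("esos", "ese", "DET"),
    Token("discos", "disco", "NOUN"),
    Token("molaban", "molar", "VERB"),
    Token("mucho", "mucho", "ADV"),
]


def search_by_form(corpus, form):
    """Return positions whose surface form matches exactly (case-insensitive)."""
    return [i for i, t in enumerate(corpus) if t.form.lower() == form.lower()]


def search_by_lemma(corpus, lemma, pos: Optional[str] = None):
    """Return positions of every inflected form of a lemma, optionally filtered by POS."""
    return [
        i for i, t in enumerate(corpus)
        if t.lemma == lemma and (pos is None or t.pos == pos)
    ]


# A form query finds only the one inflected form that was typed in.
form_hits = search_by_form(CORPUS, "mola")
# A lemma query retrieves all inflected forms of the verb at once.
lemma_hits = search_by_lemma(CORPUS, "molar", pos="VERB")
print(form_hits, lemma_hits)
```

In a corpus without lemma annotation, only the first kind of query is available, so the user has to enumerate every inflected form by hand before the results can be compared with those from a lemmatised corpus.
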
Taking into account that translation students and researchers commonly work with at least three or four different languages, they constantly need to access several URLs, to become familiar with different interfaces and query languages and, what is worse, to cope with the differences between corpora in creating concordances (by form, lemma or part-of-speech (POS)), in gathering statistical information, etc. As a result, the usefulness of resources, even when they exist, becomes far from evident for users in general, as too much time must be spent (especially by users who are not trained in query formalisms, as is often the case in the context of translation) in order to become familiar with the several interfaces and query languages.

The two aspects we have pointed out as the most desirable aims, that is, the availability of large and representative corpora and more user-friendly access to the several corpora needed, are now being addressed by some corpus developers by means of common platforms that allow access to several corpora eventually built from the Web.

2. Internet corpora: An alternative to large corpora

In recent years the arduous and expensive task of building large corpora has found real new opportunities in the World Wide Web as a source of linguistic data (Kilgarriff and Grefenstette 2003). Exploiting the Web as a corpus is becoming a real alternative to the traditional building of large corpora, as shown by the Internet corpora compiled at the Centre for Translation Studies of Leeds (Sharoff 2006), the OPUS collection of parallel corpora, or the CUCWeb project developed by the GLiCom
