12.07.2015 Views

Topics in Language Resources for Translation ... - ymerleksi - home

Chapter 5. The real use of corpora in teaching and research contexts

With respect to size, besides the British National Corpus [1] (100 million words), there are other large corpora for major European languages, such as the IDS (Institut für Deutsche Sprache) corpus [2] for German with 1 billion words, the CREA [3] (Corpus de Referencia del Español Actual) for Spanish with over 200 million words, and the CORIS/CODIS [4] (Dynamic Corpus of Written Italian) for Italian with 100 million words. However, there is still a lack of large corpora for other major European languages, and particularly for less-studied languages like Serbian, Polish or Basque. As far as representativeness is concerned, until now it has been extremely difficult to build large corpora that can satisfy the demand of being representative of modern language. The problem is that this type of resource requires a large building effort and has, at the same time, quite a short “lifetime”, as it becomes outdated in a relatively short time. Even the BNC does not reflect the language of the last 15 years, so that, for instance, a neologism like malware has no occurrences in the corpus. This is why the static corpus model has recently been replaced by so-called monitor corpora, which are constantly updated to track rapid language changes; the CREA corpus, for instance, has been designed as a monitor corpus which is periodically updated so that it always represents the last twenty-five years of the history of Spanish. But taking into account the high cost of building representative corpora of modern language, on the one hand, and the increasing possibilities offered by the Web as a source of linguistic data (Kilgarriff and Grefenstette 2003), on the other, it seems quite reasonable to state that the future of large corpora lies in the Internet, as we will see in Section 2.

In addition to the availability of large corpora that are representative of modern language, the real needs in training contexts also require quick, user-friendly access to the different corpora types (monolingual source and target corpora, as well as bilingual). This requirement stems from a point often made by translation trainers, trainees and researchers when confronted with the range of electronic resources available in general: they recognise the potential usefulness of the data and the tools, but are unlikely to have the time to acquaint themselves with the software. This seems particularly true for corpora, given the present lack of uniform interfaces for accessing resources. Interfaces differ not only in their

1. See The British National Corpus, version 2 (BNC World). 2001. Distributed by Oxford University Computing Services on behalf of the BNC Consortium. http://www.natcorp.ox.ac.uk/
2. http://www.ids-mannheim.de/cosmas2/
3. REAL ACADEMIA ESPAÑOLA: Database (CORDE) [online]. Corpus diacrónico del español. http://www.rae.es
4. http://corpora.dslo.unibo.it/coris_ita.html
