Topics in Language Resources for Translation ... - ymerleksi - home

Chapter 5. The real use of corpora in teaching and research contexts
Carme Colominas and Toni Badia

With respect to size, besides the British National Corpus [1] (100 million words), there are other large corpora for major European languages, such as the IDS (Institut für Deutsche Sprache) corpus [2] for German, with 1 billion words; CREA [3] (Corpus de Referencia del Español Actual) for Spanish, with over 200 million words; and CORIS/CODIS [4] (Dynamic Corpus of Written Italian) for Italian, with 100 million words. However, large corpora are still lacking for other major European languages, and particularly for less-studied languages such as Serbian, Polish or Basque. As far as representativeness is concerned, it has until now been extremely difficult to build large corpora that satisfy the demand of being representative of the modern language. The problem is that this type of resource requires a large building effort and, at the same time, has quite a short lifetime, since it becomes outdated relatively quickly. Even the BNC does not reflect the language of the last 15 years, so that, for instance, a neologism like "malware" has no occurrences in the corpus. This is why the static corpus model has recently been replaced by so-called monitor corpora, which are constantly updated to track rapid language change. The CREA corpus, for instance, has been designed as a monitor corpus that is periodically updated so that it always represents the last twenty-five years of the history of Spanish.
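
The sliding-window behaviour of a monitor corpus can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the way CREA is actually maintained: each text is assumed to carry its year of publication, and the names Text, MonitorCorpus, refresh and frequency, as well as the sample texts and years, are hypothetical. The 25-year default window mirrors the CREA design described above.

```python
from dataclasses import dataclass, field

# Hypothetical names and data: a sketch, not CREA's actual update procedure.


@dataclass
class Text:
    year: int          # year in which the text was published
    tokens: list[str]  # the text's word forms, already tokenised


@dataclass
class MonitorCorpus:
    """Hypothetical monitor corpus restricted to a sliding window of recent years."""
    window: int = 25                                  # number of years covered
    texts: list[Text] = field(default_factory=list)

    def add(self, text: Text) -> None:
        # New material is fed in continuously as it appears.
        self.texts.append(text)

    def refresh(self, current_year: int) -> None:
        # Periodic update: discard texts that fall outside the window, so that
        # the corpus always covers the last `window` years (inclusive).
        oldest = current_year - self.window + 1
        self.texts = [t for t in self.texts if t.year >= oldest]

    def frequency(self, form: str) -> int:
        # Raw frequency of a word form in the texts currently in the window.
        return sum(t.tokens.count(form) for t in self.texts)


corpus = MonitorCorpus(window=25)
corpus.add(Text(1980, ["the", "computer", "virus", "spread"]))
corpus.add(Text(2005, ["new", "malware", "spread", "quickly"]))
corpus.refresh(current_year=2008)
print(corpus.frequency("malware"))  # 1: the recent text is kept
print(corpus.frequency("virus"))    # 0: the 1980 text fell out of the window
```

A static corpus corresponds to never adding texts after compilation, which is why a neologism that only appears later, such as "malware" in the BNC, simply has no occurrences.
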
However, taking into account the high cost of building representative corpora of the modern language on the one hand, and the growing possibilities offered by the Web as a source of linguistic data (Kilgarriff and Grefenstette 2003) on the other, it seems quite reasonable to state that the future of large corpora lies in the Internet, as we will see in Section 2.

Notes
1. See The British National Corpus, version 2 (BNC World). 2001. Distributed by Oxford University Computing Services on behalf of the BNC Consortium. http://www.natcorp.ox.ac.uk/
2. http://www.ids-mannheim.de/cosmas2/
3. REAL ACADEMIA ESPAÑOLA: Database (CORDE) [online]. Corpus diacrónico del español. http://www.rae.es
4. http://corpora.dslo.unibo.it/coris_ita.html

In addition to the availability of large corpora that are representative of the modern language, the real needs of training contexts also call for quick, user-friendly access to the different types of corpora (monolingual source and target corpora, as well as bilingual ones). This requirement stems from a point often made by translation trainers, trainees and researchers when confronted with the range of electronic resources available: they recognise the potential usefulness of the data and the tools, but are unlikely to have the time to acquaint themselves with the software. This seems particularly true of corpora, given the present lack of uniform interfaces for accessing them. Interfaces differ not only in their layout but also in the types of queries they allow, and this affects how corpora can be exploited, especially when the results obtained from several corpora are to be compared. For instance, it is quite difficult to compare the use of ES/CA "molar" as a verb (meaning 'to be great', 'amazing', 'cool', etc.) in the slang of young Catalan and Spanish speakers, because the corpus available for one language (CUCWeb, for Catalan) allows searches by lemma, whereas the one available for the other (CREA) does not. A similar problem arises when we try to compare patterns of use of the verb "like" in the BNC and "mögen" in the German IDS corpus. Despite being one of the best available reference corpora, the BNC is not lemmatised, which considerably restricts its potential use and the possibility of making this kind of comparison with other languages for which a lemmatised corpus is available. In other words, the range of functionality for automated retrieval from corpora depends heavily on annotation, and differences between corpora in this respect considerably limit their potential use. Besides annotation, corpora differ in the query language used: compare, for example, the query syntax of Xaira (used to access the BNC) with that of the Corpus Workbench. Since translation students and researchers commonly work with at least three or four languages, they must constantly visit several URLs, familiarise themselves with different interfaces and query languages and, worse still, cope with differences between corpora in how concordances are created (by form, lemma or part of speech (POS)), how statistical information is gathered, and so on.
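
The dependence of retrieval on annotation can be illustrated with a toy example. The sketch below assumes each corpus is a flat list of tokens annotated with form, lemma and a simplified POS tag, with the lemma left as None in a corpus that is not lemmatised; the two one-sentence corpora, the tag set and the concordance function are hypothetical and do not reproduce CUCWeb, CREA or any real query language. Applying one function to both corpora also hints at what a uniform interface would spare the user.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    form: str
    lemma: Optional[str]  # None when the corpus carries no lemma annotation
    pos: str              # simplified, hypothetical part-of-speech tag


def concordance(corpus, *, form=None, lemma=None, pos=None, width=2):
    """Return keyword-in-context lines for tokens matching all the given criteria.

    A lemma query on a corpus without lemma annotation raises ValueError,
    mirroring the limitation discussed in the text.
    """
    if lemma is not None and any(t.lemma is None for t in corpus):
        raise ValueError("corpus is not lemmatised; search by form instead")
    lines = []
    for i, tok in enumerate(corpus):
        if form is not None and tok.form != form:
            continue
        if lemma is not None and tok.lemma != lemma:
            continue
        if pos is not None and not tok.pos.startswith(pos):
            continue
        # `width` tokens of context on each side of the match
        left = " ".join(t.form for t in corpus[max(0, i - width):i])
        right = " ".join(t.form for t in corpus[i + 1:i + 1 + width])
        lines.append(f"{left} [{tok.form}] {right}".strip())
    return lines


# Hypothetical toy data, not real CUCWeb or CREA content.
catalan = [  # lemmatised: "el concert mola i les festes molaven"
    Token("el", "el", "DET"), Token("concert", "concert", "NOUN"),
    Token("mola", "molar", "VERB"), Token("i", "i", "CONJ"),
    Token("les", "el", "DET"), Token("festes", "festa", "NOUN"),
    Token("molaven", "molar", "VERB"),
]
spanish = [  # not lemmatised: "la fiesta mola y las motos molan"
    Token("la", None, "DET"), Token("fiesta", None, "NOUN"),
    Token("mola", None, "VERB"), Token("y", None, "CONJ"),
    Token("las", None, "DET"), Token("motos", None, "NOUN"),
    Token("molan", None, "VERB"),
]

print(concordance(catalan, lemma="molar", pos="VERB"))
# ['el concert [mola] i les', 'les festes [molaven]']
print(concordance(spanish, form="mola"))
# ['la fiesta [mola] y las']  (the inflected form "molan" is missed)
try:
    concordance(spanish, lemma="molar")
except ValueError as err:
    print(err)  # corpus is not lemmatised; search by form instead
```

In the unlemmatised corpus, a search by the form "mola" misses "molan", so every inflected form would have to be queried separately, which is the kind of gap that makes cross-corpus comparison laborious.
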
Consequently, even where resources exist, their usefulness is far from evident to users in general, as too much time must be spent (especially by users who are not trained in query formalisms, as is often the case in translation) in becoming familiar with the various interfaces and query languages.

The two aims we have identified as most desirable, namely the availability of large, representative corpora and more user-friendly access to the various corpora needed, are now being addressed by some corpus developers through common platforms that give access to several corpora, some of which may be built from the Web.

2. Internet corpora: An alternative to large corpora

In recent years, the arduous and expensive task of building large corpora has found genuinely new opportunities in the World Wide Web as a source of linguistic data (Kilgarriff and Grefenstette 2003). Exploiting the Web as a corpus is becoming a real alternative to the traditional building of large corpora, as shown by the Internet corpora compiled at the Centre for Translation Studies in Leeds (Sharoff 2006), the OPUS collection of parallel corpora, and the CUCWeb project developed by the GLiCom