Topics in Language Resources for Translation ...

50 Silvia Bernardini and Sara Castagnoli

5. Building corpora

In order for corpora to become a stable part of translators' workflow, however, one cannot simply rely on well thought-out learning materials. It was suggested in Section 3 that corpus construction and use should also be made easier and faster, so that corpora can compete with the other tools translators use in their everyday work, such as translation memories (TMs) and the Web. Of the MeLLANGE questionnaire respondents, 94.6% reported consulting the Web through Google, despite several drawbacks that most of them are aware of, such as the unhelpfulness of the sort order (20.7%), the lack of linguistic information (14.7%), the unreliability of frequency statistics (12.6%), or the inadequacy of the context display (9.9%). This suggests that corpora could indeed play a role among translation tools, if the remaining obstacles (especially the time needed for construction and the search skills required) were removed.

5.1 The present...

One of the major achievements in corpus-based language learning in the past decade has been the creation of tools that allow users to consult the Web in a more linguistically informed way, and/or that facilitate the construction of corpora from the Web.
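What a "more linguistically informed" view of retrieved Web text typically involves, namely regular-expression search plus a keyword-in-context (KWIC) display whose lines can be sorted to bring recurrent patterns together, can be illustrated with a minimal Python sketch. It assumes the texts have already been retrieved as plain strings and uses a fixed character window for context; the function names `kwic`, `sort_by_right` and `format_lines` are hypothetical and do not reproduce the implementation of any tool mentioned in this chapter.

```python
import re
from typing import List, Tuple

# One concordance line: (left context, matched node, right context).
Line = Tuple[str, str, str]


def kwic(texts: List[str], pattern: str, width: int = 30) -> List[Line]:
    """Collect every regex match in `texts` together with `width` characters
    of context on each side (a character window, not a word window)."""
    regex = re.compile(pattern)
    lines: List[Line] = []
    for text in texts:
        # Collapse line breaks and runs of spaces so each hit fits on one line.
        text = " ".join(text.split())
        for m in regex.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            lines.append((left, m.group(0), right))
    return lines


def sort_by_right(lines: List[Line]) -> List[Line]:
    """Sort case-insensitively on the text following the node, so that lines
    sharing the same continuation end up next to each other."""
    return sorted(lines, key=lambda line: line[2].lstrip().lower())


def format_lines(lines: List[Line], width: int = 30) -> List[str]:
    """Right-align the left context so all nodes line up in one column."""
    return [f"{left:>{width}} [{node}] {right}" for left, node, right in lines]


if __name__ == "__main__":
    sample = [
        "Translators consult the Web every day.",
        "Corpora help translators consult reference texts.",
        "Few translators consult corpora regularly.",
    ]
    hits = sort_by_right(kwic(sample, r"\bconsult\w*", width=20))
    for row in format_lines(hits, width=20):
        print(row)
```

Sorting on the right-hand context groups lines that share a continuation (here, whatever follows "consult"), which is the kind of pattern highlighting that a relevance-ranked results page does not provide; a left-context sort could be added in the same way by keying on the reversed preceding words.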
While search engines such as Google provide fast and effective retrieval of information from the Web, they are less than ideal when it comes to basic linguistic procedures such as highlighting patterns (i.e., sorting results) or selecting subsets of solutions, not to mention searching for linguistically annotated sequences (e.g., all verb lemmas preceding a certain noun lemma) (Thelwall 2005).

A solution to some of these problems has been provided by tools like KWiCFinder (Fletcher 2004), an online concordancer that supports regular expressions, implements concordance-like displays and functionalities (e.g., sorting), and allows off-line perusal of the retrieved texts. Along similar lines, another freely available tool, the TextSTAT concordancer,⁵ allows one to specify a URL and retrieve a file or set of files from a single website directly from within the concordancer, thus combining and speeding up the processes of retrieving texts and searching through them. Corporator (Fairon 2006) only addresses the first of these processes (retrieving texts from the Web): it automates corpus collection and development, allowing bulk retrieval of texts from selected websites that offer RSS feeds. The corpus thus created can then be updated regularly, and is searchable with a regular concordancer.

While KWiCFinder is designed mainly with language learning applications in mind (searching for a given word or expression as one would search the Web),

5. http://www.niederlandistik.fu-berlin.de/textstat/software-en.html