12.07.2015

Topics in Language Resources for Translation ...

Chapter 4. CORPÓGRAFO V.4

minology, or re-combine a selection of academic articles from several subject domains, in order to study the stylistic and syntactic aspects of the genre);
– Align parallel corpora;
– Store multimedia files, e.g., sound files for pronunciation, or images to relate to lexical or terminological items;
– Register the names of different users for group work.

The Pesquisa (Corpus analysis) area lets users query their personally selected corpora in order to:
– Search for concordances using regular expressions;
– Search for concordances using the NooJ resources (in English, French and Portuguese; see http://www.nooj4nlp.net/).

Concordances can be displayed either as whole sentences or as KWIC (keyword-in-context) lines with up to 15 words on each side of the keyword, with the usual options for sorting by left or right context.

5.2 Creating terminological and lexical / phrasal databases

The next section, the Centro de Conhecimento (Knowledge centre), now contains two types of database for managing words and phrases: BD Terminológicas (terminology databases) and BD Fraseológicas (phrasal databases). The two types have much in common, but certain aspects are specific to the kind of analysis each supports.

In both cases, the first step is to create a database and supply the necessary metadata.
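As an illustration only, a database of this kind might be modelled as in the Python sketch below. Corpógrafo's internal schema is not described here, so every class and field name (`TermDatabase`, `Entry`, `SourceInfo`, `subject_domain` and so on) is hypothetical. The sketch shows one database created with its own metadata and holding entries in several languages; each entry keeps the author and source of the text it came from, and presumed translation equivalents are recorded as links between entries.

```python
from dataclasses import dataclass, field

@dataclass
class SourceInfo:
    """Metadata of the corpus text an item was found in (hypothetical fields)."""
    author: str
    source: str

@dataclass
class Entry:
    """One term or phrase in one language (hypothetical record layout)."""
    entry_id: int
    language: str                                        # e.g. "PT", "EN"
    form: str
    definition: str = ""
    contexts: list[str] = field(default_factory=list)    # examples from concordances
    sources: list[SourceInfo] = field(default_factory=list)

@dataclass
class TermDatabase:
    """A multilingual database plus the metadata supplied when it is created."""
    name: str
    subject_domain: str
    languages: set[str]
    entries: dict[int, Entry] = field(default_factory=dict)
    # Unordered pairs of entry ids recorded as presumed translation equivalents.
    equivalences: set[frozenset[int]] = field(default_factory=set)

    def add_entry(self, entry: Entry) -> None:
        if entry.language not in self.languages:
            raise ValueError(f"{entry.language} is not one of this database's languages")
        self.entries[entry.entry_id] = entry

    def link_equivalents(self, a: int, b: int) -> None:
        # Equivalence links only make sense across languages.
        if self.entries[a].language == self.entries[b].language:
            raise ValueError("equivalents must be in different languages")
        self.equivalences.add(frozenset((a, b)))

    def equivalents_of(self, entry_id: int) -> list[Entry]:
        # Return every entry linked to entry_id, whichever side of the pair it is on.
        return [self.entries[other]
                for pair in self.equivalences if entry_id in pair
                for other in pair if other != entry_id]

if __name__ == "__main__":
    db = TermDatabase("Ecology", "environmental science", {"EN", "PT"})
    db.add_entry(Entry(1, "EN", "carbon sink", sources=[SourceInfo("Author A", "Text 1")]))
    db.add_entry(Entry(2, "PT", "sumidouro de carbono"))
    db.link_equivalents(1, 2)
    print([e.form for e in db.equivalents_of(1)])  # ['sumidouro de carbono']
```

Storing each equivalence as an unordered pair keeps the link symmetric, so looking up either entry finds the other.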
Several different databases can be created if one so wishes, but each database is designed to be multilingual, so that data on presumed equivalents in different languages can be linked.

The next step is to extract information from the corpora and enter it in the database. Information extracted from a corpus automatically carries with it the metadata (authors and sources) of the texts in which it was found. All the databases allow the insertion of information on morphology, definitions, contexts (examples taken from concordances), lexical or semantic relations, related terms or expressions, translation equivalents, and links to any relevant multimedia files.

The two types of database differ in the methods used to extract terms and lexical expressions from the corpora and in the ways the results are classified.

The terminology databases allow one to:
– Extract terminological units using an n-gram tool with automatic lexical search restrictions for Portuguese (PT), English (EN), French (FR), Italian (IT) and Spanish (ES);
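The rules behind the n-gram tool's lexical restrictions are not detailed here. The Python sketch below assumes, purely for illustration, that they behave like a per-language stop-word filter: candidate n-grams that begin or end with a function word are discarded, and the remaining ones are counted across the corpus. The stop-word lists, the function names and the `min_freq` threshold are all hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word lists (hypothetical; a real tool would use far larger ones).
STOPWORDS = {
    "EN": {"the", "of", "a", "an", "and", "in", "to", "is", "for", "on"},
    "PT": {"o", "a", "os", "as", "de", "do", "da", "e", "em", "para"},
}

def sentences(text: str) -> list[str]:
    # Crude split on sentence punctuation so that n-grams never cross a boundary.
    return [s for s in re.split(r"[.!?;:]+", text) if s.strip()]

def tokenize(sentence: str) -> list[str]:
    # \w+ matches runs of letters (accented ones included), digits and underscores.
    return re.findall(r"\w+", sentence.lower())

def candidate_terms(text: str, lang: str, n_max: int = 3, min_freq: int = 2) -> Counter:
    """Count 1- to n_max-word n-grams that neither start nor end with a stop word."""
    stop = STOPWORDS[lang]
    counts: Counter = Counter()
    for sentence in sentences(text):
        tokens = tokenize(sentence)
        for n in range(1, n_max + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in stop or gram[-1] in stop:
                    continue  # assumed lexical restriction: reject function-word edges
                counts[" ".join(gram)] += 1
    # Keep only candidates frequent enough to be worth a terminologist's review.
    return Counter({g: c for g, c in counts.items() if c >= min_freq})

if __name__ == "__main__":
    text = "The carbon sink absorbs carbon. A carbon sink is a reservoir of carbon."
    print(candidate_terms(text, "EN", n_max=2))
```

Filtering only on the first and last word, rather than on every word, keeps candidates such as "reservoir of carbon" whose internal preposition belongs to the term.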