06.04.2016 Views

Localization


Review

TermSuite: Open source

TermSuite is an open source and platform-independent TET written in Java and distributed under the Apache License 2.0. It was developed within the scope of the TTC (Terminology Extraction, Translation Tools and Comparable Corpora) project, whose purpose was to design a tool capable of extracting bilingual terminology from comparable corpora in six languages: English, French, German, Spanish, Chinese and Russian.

TTC TermSuite's architecture is composed of three modules: the Spotter, the Indexer and the Aligner. The Spotter module preprocesses the input monolingual corpus: it performs tokenization, part-of-speech tagging, stemming and lemmatization. The Indexer module then uses both a statistical and a linguistic approach to extract monolingual terminology from the corpus processed by the Spotter. Finally, the Aligner computes the translation of a source terminology into a target language.
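The combination of a linguistic and a statistical filter described for the Indexer can be illustrated with a minimal sketch. It is not TermSuite's implementation: the tagged input, the part-of-speech pattern (adjectives and nouns ending in a noun), the function names and the thresholds are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical pre-tagged input standing in for Spotter-style output:
# one (lemma, part-of-speech) pair per token.
TAGGED = [
    ("wind", "NOUN"), ("turbine", "NOUN"), ("convert", "VERB"),
    ("kinetic", "ADJ"), ("energy", "NOUN"), ("and", "CCONJ"),
    ("the", "DET"), ("wind", "NOUN"), ("turbine", "NOUN"),
    ("store", "VERB"), ("kinetic", "ADJ"), ("energy", "NOUN"),
]


def is_candidate(tags):
    """Linguistic filter (assumed pattern): adjectives and nouns only, ending in a noun."""
    return all(tag in ("ADJ", "NOUN") for tag in tags) and tags[-1] == "NOUN"


def extract_terms(tagged, max_len=2, min_freq=2):
    """Count every n-gram that passes the linguistic filter, then keep
    only those that reach a minimum number of occurrences."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tagged) - n + 1):
            window = tagged[i:i + n]
            if is_candidate([tag for _, tag in window]):
                counts[" ".join(lemma for lemma, _ in window)] += 1
    # Statistical filter: discard candidates seen fewer than min_freq times.
    return {term: freq for term, freq in counts.items() if freq >= min_freq}


terms = extract_terms(TAGGED)
```

Raising `min_freq` or lowering `max_len` narrows the candidate list, which mirrors the kind of occurrence and term-length settings that extraction tools commonly expose.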

The source and target terms the Aligner requires are those already computed by the Indexer module, which means that the previous two steps must be repeated for the target language. The user can choose from several alignment options, such as the maximum number of translation candidates for a given source term and the similarity measure used to compare the contexts of the term in the source and target languages, among other advanced settings. Once all the parameters are set, the translation candidates, ranked by similarity score, can be viewed and explored within the tool, or the output XML file can be used for other purposes.
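Ranking translation candidates by context similarity, with a cap on the number of candidates kept, can be sketched as follows. This is an illustration under stated assumptions rather than TermSuite's own code: cosine similarity is used as one possible similarity measure, the context vectors are assumed to be already expressed in a shared vocabulary, and the function names and sample data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse context vectors stored as dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(source_context, target_contexts, max_candidates=2):
    """Score every target term against the source context and keep the
    best max_candidates, highest similarity first."""
    scored = [
        (term, cosine(source_context, context))
        for term, context in target_contexts.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:max_candidates]


# Hypothetical context vector for the source term "wind turbine".
source = {"blade": 3, "rotor": 2, "power": 1}

# Hypothetical context vectors for three target-language terms.
targets = {
    "turbine": {"blade": 2, "rotor": 2, "power": 1},
    "moulin": {"blade": 1, "grain": 3},
    "banque": {"money": 4},
}

ranked = rank_candidates(source, targets)
```

Here `max_candidates` plays the role of the maximum number of translation candidates per source term, and swapping `cosine` for another function would correspond to choosing a different similarity measure.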

[Figure 1: Comparison of extraction tools. Tools compared: SDL MultiTerm, SimpleExtractor, TermSuite, Sketch Engine, Translated, Terminus, Kea, Rainbow and JATE. Features compared: bilingual extraction; source and target context comparison; terms validation; bilingual dictionaries compilation; context extraction; support for various file formats; ranking terms by frequency; support for many languages; specifying the minimal number of occurrences; showing linguistic information; specifying the maximum number of translations; stopword list option; choosing the minimum and maximum number of words per term; term statistics.]

Web-based terminology extraction tools

Although standalone TETs are still predominant in today’s market, future web-based technologies will certainly evolve by migrating all standalone features to a web-based environment, which will allow these tools to take over market leadership

16 April/May 2016
