Localization
Review
TermSuite: Open source
TermSuite is an open source, platform-independent TET written in Java and distributed under the Apache License 2.0. It was developed within the scope of the TTC (Terminology Extraction, Translation Tools and Comparable Corpora) project, whose purpose was to design a tool capable of extracting bilingual terminology from comparable corpora in six languages: English, French, German, Spanish, Chinese and Russian.
TTC TermSuite's architecture is composed of three modules: the Spotter, the Indexer and the Aligner. The Spotter module preprocesses the input monolingual corpus: it performs tokenization, part-of-speech tagging, stemming and lemmatization. The Indexer module then combines a statistical and a linguistic approach to extract monolingual terminology from the corpus processed by the Spotter. Finally, the Aligner computes the translation of a source terminology into a target language.
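The Spotter-then-Indexer flow described above can be illustrated with a minimal sketch. It is not TermSuite's code or API: the pre-tagged input, the function names `extract_candidates` and `index_terms`, the adjective/noun pattern and the frequency threshold are all assumptions chosen for illustration. The input stands in for a Spotter-like preprocessing result, the pattern filter for the linguistic step, and the occurrence count for the statistical step.

```python
from collections import Counter

# Hypothetical pre-tagged input: (lemma, part-of-speech) pairs standing in
# for the output of a Spotter-like preprocessing step.
TAGGED = [
    ("wind", "NOUN"), ("turbine", "NOUN"), ("convert", "VERB"),
    ("kinetic", "ADJ"), ("energy", "NOUN"), ("into", "ADP"),
    ("power", "NOUN"), (".", "PUNCT"),
    ("the", "DET"), ("wind", "NOUN"), ("turbine", "NOUN"),
    ("need", "VERB"), ("wind", "NOUN"), (".", "PUNCT"),
]


def extract_candidates(tagged, max_len=3):
    """Linguistic filter (assumed pattern): ADJ/NOUN sequences ending in a NOUN."""
    candidates = []
    n = len(tagged)
    for start in range(n):
        for end in range(start + 1, min(start + max_len, n) + 1):
            window = tagged[start:end]
            only_adj_noun = all(pos in ("ADJ", "NOUN") for _, pos in window)
            if only_adj_noun and window[-1][1] == "NOUN":
                candidates.append(" ".join(lemma for lemma, _ in window))
    return candidates


def index_terms(tagged, min_occurrences=2):
    """Statistical filter: keep candidates seen at least min_occurrences times."""
    counts = Counter(extract_candidates(tagged))
    return {term: c for term, c in counts.items() if c >= min_occurrences}


terms = index_terms(TAGGED)
print(terms)
```

With this toy input only "wind", "turbine" and "wind turbine" reach the threshold of two occurrences; a real extractor would use richer patterns and termhood measures than raw frequency.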
The source and target terms the Aligner requires are those already computed by the Indexer module, which means that the previous two steps must be repeated for the target language. The user can choose from several alignment options, such as the maximum number of translation candidates for a given source term and the similarity measure used to compare the contexts of the term in the source and target languages, among other advanced settings.
Once all the parameters are set, the translation candidates, ranked by similarity score, can be viewed and explored within the tool, or the output XML file can be used for other purposes.
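As a rough illustration of the alignment idea described above, the following sketch ranks target-language candidates by the similarity of their context vectors to that of a source term and keeps only a maximum number of them. It does not reproduce TermSuite's implementation: cosine similarity is just one possible measure, the function names and the toy vectors are hypothetical, and the context words are assumed to have already been mapped to a shared vocabulary.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse context vectors (dicts)."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(source_context, target_contexts, max_candidates=2):
    """Score every target term against the source context, best first."""
    scored = [
        (term, cosine(source_context, context))
        for term, context in target_contexts.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:max_candidates]


# Hypothetical context vector for the English source term "wind turbine".
source = {"energy": 2.0, "blade": 1.0}

# Hypothetical Spanish candidates with contexts in the same shared vocabulary.
targets = {
    "aerogenerador": {"energy": 2.0, "blade": 1.0},
    "molino": {"blade": 1.0, "grain": 2.0},
    "coche": {"road": 3.0},
}

for term, score in rank_candidates(source, targets, max_candidates=2):
    print(f"{term}\t{score:.2f}")
```

Here `max_candidates` plays the role of the "maximum number of translation candidates" option, and swapping `cosine` for another function would correspond to choosing a different similarity measure.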
Figure 1: Comparison of extraction tools. Tools compared: SDL MultiTerm, SimpleExtractor, TermSuite, Sketch Engine, Translated, Terminus, Kea, Rainbow and JATE. Features compared: bilingual extraction; source and target context comparison; terms validation; bilingual dictionaries compilation; context extraction; support for various file formats; ranking terms by frequency; support for many languages; specifying the minimal number of occurrences; showing linguistic information; specifying the maximum number of translations; stopword list option; choosing the minimum and maximum number of words per term; term statistics.

Web-based terminology extraction tools
Although standalone TETs are still predominant in today’s market, future web-based technologies will certainly evolve by migrating all standalone features to a web-based environment, which will allow these tools to take over market leadership
16 April/May 2016