12.07.2015 Views

UTF-8__0415396417Translation


TECHNOLOGY AND TRANSLATION

domain-specific expressions, often multi-word expressions (MWEs). In translation applications, terminology can be massively multilingual.

In company documentation and websites, terms communicate both content and brand. The impact of defective terminology on customer satisfaction can be incalculable, from impairing the usability of a product through inconsistent usage of terms in the accompanying manual to compromising health and safety. For instance, using ‘shadow cursor’, ‘grid cursor’ and ‘scale cursor’ to refer to one and the same object is confusing and wasteful.

The translation process can simply propagate these defects to the localized versions or compound them through mis-translation. The fact that globalization has turned authoring and translation into team activities only heightens the risk of inconsistencies. The concomitant increase in human reliance on authoring, CAT and MT tools therefore means that terminology needs to be unified across all these applications. The benefits include a possibly significant cut in time spent on research and revision, and a gain in accuracy.

The clear implication is that terminology, both process and product, needs to be managed centrally and delivered locally. This is the rationale behind TBX, already described, and the emergence of powerful tools for identifying and managing terms.

7.2.1 TERM EXTRACTION

Extracting, or ‘mining’, terminology from monolingual or parallel corpora may be done by a language service provider (LSP) in preparation for a job or, in the case of an MT vendor, prospectively to extend the system’s domain coverage.

The technology exploits two main approaches for finding candidate terms: linguistic approaches require part-of-speech tagged data to identify word combinations that match predetermined patterns (e.g. NOUN+NOUN – water pressure), while statistical approaches rely on the fact that the component parts of terminological MWEs tend to co-occur more often than would be predicted by chance (e.g. dialogue and box). A particular tool may combine elements of both approaches.

Searching for patterns such as NOUN+of+NOUN (e.g. part of speech, best of breed) or ADJECTIVE+NOUN (hard drive) will successfully find matching terms however infrequent, but also tends to return many false or irrelevant candidates that need to be eliminated manually (e.g. cup of tea, long walk). So the initial list of candidates may be filtered according to various statistical criteria and the survivors ranked according to their likely ‘termhood’. A further disadvantage of the linguistic approach is that the patterns need to be redefined for every language processed. Purely statistical methods escape this drawback and are language-independent, but overlook terms whose frequency of occurrence is below some preset threshold.
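The combination of the two approaches described above can be illustrated with a minimal sketch. It is not the method of any particular tool. It assumes a corpus that has already been part-of-speech tagged, and it uses pointwise mutual information (PMI) as one possible statistical criterion; the text does not name a specific measure. The tag names, the pattern list, the toy corpus and all function names are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter

# Hypothetical patterns. Each element is either a POS tag (upper case)
# or a literal word (lower case), such as "of".
PATTERNS = [("NOUN", "NOUN"), ("ADJ", "NOUN"), ("NOUN", "of", "NOUN")]


def matches(window, pattern):
    """True if every (word, tag) pair in the window fits its pattern element."""
    return all(elem == tag or elem == word.lower()
               for (word, tag), elem in zip(window, pattern))


def linguistic_candidates(sentences, patterns=PATTERNS):
    """Linguistic step: count word sequences that match a predefined pattern."""
    counts = Counter()
    for sent in sentences:
        for pattern in patterns:
            n = len(pattern)
            for i in range(len(sent) - n + 1):
                window = sent[i:i + n]
                if matches(window, pattern):
                    counts[tuple(w.lower() for w, _ in window)] += 1
    return counts


def pmi_scores(sentences):
    """Statistical step: PMI of each adjacent word pair.

    A high PMI means the pair co-occurs more often than chance predicts.
    """
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        words = [w.lower() for w, _ in sent]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    return {
        (x, y): math.log2((c / n_bi) /
                          ((unigrams[x] / n_uni) * (unigrams[y] / n_uni)))
        for (x, y), c in bigrams.items()
    }


def rank_terms(sentences, min_freq=2):
    """Hybrid: keep two-word pattern matches above a frequency threshold,
    then rank the survivors by PMI as a rough stand-in for 'termhood'."""
    pmi = pmi_scores(sentences)
    candidates = linguistic_candidates(sentences)
    kept = [(term, pmi[term]) for term, count in candidates.items()
            if len(term) == 2 and count >= min_freq]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical, hand-tagged toy corpus (a real one would come from a tagger).
corpus = [
    [("Open", "VERB"), ("the", "DET"), ("dialogue", "NOUN"), ("box", "NOUN")],
    [("The", "DET"), ("dialogue", "NOUN"), ("box", "NOUN"), ("shows", "VERB"),
     ("the", "DET"), ("hard", "ADJ"), ("drive", "NOUN")],
    [("A", "DET"), ("long", "ADJ"), ("walk", "NOUN"), ("to", "ADP"),
     ("the", "DET"), ("hard", "ADJ"), ("drive", "NOUN")],
    [("A", "DET"), ("part", "NOUN"), ("of", "ADP"), ("speech", "NOUN")],
]

for term, score in rank_terms(corpus):
    print(" ".join(term), round(score, 2))
```

In this toy corpus the pattern step finds ‘long walk’ and ‘part of speech’ alongside ‘dialogue box’ and ‘hard drive’, which shows both the strength of the linguistic approach (infrequent matches are found) and its weakness (irrelevant matches are returned). The frequency threshold removes ‘long walk’, but it would equally remove a genuine term that occurs only once, which is the drawback of statistical filtering noted above. The threshold also matters because PMI tends to inflate the scores of very rare pairs.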
