A generic framework for Arabic to English machine ... - Acsu Buffalo
4.3.5 Statistical machine translation
The ideas behind statistical machine translation come out of information theory. Essentially,
a document is translated according to the probability P(e|a) that a string e in the target
language (for example English) is the translation of a string a in the source language (for
example Arabic). As translation systems are not able to store all native strings and their
translations, a document is typically translated sentence by sentence, but even this is not
enough. We assign to every pair of strings (e, a) a number P(e|a), which we interpret as
the probability that a translator, when presented with a, will produce e as its translation.
You could imagine a program that takes a sentence a as input and outputs every
conceivable string e along with its P(e|a). This program would take a long time to run,
even if you limit English translations to some arbitrary length. Statistical systems instead
seek the English sentence e that maximizes P(e|a) while keeping the search time manageable
(Brown et al. 1993). To summarize, we compute P(e|a) by summing the probabilities of all
alignments. For each alignment, we make two significant simplifying assumptions: each
English word is generated by exactly one Arabic word, and the generation of each English
word is independent of the generation of all other English words in the sentence. Clearly,
neither assumption holds in practice.
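The alignment sum described above can be illustrated with a minimal sketch. It assumes a toy lexical table `T` with invented probabilities, transliterated Arabic tokens chosen only for illustration, and a uniform probability over alignments (an assumption borrowed from the simplest alignment models, not something stated in the text). The names `t`, `p_brute_force` and `p_factorized` are hypothetical. The sketch shows that, under the two simplifying assumptions, summing over every alignment gives the same value as a much cheaper per-word product.

```python
from itertools import product
from math import prod, isclose

# Hypothetical toy lexical table: T[(english_word, arabic_word)] is the
# probability that the Arabic word generates the English word.
# All values are invented for illustration only.
T = {
    ("the", "al"): 0.7, ("house", "al"): 0.1,
    ("the", "bayt"): 0.1, ("house", "bayt"): 0.8,
}

def t(e_word, a_word):
    """Lexical probability, 0.0 for pairs missing from the toy table."""
    return T.get((e_word, a_word), 0.0)

def p_brute_force(e, a):
    """P(e|a) as an explicit sum over every alignment.

    alignment[j] is the index of the single Arabic word that generates e[j]
    (assumption 1). Each alignment's score is a product of independent
    word probabilities (assumption 2). Alignments are assumed equally
    likely, hence the division by len(a) ** len(e).
    """
    total = 0.0
    for alignment in product(range(len(a)), repeat=len(e)):
        total += prod(t(e[j], a[i]) for j, i in enumerate(alignment))
    return total / len(a) ** len(e)

def p_factorized(e, a):
    """Same quantity, using independence to swap the sum and the product."""
    return prod(sum(t(w, a_word) for a_word in a) / len(a) for w in e)

e = ["the", "house"]
a = ["al", "bayt"]
print(p_brute_force(e, a), p_factorized(e, a))
assert isclose(p_brute_force(e, a), p_factorized(e, a))
```

The brute-force version enumerates len(a) ** len(e) alignments, while the factorized version needs only len(a) * len(e) table lookups, which is why the independence assumption is attractive despite being linguistically unrealistic.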
4.4 Linguistic aspects of MT
In this section we will look more closely at the kinds of linguistic problems that MT has to
face and will discuss ways in which MT programs work around these problems. We will
distinguish monolingual problems (morphology, lexical ambiguity, syntactic ambiguity, and
pragmatic aspects) from bilingual problems of language contrast (lexical mismatches,
structural divergence, and typological differences).