22.08.2013 Views

A generic framework for Arabic to English machine ... - Acsu Buffalo


4.3.5 Statistical machine translation


The ideas behind statistical machine translation come out of information theory. Essentially, translation is driven by the probability P(e|a) that a string e in the target language (for example English) is the translation of a string a in the source language (for example Arabic). As translation systems are not able to store all native strings and their translations, a document is typically translated sentence by sentence, but even this is not enough. We assign to every pair of strings (e, a) a number P(e|a), which we interpret as the probability that a translator, when presented with a, will produce e as its translation.

You could imagine a program that takes a sentence a as input and outputs every conceivable string e along with its P(e|a). This program would take a long time to run, even if you limit English translations to some arbitrary length. Statistical systems instead seek the English sentence e that maximizes P(e|a) while keeping the search time manageable (Brown et al. 1993). To summarize, we compute P(e|a) by summing the probabilities of all alignments. For each alignment, we make two significant simplifying assumptions: each English word is generated by exactly one Arabic word; and the generation of each English word is independent of the generation of all other English words in the sentence. Neither assumption is strictly true of real translations.
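The summation over alignments can be illustrated with a minimal sketch in the style of IBM Model 1 from Brown et al. (1993). It is an illustration under stated assumptions, not the framework's implementation: the lexical table `T`, the transliterated Arabic tokens, the fallback probability, and the function names are all hypothetical, alignments are assumed equally likely, and the sentence-length term is omitted. It shows that, under the two assumptions above, the sum over all alignments collapses into a product of per-word sums.

```python
from itertools import product
from math import isclose, prod

# Hypothetical toy lexical table: T[(english_word, arabic_word)] = t(e | a).
# The values are invented for illustration only.
T = {
    ("the", "NULL"): 0.5,
    ("the", "al-bayt"): 0.3,
    ("house", "al-bayt"): 0.6,
    ("big", "kabir"): 0.7,
    ("house", "kabir"): 0.05,
}


def t(e_word, a_word):
    """Lexical probability, with a small assumed fallback for unseen pairs."""
    return T.get((e_word, a_word), 1e-6)


def p_by_enumeration(english, arabic):
    """Sum the probability of every alignment explicitly.

    Each English word is aligned to exactly one source position
    (including an empty NULL word), and every alignment is assumed
    to be equally likely.
    """
    source = ["NULL"] + arabic
    total = 0.0
    for alignment in product(range(len(source)), repeat=len(english)):
        total += prod(t(e, source[i]) for e, i in zip(english, alignment))
    return total / len(source) ** len(english)


def p_factored(english, arabic):
    """Same quantity, using the independence assumption to factor the sum."""
    source = ["NULL"] + arabic
    return prod(sum(t(e, a) for a in source) / len(source) for e in english)


english = ["the", "big", "house"]
arabic = ["al-bayt", "kabir"]
assert isclose(p_by_enumeration(english, arabic), p_factored(english, arabic))
```

The explicit enumeration visits (l+1)^m alignments for an Arabic sentence of l words and an English sentence of m words, whereas the factored form needs only m(l+1) table lookups, which is why the simplifying assumptions matter in practice.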

4.4 Linguistic aspects of MT

In this section we will look more closely at the kinds of linguistic problems that MT has to face and will discuss ways in which MT programs work around these problems. We will distinguish monolingual problems (morphology, lexical ambiguity, syntactic ambiguity, and pragmatic aspects) from bilingual problems of language contrast (lexical mismatches, structural divergence, and typological differences).
