22.08.2013 Views

A generic framework for Arabic to English machine ... - Acsu Buffalo


4.4. LINGUISTIC ASPECTS OF MT

4.4.1 Non-Roman alphabet scripts

Since computer technology developed mostly in English, other languages, particularly
those with non-Roman alphabets, have historically been treated as a special case and required
new code sets to define their character representations. Furthermore, not all languages with
alphabetic scripts are written left-to-right (Arabic and Hebrew, for example, are written
right-to-left), so any input/output device that makes this assumption will be useless for such
languages. Before Unicode was standardised, several different encoding systems were used to
address this problem. Unicode provides a unique code for every character, regardless of
platform, program or language. Appendix A provides the corresponding Unicode code for each
Arabic letter and describes the letters with their corresponding written shapes.
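As an illustration of the points above, the following minimal Python sketch uses the standard `unicodedata` module to look up the code point, name and directionality class of a character, and compares two pre-Unicode Arabic code sets (Windows-1256 and ISO 8859-6) with UTF-8. The helper name `describe` and the choice of sample characters are assumptions made for this example only; they are not taken from Appendix A.

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode name, bidirectional class) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))


# Sample characters, written as escapes to avoid mixed-direction source text:
# Latin A, Arabic letter AIN, Hebrew letter ALEF.
samples = ["A", "\u0639", "\u05D0"]

for ch in samples:
    # The bidirectional class is "L" for left-to-right letters,
    # "AL" for Arabic letters and "R" for other right-to-left letters.
    print(describe(ch))

# Before Unicode, the same Arabic letter had different byte values
# depending on the code set in use.
ain = "\u0639"
legacy = {name: ain.encode(name) for name in ("cp1256", "iso8859_6")}
print(legacy)

# Unicode assigns one code point, which UTF-8 serialises the same way everywhere.
print(ain.encode("utf-8"))
```

The directionality class is what a display layer would consult to decide that Arabic and Hebrew text runs right-to-left, rather than assuming a single direction for all scripts.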

4.4.2 Lexical ambiguity

Lexical ambiguities arise when a single word can potentially be analysed or interpreted
in two or more ways. They are of three basic types: category ambiguities, homographs and
transfer (or translational) ambiguities.

4.4.2.1 Category ambiguity<br />

The simplest type of lexical ambiguity is category ambiguity: a given word can
be assigned to more than one grammatical or syntactic category (e.g. noun, verb or
adjective) according to the context. There are several examples of this in English: light
can be a noun, a verb or an adjective, and control can be a noun or a verb. In Arabic there
are also words that can belong to more than one category; for example, ʿlā can be a
preposition meaning “on” or a verb meaning “raise”.
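Category ambiguity can be sketched as a lexicon lookup that returns more than one category for a word. The toy lexicon below is purely illustrative: it contains only the English examples given above plus one unambiguous entry added for contrast, and the names `LEXICON`, `categories` and `is_category_ambiguous` are hypothetical, not part of the framework described in this work.

```python
# Toy lexicon mapping a word form to its possible grammatical categories.
# The entries for "light" and "control" follow the examples in the text;
# "the" is an assumed unambiguous entry added for contrast.
LEXICON = {
    "light": {"noun", "verb", "adjective"},
    "control": {"noun", "verb"},
    "the": {"determiner"},
}


def categories(word):
    """Return the set of categories recorded for a word (empty if unknown)."""
    return LEXICON.get(word.lower(), set())


def is_category_ambiguous(word):
    """A word is category-ambiguous if the lexicon lists more than one category."""
    return len(categories(word)) > 1


for w in ("light", "control", "the"):
    print(w, sorted(categories(w)), is_category_ambiguous(w))
```

A lookup of this kind only detects the ambiguity; choosing the right category still depends on the context in which the word appears.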
