22.08.2013 Views

A generic framework for Arabic to English machine ... - Acsu Buffalo



5.2. DESIGNING AN XML LEXICON ARCHITECTURE FOR ARABIC MT BASED ON RRG

In Phase 3 of Figure 5.1, the UniArab system tokenizes a sentence into words, then sends each word to the search engine within the Lexicon to query the category of each word and all attributes for that word. The Lexicon returns the corresponding category and its attributes as detailed below. The Morphology Parser, Phase 5, receives the word metadata and ensures that the properties of the words are consistent. The verb attributes in particular are of great importance in correctly extracting the sentence's logical structure further down the processing chain, helping to answer the basic question ‘Who does what?’ In free word order sentences, for example yḥb qys lylā, ‘Qays loves Laila’, multiple orders are possible, including verb-subject-object, verb-object-subject or subject-verb-object. The attributes of the verb agree with the gender of the subject. Given the masculine gender of the verb in this case, the Syntactic Parser will look for a masculine proper noun to serve as the actor of this sentence. If there is more than one masculine proper noun in such a case, then Modern Standard Arabic defines the first proper noun as the actor. The Morphology Parser will be extended so that it can deal with words that are defined in multiple categories, deciding which should be processed. Meanwhile the Syntactic Parser has so far only been implemented for extracting word order, though it will be extended to deal with word ambiguities in future versions.
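The actor-selection rule described above can be illustrated with a minimal Python sketch. This is not UniArab's implementation: the `Token` class, its field names and the `find_actor` function are hypothetical, and the lexicon metadata is assumed to have already been attached to each token. The sketch only shows that gender agreement with the verb identifies the same actor regardless of word order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    """A word with lexicon metadata (field names are illustrative)."""
    surface: str   # transliterated Arabic word
    category: str  # e.g. "verb" or "proper_noun"
    gender: str    # "m" or "f"
    gloss: str     # English representation


def find_actor(tokens: List[Token]) -> Optional[Token]:
    """Return the first proper noun whose gender agrees with the verb."""
    verb = next((t for t in tokens if t.category == "verb"), None)
    if verb is None:
        return None
    for t in tokens:
        if t.category == "proper_noun" and t.gender == verb.gender:
            return t  # first match wins when several nouns agree
    return None


LOVES = Token("yḥb", "verb", "m", "loves")
QAYS = Token("qys", "proper_noun", "m", "Qays")
LAILA = Token("lylā", "proper_noun", "f", "Laila")

# The same three words in VSO, VOS and SVO order.
orders = {
    "VSO": [LOVES, QAYS, LAILA],
    "VOS": [LOVES, LAILA, QAYS],
    "SVO": [QAYS, LOVES, LAILA],
}
for name, tokens in orders.items():
    actor = find_actor(tokens)
    print(name, "actor:", actor.gloss if actor else None)
```

In all three orders the masculine verb selects Qays as the actor, since Laila is feminine. When two masculine proper nouns are present, the loop returns the one that appears first, following the rule stated in the text.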

5.2.3 Lexical properties

Figure 5.2 shows the structure of the Lexicon, including the properties stored for each word category. For all categories, an Arabic word is stored along with its English representation. Since word ambiguity has not been dealt with so far, there is a one-to-one mapping for the simple sentences that UniArab currently processes. However, word ambiguity is supported in the structure, with each possible case stored as a separate record. All search results will be passed to the Morphology Parser, which decides which one is used.
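As a rough illustration of storing each reading of an ambiguous word as a separate record, the following Python sketch queries a small XML lexicon. The element and attribute names (`lexicon`, `arabic`, `english`, `gender`, and so on) are assumptions made for this example and are not taken from Figure 5.2. The unvocalized form ktb is used as the ambiguous entry because it can be read as a verb ('write') or as a plural noun ('books').

```python
import xml.etree.ElementTree as ET

# Hypothetical lexicon layout: one element per record, tag = word category.
LEXICON_XML = """
<lexicon>
  <verb arabic="yḥb" english="love" gender="m"/>
  <proper_noun arabic="qys" english="Qays" gender="m"/>
  <proper_noun arabic="lylā" english="Laila" gender="f"/>
  <verb arabic="ktb" english="write" gender="m"/>
  <noun arabic="ktb" english="books" number="pl"/>
</lexicon>
""".strip()


def lookup(root: ET.Element, word: str) -> list:
    """Return every record stored for `word`, one dict per record."""
    return [
        {"category": entry.tag, **entry.attrib}
        for entry in root
        if entry.get("arabic") == word
    ]


root = ET.fromstring(LEXICON_XML)

# Tokenize on whitespace and query the lexicon for each word.
for word in "yḥb qys lylā".split():
    print(word, lookup(root, word))

# An ambiguous word yields several records; a later stage must pick one.
print(lookup(root, "ktb"))
```

Returning a list for every query keeps the one-to-one case and the ambiguous case uniform: a simple word produces a single-element list, while an ambiguous word produces one element per stored record for a downstream component such as the Morphology Parser to choose from.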

