22.08.2013 Views

A generic framework for Arabic to English machine ... - Acsu Buffalo

4.2.3 Lexical databases

4.2. COMPUTATIONAL TECHNIQUES IN MT

A key component of any rule-based MT system is its lexical resources: the information associated with individual words. The field of computational lexicography is concerned with creating and maintaining computerised dictionaries. In practice, rule-based MT systems often have several dictionaries, some containing the core entries and others containing specialised vocabulary. An MT lexicon differs from a standard dictionary, and each one typically concentrates on some linguistically homogeneous set of words, e.g. abstract nouns, intransitive verbs, or the terminology of a specialist field. It is a good investment to develop tools that help lexicographers expand the lexicon.
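The arrangement of core and specialised dictionaries described above can be sketched as a layered lookup. This is only an illustration, not the framework's actual lexicon format: the entry fields (`pos`, `transitive`, `abstract`, `domain`), the sample words, and the `lookup` helper are all hypothetical.

```python
from collections import ChainMap

# Hypothetical entries; real MT lexicons carry far richer information.
core_lexicon = {
    "go": {"pos": "verb", "transitive": False},
    "school": {"pos": "noun", "abstract": False},
}

# A specialised dictionary for one field (assumed example domain).
medical_lexicon = {
    "stent": {"pos": "noun", "domain": "medicine"},
}

# ChainMap searches the mappings left to right, so specialised
# entries take precedence over core entries with the same key.
lexicon = ChainMap(medical_lexicon, core_lexicon)


def lookup(word):
    """Return the entry for a word, or None if no dictionary has it."""
    return lexicon.get(word.lower())
```

Keeping each dictionary as a separate mapping means a specialised vocabulary can be added or removed without touching the core entries.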

4.2.4 Tokens and tokenization

The term “token” refers to an abstraction for the smallest unit in a text that is considered when describing the syntax of a language. A process of tokenization can be used to split a sentence into word tokens. Although the following example is given as XML, there are many ways to represent tokenized input. The sentence “He went to the school.” could be tokenized as follows:

He
went
to
the
school
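A minimal sketch of this step, assuming a simple regular-expression tokenizer, might look as follows. The element names `sentence` and `token` are assumptions chosen for illustration, not the markup used in the original example, and treating punctuation as a separate token is a design choice of this sketch.

```python
import re
import xml.etree.ElementTree as ET


def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    # A run of word characters is one token; any other
    # non-whitespace character becomes a token of its own.
    return re.findall(r"\w+|[^\w\s]", sentence)


def to_xml(tokens):
    """Wrap tokens in XML; the element names are hypothetical."""
    root = ET.Element("sentence")
    for tok in tokens:
        ET.SubElement(root, "token").text = tok
    return ET.tostring(root, encoding="unicode")


tokens = tokenize("He went to the school.")
print(to_xml(tokens))
```

A whitespace split alone would leave the full stop attached to “school”, which is why the pattern handles punctuation separately.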
