A generic framework for Arabic to English machine ... - Acsu Buffalo
4.2. COMPUTATIONAL TECHNIQUES IN MT
4.2.3 Lexical databases
A key component of any rule-based MT system is its lexical resources: the information
associated with individual words. The field of computational lexicography is concerned
with creating and maintaining computerised dictionaries. In practice, rule-based MT
systems often have several dictionaries, some containing the core entries and others
containing specialised vocabulary. An MT lexicon differs from a standard dictionary
in that it is typically concentrated on some linguistically homogeneous set of words, e.g.
abstract nouns, intransitive verbs, or the terminology of a specialist field. It is a good
investment to develop tools that help lexicographers expand the lexicon.
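The split between core and specialised dictionaries can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `LexEntry` fields, the dictionary names `CORE` and `MEDICAL`, the `lookup` helper, and the toy transliterated entries are hypothetical and are not taken from any particular MT system. The sketch only shows one plausible way to let a specialised dictionary take priority over the core one.

```python
from collections import ChainMap
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LexEntry:
    """Hypothetical lexical entry: the information attached to one word."""
    lemma: str
    pos: str    # part of speech
    gloss: str  # English translation equivalent


# Toy core dictionary (illustrative entries only).
CORE = {
    "kitab": LexEntry("kitab", "noun", "book"),
    "madrasa": LexEntry("madrasa", "noun", "school"),
    "amaliyya": LexEntry("amaliyya", "noun", "process"),
}

# Toy specialised dictionary for a medical domain (illustrative only).
MEDICAL = {
    "jiraha": LexEntry("jiraha", "noun", "surgery"),
    "amaliyya": LexEntry("amaliyya", "noun", "operation"),
}

# ChainMap searches its mappings left to right, so the specialised
# dictionary is consulted before the core dictionary.
lexicon = ChainMap(MEDICAL, CORE)


def lookup(word: str) -> Optional[LexEntry]:
    """Return the highest-priority entry for `word`, or None if unknown."""
    return lexicon.get(word)


if __name__ == "__main__":
    for word in ("kitab", "amaliyya", "jiraha"):
        print(word, "->", lookup(word))
```

Layering the dictionaries this way keeps the core entries untouched while a domain dictionary supplies or overrides specialised senses; a real system would carry far richer information per entry.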
4.2.4 Tokens and tokenization
The term “token” refers to an abstraction for the smallest unit in a text that is considered
when describing the syntax of a language. A process of tokenization can be used to split
a sentence into word tokens. Although the following example is given as XML, there
are many ways to represent tokenized input. The sentence “He went to the school.” could
be tokenised as follows:
He
went
to
the
school
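As a rough illustration of the tokenization step, the Python sketch below splits a sentence into tokens and serialises them as XML. The element names `sentence` and `token`, the function names, and the regular expression are assumptions chosen for this example, not the markup or method used in the original system. The sketch also treats the final full stop as a separate token, which is one possible convention among several.

```python
import re
import xml.etree.ElementTree as ET


def tokenize(sentence: str) -> list[str]:
    """Split a sentence into word tokens and single punctuation marks."""
    # \w+ matches a run of word characters; [^\w\s] matches one
    # punctuation character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", sentence)


def to_xml(tokens: list[str]) -> str:
    """Wrap tokens in a hypothetical <sentence>/<token> XML structure."""
    root = ET.Element("sentence")
    for tok in tokens:
        ET.SubElement(root, "token").text = tok
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    tokens = tokenize("He went to the school.")
    print(tokens)
    print(to_xml(tokens))
```

The same token list could equally be written out as one token per line, as JSON, or as tab-separated columns; XML is only one convenient representation.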