
Research Report on Bangla Verb and Noun Morphological Analysis


Working Papers 2004-2007

Locative

Animate

Inanimate

a

Table 3: Plural noun inflections

              Animate   Inanimate
Nominative    t         g
Objective     t(к)      g
Genitive      t         g
Locative                g

3. Components

3.1. Transliteration file

The original PC-KIMMO software is written in the C programming language and uses only Latin alphanumeric characters for input and output. For input in a non-Latin script, users must devise their own transliteration scheme that maps each character of that script to Latin characters. Reading and interpreting input and output strings in this form can be cumbersome and unintuitive.

JKimmo solves this problem in a modular, abstract fashion. It requires the complete transliteration scheme to be written in a separate file, which the user then loads. Once the transliteration file is loaded, the user can enter input strings and view output strings in their preferred script. The transliteration scheme for Bengali is given in Table 4.
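The loading and conversion steps can be sketched in Python as follows. JKimmo's actual file syntax is not described here, so the format below (one `Bangla Latin` pair per line, `;` starting a comment) is hypothetical, and the few letter pairs are illustrative assumptions rather than entries read from Table 4. The loader also rejects a scheme in which two Bangla characters share a Latin code, because output strings could then not be converted back unambiguously.

```python
# Sketch of a transliteration-file loader in the spirit of JKimmo.
# Hypothetical file format (not JKimmo's documented syntax):
#   one "<Bangla> <Latin>" pair per line; ';' starts a comment.

SAMPLE_SCHEME = """
; Bangla  Latin   (illustrative pairs, assumed for this example)
ক  k
খ  K
গ  g
ম  m
ল  l
র  r
া  a
ি  i
অ  A
"""


def load_scheme(text):
    """Parse mapping lines into a Bangla -> Latin dictionary."""
    forward = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        bangla, latin = line.split()
        forward[bangla] = latin
    # Output can only be converted back if no Latin code is used twice.
    if len(set(forward.values())) != len(forward):
        raise ValueError("transliteration scheme is not one-to-one")
    return forward


def convert(text, table):
    """Greedy longest-match conversion; unmapped characters pass through."""
    longest = max(len(key) for key in table)
    out, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:  # no key matched at position i
            out.append(text[i])
            i += 1
    return "".join(out)


forward = load_scheme(SAMPLE_SCHEME)
backward = {latin: bangla for bangla, latin in forward.items()}

latin = convert("কলম", forward)   # Bangla input -> Latin string for PC-KIMMO
print(latin)                       # klm
print(convert(latin, backward))    # কলম
```

Greedy longest-match conversion means the same code keeps working if a scheme maps one script character to a multi-character Latin code, or the reverse.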

3.2. Rule file

Both JKimmo and PC-KIMMO require two-level orthographic rules. A rules file is structured as a list of declarations, each consisting of a keyword followed by data. The valid keywords are COMMENT, ALPHABET, NULL, ANY, BOUNDARY, SUBSET, RULE, and END. The COMMENT, SUBSET, and RULE declarations are optional and may be used more than once in a rules file. The END declaration is also optional, but may be used only once. Because PC-KIMMO recognizes only Latin characters in a rules file, rules for a language written in a non-Latin script must be expressed through the transliteration scheme. A free rule compiler for PC-KIMMO, kgen, takes a rule specification and generates the corresponding PC-KIMMO rules; other free tools for rule generation are also available. A sample rules file is shown in Table 5.
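The declaration structure described above can be checked with a short Python sketch that groups whitespace-separated tokens under the most recent keyword and enforces the repetition constraints. Two details are assumptions: comments are taken to start with `;` (as in the first line of Table 5), and ALPHABET, NULL, ANY, and BOUNDARY are treated as required single declarations, since the text lists only the others as optional. The `BOUNDARY #` line in the example is likewise assumed, not taken from Table 5.

```python
# Sketch: split a PC-KIMMO-style rules file into (keyword, data) declarations
# and check the keyword constraints described in the text.
from collections import Counter

VALID = {"COMMENT", "ALPHABET", "NULL", "ANY", "BOUNDARY", "SUBSET", "RULE", "END"}
REPEATABLE = {"COMMENT", "SUBSET", "RULE"}  # optional, may appear more than once
# Assumption: the remaining non-END keywords are required, single declarations.
REQUIRED = {"ALPHABET", "NULL", "ANY", "BOUNDARY"}


def parse_rules(text, comment=";"):
    """Return a list of (keyword, data_tokens) pairs; raise ValueError on violations."""
    declarations = []
    for raw in text.splitlines():
        # Assumption: everything after the comment character is ignored.
        for tok in raw.split(comment, 1)[0].split():
            if tok in VALID:
                declarations.append((tok, []))
            elif declarations:
                declarations[-1][1].append(tok)  # data belongs to the last keyword
            else:
                raise ValueError(f"data before first keyword: {tok!r}")
    counts = Counter(keyword for keyword, _ in declarations)
    for keyword, n in counts.items():
        if n > 1 and keyword not in REPEATABLE:  # covers END: at most once
            raise ValueError(f"{keyword} declared {n} times")
    missing = REQUIRED - set(counts)
    if missing:
        raise ValueError(f"missing declarations: {sorted(missing)}")
    return declarations


SAMPLE = """;MAIN RULE FILE BANGLAMORPHOLOGY.RUL
ALPHABET
k K g G a i I u U +
NULL 0
ANY @
BOUNDARY #
END
"""

for keyword, data in parse_rules(SAMPLE):
    print(keyword, data)
```

Treating every keyword token as the start of a new declaration keeps the parser simple; a real rules file whose data could contain a bare keyword would need a more careful tokenizer.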

Table 4: Bengali transliteration scheme

Bangla Latin Bangla Latin Bangla Latin Bangla Latin Bangla Latin
◌ ^ ◌ a G N r
a A ◌ I G t l
å F ◌ I ? T ш S
i H ◌ u C d $
и L ◌ U C D s
u M ◌ R J n h
Q ◌ e J p '
V ◌ E Q P "
e W ◌ o a V b Y
X ◌ O W B ◌ %
o Z к k X m ◌ &
F х K Z y ◌ ~

Table 5: Sample rule file

;MAIN RULE FILE BANGLAMORPHOLOGY.RUL
ALPHABET
k K g G ? c C J q v w x z N t T d D n p P b B m y r l S $ s h ' " Y & ~ ^ a i I u U R e E o O A F H L M Q V W X Z f + j
NULL 0
ANY @

