Research Report on Bangla Verb and Noun Morphological Analysis

Md. Zahurul Islam

Center for Research on Bangla Language Processing, BRAC University

zaharul@bracu.ac.bd

Abstract

This report describes the inflectional morphology of Bangla verbs and nouns, together with the rules, lexicons, and grammar used for Bangla morphological analysis.

1. Introduction

This report describes Bangla verb and noun morphology, along with the two-level rules, lexicon, and unification-based grammar for Bangla verbs and nouns. The rules, lexicon, and grammar are written for PC-KIMMO (a two-level morphological analyzer) and JKimmo (a multilingual computational morphology framework for PC-KIMMO). JKimmo is a multilingual wrapper around PC-KIMMO that allows Bangla script to be used for input and output.

2. Bangla Morphological Analysis

2.1. Verbs

Verbs divide into two classes: finite and non-finite. Non-finite verbs have no inflection for tense or person, while finite verbs are fully inflected for person (first, second, third), tense (present, past, future), aspect (simple, perfect, progressive), and honor (intimate, familiar, and formal), but not for number. Conditional, imperative, and other special inflections for mood can replace the tense and aspect suffixes. The number of inflections on many verb roots can total more than 200. A few examples of Bangla verb inflection are given in Table 1.

Table 1: Verb inflections

Rows: verb root, with forms for the 1st person, 2nd person, and 3rd person.

Columns, under the tense headings Present, Past, and Future: Simple, Continuous, Perfect, Subjunctive, Simple, Habitual, Continuous, Perfect, Simple, Subjunctive.

[The Bangla verb forms in the table cells are not legible in this copy.]

2.2. Nouns

Nouns are inflected for case, including nominative, objective, genitive (possessive), and locative. The case-marking pattern for each noun depends on the noun's degree of animacy. When a definite article is added (one suffix for the singular and another for the plural; the Bangla suffixes are not legible in this copy), as in Tables 2 and 3 below, nouns are also inflected for number.

Table 2: Singular noun inflections

Rows: Nominative, Objective, Genitive, Locative. Columns: Animate, Inanimate.

[The Bangla noun forms in the table cells are not legible in this copy.]

Working Papers 2004-2007

Table 3: Plural noun inflections

Rows: Nominative, Objective, Genitive, Locative. Columns: Animate, Inanimate.

[The Bangla noun forms in the table cells are not legible in this copy.]

3. Components

3.1. Transliteration file

The original PC-KIMMO software is written in the C programming language and uses only Latin alphanumeric characters for input and output. For input in a script other than Latin, the user has to devise a transliteration scheme that maps each character of the non-Latin script to Latin characters. Viewing and interpreting input and output strings in transliterated form can be cumbersome and unintuitive for the user.

JKimmo solves this problem in a modular, abstract fashion. It requires the whole transliteration scheme to be written in a separate file, which the user then loads. Once the transliteration file is loaded, the user can enter input strings and view output strings in their preferred language in an intuitive way. The transliteration scheme for Bengali is given in Table 4.
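The round trip that a transliteration file makes possible can be sketched in a few lines of Python. This is only an illustration, not JKimmo's implementation or file format: the five character pairs below are hypothetical stand-ins rather than the entries of Table 4, and the sketch assumes that every Bangla character maps to exactly one Latin symbol.

```python
# Hypothetical character pairs for illustration only; they are NOT the
# entries of Table 4. Assumption: one Latin symbol per Bangla character.
BANGLA_TO_LATIN = {
    "ক": "k",
    "খ": "K",
    "গ": "g",
    "র": "r",
    "া": "a",
}
# Inverse table, used to show analyzer output in Bangla script again.
LATIN_TO_BANGLA = {latin: bangla for bangla, latin in BANGLA_TO_LATIN.items()}


def to_latin(text: str) -> str:
    """Replace each Bangla character with its Latin symbol (KeyError if unmapped)."""
    return "".join(BANGLA_TO_LATIN[ch] for ch in text)


def to_bangla(text: str) -> str:
    """Replace each Latin symbol with its Bangla character (KeyError if unmapped)."""
    return "".join(LATIN_TO_BANGLA[ch] for ch in text)


word = "করা"
latin = to_latin(word)           # what a Latin-only analyzer would receive
restored = to_bangla(latin)      # what the user would see again
```

Because the mapping is one-to-one, converting to Latin and back returns the original string, which is the property the wrapper relies on when it hides the Latin form from the user.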

3.2. Rule file

Two-level orthographic rules are required for JKimmo and PC-KIMMO. The general structure of the rules file is a list of declarations, each composed of a keyword followed by data. The set of valid keywords in a rules file includes COMMENT, ALPHABET, NULL, ANY, BOUNDARY, SUBSET, RULE, and END. The COMMENT, SUBSET, and RULE declarations are optional and can be used more than once in a rules file. The END declaration is also optional, but can be used only once. PC-KIMMO recognizes only Latin characters in a rule file, so rules for a language written in a non-Latin script must follow the transliteration scheme. A free rule compiler for PC-KIMMO, called kgen, is available; it takes a rule specification and generates rules for PC-KIMMO. Other free tools are also available for rule generation. A sample rules file is shown in Table 5.
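The "keyword followed by data" structure described above can be made concrete with a small Python sketch. It is not PC-KIMMO's own parser. It assumes that a declaration starts on a line whose first token is one of the listed keywords, that following lines continue the data of that declaration, and that lines starting with ";" are comments (as the first line of Table 5 suggests). The function names and the sample fragment are hypothetical.

```python
KEYWORDS = {"COMMENT", "ALPHABET", "NULL", "ANY", "BOUNDARY", "SUBSET", "RULE", "END"}


def parse_declarations(text):
    """Split rules-file text into a list of (keyword, data_tokens) declarations."""
    declarations = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(";"):   # assumption: ';' starts a comment line
            continue
        tokens = line.split()
        if tokens[0] in KEYWORDS:
            # A keyword opens a new declaration; the rest of the line is data.
            declarations.append((tokens[0], tokens[1:]))
        elif declarations:
            # Any other line continues the data of the current declaration.
            declarations[-1][1].extend(tokens)
        else:
            raise ValueError("data found before the first keyword: " + line)
    return declarations


def check_end_used_once(declarations):
    """Check the one count restriction stated in the text: END at most once."""
    return sum(1 for keyword, _ in declarations if keyword == "END") <= 1


SAMPLE = """\
;illustrative fragment, not the report's rule file
ALPHABET
k K g a e +
NULL 0
ANY @
BOUNDARY #
SUBSET Cons k K g
END
"""

parsed = parse_declarations(SAMPLE)
```

For the fragment above, `parsed` begins with `("ALPHABET", ["k", "K", "g", "a", "e", "+"])` and ends with `("END", [])`.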

Table 4: Bengali transliteration scheme

Columns: five pairs of Bangla character and Latin symbol (Bangla, Latin).

[The Bangla characters in this table are not legible in this copy, so the character-to-symbol pairs cannot be reproduced here.]

Table 5: Sample rule file

;MAIN RULE FILE BANGLAMORPHOLOGY.RUL
ALPHABET
k K g G ? c C J q v w x z N t T d D n p P b B m y r l S $ s h ' " Y & ~ ^ a i I u U R e E o O A F H L M Q V W X Z f + j
NULL 0
ANY @
BOUNDARY #
SUBSET Cons k K g G ? c C j J q v w x z N t T d D n p P b B m y r l S $ s h ' " Y & ~ ^
SUBSET KhaGon K d p D
RULE "defaults" 1 31
k K g G ? c C J q v w x z N t T d D n p P b B m y r l S $ s @
k K g G ? c C J q v w x z N t T d D n p P b B m y r l S $ s @
1: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
RULE "defaults" 1 30
h ' " Y & ~ ^ a i I u U R e E o O A F H L M Q V W X Z f + @
h ' " Y & ~ ^ a i I u U R e E o O A F H L M Q V W X Z f 0 @
1: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
RULE " a:e
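Each RULE declaration in Table 5 is a state table: the header gives the rule name, the number of states, and the number of columns; the next two lines give the lexical and surface character of each column; each numbered row lists the next state for every column. The Python sketch below shows one way such a table can be stepped through. It relies on the usual PC-KIMMO conventions, which are assumptions here because the report does not spell them out: an entry of 0 means failure, a row label ending in ":" marks a final state, and the @:@ column catches any pair not named by another column. The function name and both example tables are hypothetical; the second is not a reconstruction of the truncated rule above.

```python
def run_state_table(columns, rows, final_states, pairs, start=1):
    """Return True if the sequence of (lexical, surface) pairs is accepted.

    columns      -- list of (lexical, surface) column headers
    rows         -- dict mapping a state number to its list of next states
    final_states -- set of state numbers marked final (row label ends in ':')
    """
    state = start
    for pair in pairs:
        if pair in columns:
            col = columns.index(pair)
        elif ("@", "@") in columns:
            col = columns.index(("@", "@"))   # assumed wildcard column
        else:
            return False                      # pair not covered by this table
        state = rows[state][col]
        if state == 0:                        # assumed convention: 0 = failure
            return False
    return state in final_states


# A cut-down "defaults"-style table: one state, every listed pair is allowed.
default_columns = [("k", "k"), ("a", "a"), ("+", "0"), ("@", "@")]
default_rows = {1: [1, 1, 1, 1]}
accepted = run_state_table(default_columns, default_rows, {1},
                           [("k", "k"), ("a", "a"), ("+", "0")])

# A hypothetical two-state table that forbids the pair i:e right after k:k.
block_columns = [("k", "k"), ("i", "e"), ("@", "@")]
block_rows = {1: [2, 1, 1], 2: [2, 0, 1]}
blocked = run_state_table(block_columns, block_rows, {1, 2},
                          [("k", "k"), ("i", "e")])
allowed = run_state_table(block_columns, block_rows, {1, 2},
                          [("g", "g"), ("i", "e")])
```

In the one-state table every entry is 1, so any sequence of listed pairs is accepted; this is why the "defaults" rules only declare which lexical:surface pairs are feasible and do not restrict their context.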



word grammar rules. Associated with each rule are feature constraints. A feature constraint consists of two feature structures that must unify with each other. Feature constraints have two functions: they constrain the operation of a rule, and they pass features from one node to another up the parse tree. A sample grammar file is given in Table 7.
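The two functions of a feature constraint can be illustrated with a minimal Python sketch of unification over nested dictionaries. This is a simplification rather than the mechanism used by PC-KIMMO's word grammar: it has no structure sharing between paths, and the feature names (head, pos, tense, person) are hypothetical and are not taken from the report's grammar file.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts); return None if they conflict."""
    result = dict(a)
    for key, b_val in b.items():
        if key not in result:
            result[key] = b_val            # feature only in b: copy it over
            continue
        a_val = result[key]
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            merged = unify(a_val, b_val)   # recurse into sub-structures
            if merged is None:
                return None
            result[key] = merged
        elif a_val != b_val:
            return None                    # atomic values disagree
    return result


# Hypothetical feature structures for a verb root and a verbal suffix.
verb_root = {"head": {"pos": "V"}}
suffix = {"head": {"pos": "V", "tense": "past", "person": "1"}}
noun_root = {"head": {"pos": "N"}}

word = unify(verb_root, suffix)    # succeeds and carries tense/person upward
clash = unify(noun_root, suffix)   # fails, so the rule would not apply
```

The successful case shows feature passing: the result combines the features of both inputs. The failing case shows the constraining role: a suffix marked for verbs cannot combine with a noun root.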

Table 7: Sample grammar file

;GRAMMAR FILE BANGLAMORPHOLOGY.GRM
Let vr be = VB
Let sd be = SD
Let pg be = PG
Let gh be = GH
Let an be = AN
Let nt be = NT
Let b be = B
Let a be = A
Let v be = V
Let p be = P
Let d be = D
Let t be = T
Let dtu be = DTU
Let dgo be = DGO
Let tgo be = TGO
Let SADHARON be =ASPECT
Let GHOTOMAN be =ASPECT
Let PURAGHOTITO be =ASPECT
Let NITTOBRITTO be =ASPECT
Let ANUGGA be =ASPECT
RULE
Word -> Verbroot VSuffix1
=
=

4. Results

4.1. Recognition

Sample recognition output is given in Figure 1.

Figure 1: Sample recognition output

4.2. Generation

Sample generation output is given in Figure 2.

Figure 2: Sample generation output

5. Conclusion

This report presents Bangla verb and noun morphology, together with the rules, lexicons, and grammar for inflectional verb and noun morphology.

6. References

[1] M.Z. Islam and M. Khan, “JKimmo: A Multilingual Computational Morphology Framework for PC-KIMMO”, Proc. 9th International Conference on Computer and Information Technology, ICCIT 2006, Dhaka, Bangladesh, 2006.

[2] S. Dasgupta and M. Khan, “Morphological Parsing of Bangla Words Using PC-KIMMO”, Proc. 7th International Conference on Computer and Information Technology, ICCIT 2004, Dhaka, Bangladesh, 2004.

[3] S. Dasgupta and M. Khan, “Feature Unification for Morphological Parsing in Bangla”, Proc. 7th International Conference on Computer and Information Technology, ICCIT 2004, Dhaka, Bangladesh, 2004.

[4] A.K.M. Morshed, Adunik Vasatatto - 2nd version, Noa Uddog, Kokata, 1997.
