Hindi Part-of-Speech Tagging and Chunking : A Maximum Entropy ...


identifying chunks and their labels is modeled in the same way as that of identifying POS tags.

In this paper, we present a statistical POS tagger and chunker for the Hindi language. We have built separate models for the two tasks; each satisfies the maximum entropy principle and can be used to tag unseen text. Our system is tailored for the NLPAI-ML contest 2006.

This paper is organized as follows. Section 2 gives an overview of maximum entropy models. Feature functions used in Hindi POS tagging and chunking are presented in Section 3. Section 4 provides experimental details and results.

2 Maximum Entropy Markov Model

The maximum entropy (ME) principle states that the least biased model which considers all known information is the one which maximizes entropy. The ME technique builds a model which assumes nothing other than the imposed constraints. To build such a model, we define feature functions. A feature function is a boolean function which captures some aspect of the language that is relevant to the sequence labelling task. An example feature function for POS tagging is

f_j(l, c) = \begin{cases} 1 & \text{if the current word is alphanumeric,} \\ 0 & \text{otherwise.} \end{cases}

Here, l is one of the possible labels and c is the context (see footnote 1). The relationship between feature functions and labels, as evidenced in the training corpus, is expressed as constraints. The probability distribution which satisfies these constraints and makes no other assumptions has maximum entropy, is unique, and can be expressed as (Berger et al., 1996)

\Pr(l \mid c) = \frac{1}{z(c)} \exp\left( \sum_{j=1}^{k} \lambda_j f_j(l, c) \right)

where z(c) is a normalizing constant. The parameters λ_j are estimated using the Generalized Iterative Scaling algorithm (Darroch and Ratcliff, 1972). The learnt model is then used for tagging unseen text.
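
The conditional distribution above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dict-based context layout, the two feature functions, and the weight values are hypothetical, chosen only to show how boolean features and their weights λ_j combine into a normalized probability. Parameter estimation by Generalized Iterative Scaling is not shown; the weights are assumed to be given.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical context layout: a dict holding the current word and the
# previous POS tag. The paper only says a context is a set of surrounding
# words and/or labels of previous words.
Context = Dict[str, str]
Feature = Callable[[str, Context], bool]


def me_distribution(
    labels: Sequence[str],
    features: Sequence[Feature],
    weights: Sequence[float],
    context: Context,
) -> Dict[str, float]:
    """Return Pr(l | c) = exp(sum_j lambda_j * f_j(l, c)) / z(c) for every label."""
    scores = {}
    for label in labels:
        # Sum the weights of the boolean features that fire for (label, context).
        total = sum(w for f, w in zip(features, weights) if f(label, context))
        scores[label] = math.exp(total)
    z = sum(scores.values())  # normalizing constant z(c)
    return {label: s / z for label, s in scores.items()}


# Illustrative features (assumptions, not taken from the paper's feature set).
def alnum_and_noun(label: str, c: Context) -> bool:
    return label == "NN" and c["word"].isalnum()


def prev_noun_and_verb(label: str, c: Context) -> bool:
    return label == "VFM" and c["prev_tag"] == "NN"


labels = ["NN", "VFM"]
features = [alnum_and_noun, prev_noun_and_verb]
weights = [0.5, 1.0]  # made-up values standing in for learnt lambda_j
context = {"word": "KaRa", "prev_tag": "NN"}

dist = me_distribution(labels, features, weights, context)
```

With all weights set to zero the sketch returns a uniform distribution over the labels, which matches the intuition that a model with no active constraints should prefer no label over another.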
In our system, during tagging, the Beam Search algorithm is applied to find the most promising label sequence.

Footnote 1: Context is a set of words surrounding the current word and/or labels of previous words.

3 Feature Functions

3.1 POS tagging features

For the task of Hindi POS tagging, the main feature functions used in our system are listed below.

Context-based features:
From our empirical analysis, we found that a context window of size four gives the best performance. For a word, the context consists of:
• POS tag of the previous word.
• Combination of POS tags of the previous two words.
• Current word.
• Next word.

Word features:
Word features capture lexical and morphological properties of the word being tagged. They are:
• Suffixes: whether the word suffix is the same as a given suffix.
• Digits: whether the word contains any digits, or is completely numeric.
• Special characters: whether the word contains any special characters such as ‘-’.
• Root of the current word, or of the next word (e.g. ‘KaRa’).
• English word: to handle English words that occasionally appear in Hindi text.

Dictionary feature:
This feature utilizes information present in a standard Hindi dictionary. We define a feature function for each POS tag. For a POS tag l, if the word being tagged can occur with label l according to the dictionary, then the corresponding feature is true.

Corpus-based features:
These features rely on information extracted from the training corpus. They are:
• Has the word occurred as a proper noun in training.
• All possible tags of the current word, as seen in training.
• Has the word occurred with only a single tag in the training corpus.
• All possible tags of the next word, as seen in training.

3.2 Chunking features

The main feature functions used in Hindi chunking are listed below.

Context-based features:
For chunking, the most suitable context window was empirically found to consist of the words, POS tags and chunk labels of the current word and two words on either side of it. In line with Singh et al. (2005), we found that for words having specific POS tags (JJ, NN, VFM, PREP, SYM, QF, NEG and RP), adding the current word and the combination of the word and its POS tag as features reduces the performance of the chunker. We call such a POS tag a nonessential-word tag. For a word, the context-based features consist of:
• Current word, and the combination of word and POS tag, if the POS tag of the current word is not in the list of nonessential-word tags.
• POS tags of all words in the context, individually.
• Combinations of POS tags of the next two words, of the previous two words, and of the current and previous words, each taken separately.
• Chunk labels of the previous two words, independently.

Current POS tag based features:
For each tag, the list of possible chunk labels for that tag is identified. These chunk labels are used as features. Another feature based on the POS tag of the current word utilizes what we call the tag class. POS tags are classified into different groups based on the most likely chunk label for that POS tag, as seen in the training corpus. For example, all POS tags which are most likely to occur in a noun phrase are grouped under one class. The class of the current word’s POS tag is used as a feature.

4 Experiments

Our system is built for the NLPAI-ML task of POS tagging Indian languages. The tagset of the contest specifies 29 POS tags and 6 chunk labels.
The development corpus for the task was provided by the contest organizers. We conducted experiments with different splits of training and test data. As can be seen in Figure 1, POS tagging accuracy increases with the proportion of training data until it reaches 75%, after which there is a reduction in accuracy due to overfitting of the trained model to the training corpus. Beyond a split of 85-15, increasing the training corpus proportion increases the accuracy as the test corpus size becomes very small. This prompted us to use a 75-25 split of training and test data in our experiments.

[Figure 1: POS tagging accuracy with varying training - test data split. Plot title: Accuracy v/s Training Data Size; x-axis: Training Data Size (%); y-axis: Accuracy.]

[Figure 2: Accuracy across runs. Series: chunking accuracy, POS tagging accuracy; x-axis: Run; y-axis: Accuracy.]

The results were averaged across different runs, each time randomly picking training and test data. Figure 2 shows results using a 75-25 split of training and test data across 10 different runs. Our chunker depends heavily on POS tags and hence, in most cases, its accuracy closely tracks the POS tagging accuracy. The best POS tagging accuracy of the system in these runs was 89.34% and the lowest was 87.04%. The average accuracy over 10 runs was 88.4%. For chunking, the best accuracy of chunk labels