Part Of Speech Tagging and Chunking with HMM and CRF

Pranjal Awasthi
Dept. of CSE, IIT Madras
pranjal@cse.iitm.ernet.in

Delip Rao
Dept. of CSE, IIT Madras
delip@cse.iitm.ernet.in

Balaraman Ravindran
Dept. of CSE, IIT Madras
ravi@cs.iitm.ernet.in

Abstract

In this paper we propose an approach to Part of Speech (PoS) tagging using a combination of a Hidden Markov Model and error-driven learning. For the NLPAI joint task, we also implement a chunker using Conditional Random Fields (CRFs). The results for the PoS tagging and chunking tasks are reported separately, along with the results of the joint task.

1 Introduction

Part of Speech tagging is an important preprocessing step in many natural language processing applications and the first step in the syntactic analysis of a language. We propose a combination of statistical and rule-based techniques for Part of Speech tagging of Indian languages and demonstrate its performance on a Hindi dataset. Shallow parsing, or chunking, is the task of segmenting text into chunks of syntactically related word groups. Apart from reducing the search space of deep parsers, shallow parsing is very useful in applications like Named Entity Recognition, Information Extraction, Summarization, Question Answering, and Automatic Thesaurus Generation. In this paper, the task of chunking is attempted using Conditional Random Fields, and the results for the combined and individual tasks are reported.

2 The Part Of Speech Tagger

Our tagging process consists of two stages: an initial stochastic tagging using the TnT tagger (Brants, 2000), a second-order Hidden Markov Model (HMM) based tagger (Rabiner and Juang, 1986), and a post-processing stage using error-driven learning, akin to (Brill, 1995). The main idea is as follows: use the TnT tagger to perform the initial tagging, then apply a set of transformation rules to correct the errors introduced by the TnT tagger. These transformation rules are induced during the training phase by iteratively extracting a set of candidate transformations from the transformation templates listed in Table 1 and selecting those transformations that maximize the error reduction on the entire training data. Thus, for each iteration, a new training set is generated by applying the transformation selected in that iteration.

Table 1: Transformation templates for a given word w_i, from (Brill, 1995)

Change tag a to b if:
  The previous word (w_{i-1}) is tagged z
  The next word (w_{i+1}) is tagged z
  The word w_{i-2} is tagged z
  The word w_{i+2} is tagged z
  The word w_{i-1} or w_{i-2} is tagged z
  The word w_{i+1} or w_{i+2} is tagged z
  The word w_{i-1} or w_{i-2} or w_{i-3} is tagged z
  The word w_{i+1} or w_{i+2} or w_{i+3} is tagged z
  The previous word (w_{i-1}) is tagged z and the next word (w_{i+1}) is tagged x
  The previous word (w_{i-1}) is tagged z and w_{i-2} is tagged x
  The next word (w_{i+1}) is tagged z and w_{i+2} is tagged x
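For concreteness, the following is a minimal sketch of the trigram Viterbi decoding that the first, stochastic stage performs. TnT is used here as an off-the-shelf tool, so this is not the authors' code: the dictionaries trans and emit are assumed to hold pre-estimated transition and emission probabilities, and a small probability floor stands in for TnT's interpolated smoothing and suffix-based unknown-word handling (Brants, 2000).

    import math

    def viterbi_trigram(words, tagset, trans, emit):
        """Trigram Viterbi decoding over tag-pair states.

        trans[(u, v, t)] is P(t | u, v) and emit[(t, w)] is P(w | t);
        both are assumed pre-estimated. "<s>" pads the left context.
        """
        FLOOR = 1e-12
        V = {("<s>", "<s>"): 0.0}   # log-score of the best path per state
        backptrs = []
        for w in words:
            nextV, bp = {}, {}
            for (u, v), score in V.items():
                for t in tagset:
                    p = trans.get((u, v, t), FLOOR) * emit.get((t, w), FLOOR)
                    cand = score + math.log(p)
                    if cand > nextV.get((v, t), float("-inf")):
                        nextV[(v, t)] = cand
                        bp[(v, t)] = u
            V = nextV
            backptrs.append(bp)
        # Follow back-pointers from the best final state to recover tags.
        u, v = max(V, key=V.get)
        seq = [u, v]
        for bp in reversed(backptrs[2:]):
            seq.insert(0, bp[(seq[0], seq[1])])
        return [t for t in seq if t != "<s>"]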
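The second, error-driven stage can be sketched in the same spirit. The loop below greedily selects, at each iteration, the single transformation with the largest net error reduction and re-tags the training data with it, as described above. It instantiates only the first template of Table 1 (change tag a to b if the previous word is tagged z); the rule encoding, the function names, and the flat tag sequences are our own illustrative choices, not the authors' implementation.

    from collections import Counter

    def apply_rule(rule, tags):
        """Change tag a to b wherever the previous word is tagged z.
        Contexts are read from the input, so one pass applies the rule
        simultaneously at every matching position."""
        a, b, z = rule
        out = list(tags)
        for i in range(1, len(tags)):
            if tags[i] == a and tags[i - 1] == z:
                out[i] = b
        return out

    def errors(tags, gold):
        return sum(t != g for t, g in zip(tags, gold))

    def learn_transformations(current, gold, max_rules=50):
        """Greedy Brill-style rule selection over flat tag sequences."""
        current = list(current)
        rules = []
        for _ in range(max_rules):
            # Instantiate a candidate rule at every remaining error site.
            candidates = Counter(
                (current[i], gold[i], current[i - 1])
                for i in range(1, len(current))
                if current[i] != gold[i]
            )
            base = errors(current, gold)
            best, best_gain = None, 0
            for rule in candidates:
                gain = base - errors(apply_rule(rule, current), gold)
                if gain > best_gain:
                    best, best_gain = rule, gain
            if best is None:        # no rule gives a net error reduction
                break
            current = apply_rule(best, current)  # re-tagged "training data"
            rules.append(best)
        return rules

    # Example: learns one rule that fixes VB to VBD after an NN tag.
    print(learn_transformations(["DT", "NN", "VB"], ["DT", "NN", "VBD"]))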
3 The Chunker

The chunker implementation is the linear-chain CRF provided by MALLET[1]. Our work on chunking closely follows that of (Sha and Pereira, 2003).

[1] http://mallet.cs.umass.edu/index.php/Main_Page
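MALLET's CRF is a Java implementation and its exact configuration is not reproduced here. As an illustrative stand-in, the sketch below trains a linear-chain CRF with L-BFGS via sklearn_crfsuite, a Python wrapper around CRFsuite. The feature window, the toy English sentence, and the 100-iteration limit (mirroring Section 4.2) are assumptions for illustration only; the paper's Hindi features are not reproduced.

    import sklearn_crfsuite

    def token_features(sent, i):
        # Word identity plus a small POS context window; this feature
        # set is an illustrative assumption, not the paper's features.
        word, pos = sent[i]
        return {
            "word": word,
            "pos": pos,
            "prev_pos": sent[i - 1][1] if i > 0 else "BOS",
            "next_pos": sent[i + 1][1] if i + 1 < len(sent) else "EOS",
        }

    def sent_features(sent):
        return [token_features(sent, i) for i in range(len(sent))]

    # Toy (word, POS) sequence with BIO chunk labels of the kind used in
    # the task (B-NP/I-NP noun chunks, B-VG verb groups).
    train_sents = [[("the", "DT"), ("big", "JJ"), ("dog", "NN"), ("barked", "VBD")]]
    train_labels = [["B-NP", "I-NP", "I-NP", "B-VG"]]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
    crf.fit([sent_features(s) for s in train_sents], train_labels)
    print(crf.predict([sent_features(s) for s in train_sents]))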


4 Experiments

4.1 PoS tagging

We split the training data randomly into two halves. The first half is used to train the TnT tagger and the second half is used for testing. Any error in this process results in the learning of appropriate transformation rules, as explained in Section 2. These transformation rules are then used to correct the results produced by the TnT tagger on the test set. We report the performance measures averaged over five random 50:50 splits of the training data. Table 4 and Table 5 show the results before and after the application of transformation rules, respectively.

Table 4: Part Of Speech tagging without error correction

Tag       Precision   Recall     F(β=1)
CC         95.54%     88.13%     91.69
INTF       41.67%    100.00%     58.82
JJ         42.71%     56.16%     48.52
JVB        50.88%     38.67%     43.94
NEG        97.83%     80.36%     88.24
NLOC       71.43%     71.43%     71.43
NN         69.46%     75.67%     72.43
NNC        43.03%     37.97%     40.34
NNP        58.96%     30.15%     39.90
NVB        32.79%     43.96%     37.56
PREP       95.78%     95.17%     95.47
PRP        91.09%     96.35%     93.65
QF         82.14%     70.77%     76.03
QFN        87.74%     91.18%     89.42
QW         89.47%    100.00%     94.44
RB         63.38%     72.58%     67.67
RBVB        0.00%      0.00%      0.00
RP         85.90%     93.71%     89.63
SYM        99.07%     98.34%     98.71
UH        100.00%    100.00%    100.00
VAUX       89.47%     86.79%     88.11
VFM        81.24%     80.68%     80.96
VJJ        20.00%     13.33%     16.00
VNN        75.61%     88.57%     81.58
VRB        79.41%     87.10%     83.08
VV          0.00%      0.00%      0.00
Overall    79.66%     79.66%     79.66

4.2 Chunk labeling

We train a linear-chain CRF on the training data with 77,620 features. The training process converged after 100 iterations of L-BFGS; for details, refer to (Lafferty et al., 2001). The model thus induced is then used to label the test set. Table 6 shows the results of chunking using the reference POS tags provided with the test set, while Table 7 shows the results of chunking using the POS tags generated by our tagger.

Table 6: Chunking with reference POS tags

Tag       Precision   Recall     F(β=1)
B-BLK      80.49%     76.39%     78.38
B-JJP     100.00%     72.73%     84.21
B-NP       87.25%     89.98%     88.60
B-RBP      89.83%     96.36%     92.98
B-VG       90.80%     89.43%     90.11
I-BLK      30.00%     13.64%     18.75
I-JJP     100.00%     38.46%     55.56
I-NP       90.20%     91.23%     90.71
I-RBP      72.73%     80.00%     76.19
I-VG       94.26%     92.06%     93.15
Overall    89.69%     89.69%     89.69

Table 7: Chunking with generated POS tags

Tag       Precision   Recall     F(β=1)
B-BLK      80.33%     68.06%     73.68
B-JJP      15.00%     13.64%     14.29
B-NP       76.66%     83.10%     79.75
B-RBP      60.94%     70.91%     65.55
B-VG       64.75%     63.77%     64.26
I-BLK      22.22%      9.09%     12.90
I-JJP      14.29%      7.69%     10.00
I-NP       84.34%     84.75%     84.54
I-RBP      59.09%     65.00%     61.90
I-VG       87.09%     81.06%     83.97
Overall    79.58%     79.58%     79.58
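The per-tag figures in Tables 4, 6 and 7 can be reproduced from gold and predicted tag sequences by token-level counting; the function below is a minimal sketch under that assumption, not the authors' evaluation script.

    from collections import defaultdict

    def per_tag_prf(gold, pred):
        """Token-level precision/recall/F per tag: a correct token is a
        true positive for its tag; a wrong one is a false positive for
        the predicted tag and a false negative for the gold tag."""
        tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
        for g, p in zip(gold, pred):
            if g == p:
                tp[g] += 1
            else:
                fp[p] += 1
                fn[g] += 1
        scores = {}
        for tag in set(gold) | set(pred):
            prec = tp[tag] / (tp[tag] + fp[tag]) if tp[tag] + fp[tag] else 0.0
            rec = tp[tag] / (tp[tag] + fn[tag]) if tp[tag] + fn[tag] else 0.0
            f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
            scores[tag] = (prec, rec, f)
        return scores

Note that with exactly one tag per token, micro-averaged overall precision and recall both reduce to token accuracy, which is why the Overall rows report identical precision, recall, and F values.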


5 Discussion of the Results

We obtain an overall F-measure of 79.66 using the TnT tagger. This low score could be attributed to the sparsity of the training data used. The use of transformations in post-processing improves the overall F-measure to 80.74. It should be noted that this result is obtained as an average over the five random 50:50 splits of the training data described in Section 4.1. To understand more clearly the improvements obtained by transformation-based learning, Table 8 shows the difference in F-measure for the tags where the difference is nonzero.

Table 8: Changes in POS tagging F-score before and after the application of transformation rules

POS tag   Before   After   Difference
JJ         48.52   51.23      2.71
JVB        43.94   45.31      1.37
NLOC       71.43   72.92      1.49
NN         72.43   76.35      3.92
NNC        40.34   18.99    -21.35
NNP        39.90   41.13      1.23
NVB        37.56   42.49      4.93
PREP       95.47   95.67      0.20
QFN        89.42   90.48      1.06
RP         89.63   89.93      0.30
SYM        98.71   98.80      0.09
VAUX       88.11   88.82      0.71
VFM        80.96   82.67      1.71
VJJ        16.00   10.53     -5.47

It is interesting to note that the transformation rules improve the F-measures of all tags except NNC and VJJ. A reduction in the F-measure could have been avoided by selecting a richer set of transformation templates than those listed in Table 1, as the transformation-based learning process is highly sensitive to the templates used.

In general, the chunking accuracy for the combined task is lower than for the chunking task alone with the reference tags. This is caused by the propagation of errors introduced during the tagging process to the chunking stage.

6 Conclusion

We have demonstrated the use of an off-the-shelf statistical tagger combined with an error-driven learning procedure for Part of Speech tagging of Hindi. The chunking task with Conditional Random Fields was also explored. We have reported our results for each of these tasks separately, along with the results for the joint task of POS tagging and chunking.

References

Thorsten Brants. 2000. TnT -- a statistical part-of-speech tagger. In Proceedings of the Sixth Applied Natural Language Processing Conference (ANLP-2000), pages 224-231.

Eric Brill. 1995. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics, 21(4):543-565.

John Lafferty, Andrew McCallum, and Fernando Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning, pages 282-289. Morgan Kaufmann, San Francisco, CA.

L. R. Rabiner and B. H. Juang. 1986. An introduction to hidden Markov models. IEEE ASSP Magazine, pages 4-16, January.

Fei Sha and Fernando C. N. Pereira. 2003. Shallow parsing with conditional random fields. In Proceedings of HLT-NAACL.
