Part Of Speech Tagging and Chunking with HMM and CRF

Pranjal Awasthi
Dept. of CSE, IIT Madras
pranjal@cse.iitm.ernet.in

Delip Rao
Dept. of CSE, IIT Madras
delip@cse.iitm.ernet.in

Balaraman Ravindran
Dept. of CSE, IIT Madras
ravi@cs.iitm.ernet.in

Abstract

In this paper we propose an approach to Part of Speech (PoS) tagging using a combination of a Hidden Markov Model and error-driven learning. For the NLPAI joint task, we also implement a chunker using Conditional Random Fields (CRFs). The results for the PoS tagging and chunking tasks are reported separately, along with the results of the joint task.

1 Introduction

Part of Speech tagging is an important preprocessing step in many natural language processing applications and the first step in the syntactic analysis of a language. We propose a combination of statistical and rule-based techniques for Part of Speech tagging of Indian languages and demonstrate its performance on a Hindi dataset. Shallow parsing, or chunking, is the task of segmenting text into chunks of syntactically related word groups. Apart from reducing the search space of deep parsers, shallow parsing is very useful in applications like Named Entity Recognition, Information Extraction, Summarization, Question Answering, and Automatic Thesaurus Generation. In this paper, the task of chunking is attempted using Conditional Random Fields, and the results for the combined and individual tasks are reported.

2 The Part Of Speech Tagger

Our tagging process consists of two stages: an initial stochastic tagging using the TnT tagger (Brants, 2000), a second-order Hidden Markov Model (HMM) based tagger (Rabiner and Juang, 1986), and a post-processing stage using error-driven learning, akin to (Brill, 1995). The main idea is as follows: use the TnT tagger to perform the initial tagging, then apply a set of transformation rules to correct the errors introduced by the TnT tagger. These transformation rules are induced during the training phase by iteratively extracting a set of candidate transformations from the transformation templates listed in Table 1 and selecting those transformations that maximize the error reduction on the entire training data. Thus, for each iteration, a new training set is generated by applying the transformation selected in that iteration.

Table 1: Transformation templates for a given word w_i, from (Brill, 1995)

Change tag a to b if:
  The previous word (w_{i-1}) is tagged z
  The next word (w_{i+1}) is tagged z
  The word w_{i-2} is tagged z
  The word w_{i+2} is tagged z
  The word w_{i-1} or w_{i-2} is tagged z
  The word w_{i+1} or w_{i+2} is tagged z
  The word w_{i-1} or w_{i-2} or w_{i-3} is tagged z
  The word w_{i+1} or w_{i+2} or w_{i+3} is tagged z
  The previous word (w_{i-1}) is tagged z and the next word (w_{i+1}) is tagged x
  The previous word (w_{i-1}) is tagged z and w_{i-2} is tagged x
  The next word (w_{i+1}) is tagged z and w_{i+2} is tagged x
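For concreteness, the following is a minimal sketch of the trigram Viterbi decoding that the first, stochastic stage performs. TnT is used here as an off-the-shelf tool, so this is not the authors' code: the dictionaries trans and emit are assumed to hold pre-estimated transition and emission probabilities, and a small probability floor stands in for TnT's interpolated smoothing and suffix-based unknown-word handling (Brants, 2000).

    import math

    def viterbi_trigram(words, tagset, trans, emit):
        """Trigram Viterbi decoding over tag-pair states.

        trans[(u, v, t)] is P(t | u, v) and emit[(t, w)] is P(w | t);
        both are assumed pre-estimated. "<s>" pads the left context.
        """
        FLOOR = 1e-12
        V = {("<s>", "<s>"): 0.0}   # log-score of the best path per state
        backptrs = []
        for w in words:
            nextV, bp = {}, {}
            for (u, v), score in V.items():
                for t in tagset:
                    p = trans.get((u, v, t), FLOOR) * emit.get((t, w), FLOOR)
                    cand = score + math.log(p)
                    if cand > nextV.get((v, t), float("-inf")):
                        nextV[(v, t)] = cand
                        bp[(v, t)] = u
            V = nextV
            backptrs.append(bp)
        # Follow back-pointers from the best final state to recover tags.
        u, v = max(V, key=V.get)
        seq = [u, v]
        for bp in reversed(backptrs[2:]):
            seq.insert(0, bp[(seq[0], seq[1])])
        return [t for t in seq if t != "<s>"]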
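The second, error-driven stage can be sketched in the same spirit. The loop below greedily selects, at each iteration, the single transformation with the largest net error reduction and re-tags the training data with it, as described above. It instantiates only the first template of Table 1 (change tag a to b if the previous word is tagged z); the rule encoding, the function names, and the flat tag sequences are our own illustrative choices, not the authors' implementation.

    from collections import Counter

    def apply_rule(rule, tags):
        """Change tag a to b wherever the previous word is tagged z.
        Contexts are read from the input, so one pass applies the rule
        simultaneously at every matching position."""
        a, b, z = rule
        out = list(tags)
        for i in range(1, len(tags)):
            if tags[i] == a and tags[i - 1] == z:
                out[i] = b
        return out

    def errors(tags, gold):
        return sum(t != g for t, g in zip(tags, gold))

    def learn_transformations(current, gold, max_rules=50):
        """Greedy Brill-style rule selection over flat tag sequences."""
        current = list(current)
        rules = []
        for _ in range(max_rules):
            # Instantiate a candidate rule at every remaining error site.
            candidates = Counter(
                (current[i], gold[i], current[i - 1])
                for i in range(1, len(current))
                if current[i] != gold[i]
            )
            base = errors(current, gold)
            best, best_gain = None, 0
            for rule in candidates:
                gain = base - errors(apply_rule(rule, current), gold)
                if gain > best_gain:
                    best, best_gain = rule, gain
            if best is None:        # no rule gives a net error reduction
                break
            current = apply_rule(best, current)  # re-tagged "training data"
            rules.append(best)
        return rules

    # Example: learns one rule that fixes VB to VBD after an NN tag.
    print(learn_transformations(["DT", "NN", "VB"], ["DT", "NN", "VBD"]))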
3 The Chunker

The chunker implementation is the linear-chain CRF provided by MALLET[1]. Our work on chunking closely follows that of (Sha and Pereira, 2003).

[1] http://mallet.cs.umass.edu/index.php/Main_Page
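MALLET's CRF is a Java implementation and its exact configuration is not reproduced here. As an illustrative stand-in, the sketch below trains a linear-chain CRF with L-BFGS via sklearn_crfsuite, a Python wrapper around CRFsuite. The feature window, the toy English sentence, and the 100-iteration limit (mirroring Section 4.2) are assumptions for illustration only; the paper's Hindi features are not reproduced.

    import sklearn_crfsuite

    def token_features(sent, i):
        # Word identity plus a small POS context window; this feature
        # set is an illustrative assumption, not the paper's features.
        word, pos = sent[i]
        return {
            "word": word,
            "pos": pos,
            "prev_pos": sent[i - 1][1] if i > 0 else "BOS",
            "next_pos": sent[i + 1][1] if i + 1 < len(sent) else "EOS",
        }

    def sent_features(sent):
        return [token_features(sent, i) for i in range(len(sent))]

    # Toy (word, POS) sequence with BIO chunk labels of the kind used in
    # the task (B-NP/I-NP noun chunks, B-VG verb groups).
    train_sents = [[("the", "DT"), ("big", "JJ"), ("dog", "NN"), ("barked", "VBD")]]
    train_labels = [["B-NP", "I-NP", "I-NP", "B-VG"]]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
    crf.fit([sent_features(s) for s in train_sents], train_labels)
    print(crf.predict([sent_features(s) for s in train_sents]))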


4 Experiments

4.1 PoS tagging

We split the training data randomly into two halves. The first half is used to train the TnT tagger and the second half is used for testing. Any error in this process results in the learning of appropriate transformation rules, as explained in Section 2. These transformation rules are then used to correct the results produced by the TnT tagger on the test set. We report the performance measures averaged over five random 50:50 splits of the training data. Table 4 and Table 5 show the results before and after the application of transformation rules, respectively.

Table 4: Part Of Speech tagging without error correction

Tag       Precision   Recall     F(β=1)
CC         95.54%     88.13%     91.69
INTF       41.67%    100.00%     58.82
JJ         42.71%     56.16%     48.52
JVB        50.88%     38.67%     43.94
NEG        97.83%     80.36%     88.24
NLOC       71.43%     71.43%     71.43
NN         69.46%     75.67%     72.43
NNC        43.03%     37.97%     40.34
NNP        58.96%     30.15%     39.90
NVB        32.79%     43.96%     37.56
PREP       95.78%     95.17%     95.47
PRP        91.09%     96.35%     93.65
QF         82.14%     70.77%     76.03
QFN        87.74%     91.18%     89.42
QW         89.47%    100.00%     94.44
RB         63.38%     72.58%     67.67
RBVB        0.00%      0.00%      0.00
RP         85.90%     93.71%     89.63
SYM        99.07%     98.34%     98.71
UH        100.00%    100.00%    100.00
VAUX       89.47%     86.79%     88.11
VFM        81.24%     80.68%     80.96
VJJ        20.00%     13.33%     16.00
VNN        75.61%     88.57%     81.58
VRB        79.41%     87.10%     83.08
VV          0.00%      0.00%      0.00
Overall    79.66%     79.66%     79.66

4.2 Chunk labeling

We train a linear-chain CRF on the training data with 77,620 features. The training process converged after 100 iterations of L-BFGS; for details, refer to (Lafferty et al., 2001). The model thus induced is then used to label the test set. Table 6 shows the results of chunking using the reference POS tags provided with the test set, while Table 7 shows the results of chunking using the POS tags generated by our tagger.

Table 6: Chunking with reference POS tags

Tag       Precision   Recall     F(β=1)
B-BLK      80.49%     76.39%     78.38
B-JJP     100.00%     72.73%     84.21
B-NP       87.25%     89.98%     88.60
B-RBP      89.83%     96.36%     92.98
B-VG       90.80%     89.43%     90.11
I-BLK      30.00%     13.64%     18.75
I-JJP     100.00%     38.46%     55.56
I-NP       90.20%     91.23%     90.71
I-RBP      72.73%     80.00%     76.19
I-VG       94.26%     92.06%     93.15
Overall    89.69%     89.69%     89.69

Table 7: Chunking with generated POS tags

Tag       Precision   Recall     F(β=1)
B-BLK      80.33%     68.06%     73.68
B-JJP      15.00%     13.64%     14.29
B-NP       76.66%     83.10%     79.75
B-RBP      60.94%     70.91%     65.55
B-VG       64.75%     63.77%     64.26
I-BLK      22.22%      9.09%     12.90
I-JJP      14.29%      7.69%     10.00
I-NP       84.34%     84.75%     84.54
I-RBP      59.09%     65.00%     61.90
I-VG       87.09%     81.06%     83.97
Overall    79.58%     79.58%     79.58
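The per-tag figures in Tables 4, 6 and 7 can be reproduced from gold and predicted tag sequences by token-level counting; the function below is a minimal sketch under that assumption, not the authors' evaluation script.

    from collections import defaultdict

    def per_tag_prf(gold, pred):
        """Token-level precision/recall/F per tag: a correct token is a
        true positive for its tag; a wrong one is a false positive for
        the predicted tag and a false negative for the gold tag."""
        tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
        for g, p in zip(gold, pred):
            if g == p:
                tp[g] += 1
            else:
                fp[p] += 1
                fn[g] += 1
        scores = {}
        for tag in set(gold) | set(pred):
            prec = tp[tag] / (tp[tag] + fp[tag]) if tp[tag] + fp[tag] else 0.0
            rec = tp[tag] / (tp[tag] + fn[tag]) if tp[tag] + fn[tag] else 0.0
            f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
            scores[tag] = (prec, rec, f)
        return scores

Note that with exactly one tag per token, micro-averaged overall precision and recall both reduce to token accuracy, which is why the Overall rows report identical precision, recall, and F values.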


5 Discussion of the Results

We obtain an overall F-measure of 79.66 using the TnT tagger. This low score could be attributed to the sparsity of the training data used. The use of transformations in post-processing improves the overall F-measure to 80.74. It should be noted that this result is obtained as an average over the five random 50:50 splits of the training data described in Section 4.1. To understand more clearly the improvements obtained by transformation-based learning, Table 8 shows the difference in F-measure for the tags where the difference is nonzero.

Table 8: Changes in POS tagging F-score before and after the application of transformation rules

POS tag   Before   After   Difference
JJ         48.52   51.23      2.71
JVB        43.94   45.31      1.37
NLOC       71.43   72.92      1.49
NN         72.43   76.35      3.92
NNC        40.34   18.99    -21.35
NNP        39.90   41.13      1.23
NVB        37.56   42.49      4.93
PREP       95.47   95.67      0.20
QFN        89.42   90.48      1.06
RP         89.63   89.93      0.30
SYM        98.71   98.80      0.09
VAUX       88.11   88.82      0.71
VFM        80.96   82.67      1.71
VJJ        16.00   10.53     -5.47

It is interesting to note that the transformation rules improve the F-measures of all tags except NNC and VJJ. A reduction in the F-measure could have been avoided by selecting a richer set of transformation templates than those listed in Table 1, as the transformation-based learning process is highly sensitive to the templates used.

In general, the chunking accuracy for the combined task is lower than for the chunking task alone with the reference tags. This is caused by the propagation of errors introduced during the tagging process to the chunking stage.

6 Conclusion

We have demonstrated the use of an off-the-shelf statistical tagger combined with an error-driven learning procedure for Part of Speech tagging of Hindi. The chunking task with Conditional Random Fields was also explored. We have reported our results for each of these tasks separately, along with the results for the joint task of POS tagging and chunking.

References

Thorsten Brants. 2000. TnT -- a statistical part-of-speech tagger. In Proceedings of the Sixth Applied Natural Language Processing Conference (ANLP-2000), pages 224-231.

Eric Brill. 1995. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics, 21(4):543-565.

John Lafferty, Andrew McCallum, and Fernando Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning, pages 282-289. Morgan Kaufmann, San Francisco, CA.

L. R. Rabiner and B. H. Juang. 1986. An introduction to hidden Markov models. IEEE ASSP Magazine, pages 4-16, January.

Fei Sha and Fernando C. N. Pereira. 2003. Shallow parsing with conditional random fields. In Proceedings of HLT-NAACL.
