Part Of Speech Tagging and Chunking with HMM and CRF - LTRC

2003). Let Y denote the chunk label sequence and X denote the corresponding observation sequence. A linear chain CRF (Lafferty et al., 2001) models the conditional probability P(Y|X) as

    P(Y|X) = \frac{1}{Z} \exp\Big\{ \sum_i \sum_k \lambda_k f_k(y_{i-1}, y_i, X) + \sum_i \sum_k \mu_k g_k(y_i, X) \Big\}

where Z is the input-dependent normalization factor. The feature set {f, g} used in this work is similar to that used in (Sha and Pereira, 2003). Table 2 describes the feature set; t_i and c_i are the PoS tag and the chunk tag respectively for the word w_i. (A schematic feature extractor for these templates is sketched at the end of this section.)

Table 2: Feature set used for the chunker

    w_{i-2} = w
    w_{i-1} = w
    w_i = w
    w_{i+1} = w
    w_{i+2} = w
    w_{i-1} = w', w_i = w
    w_{i+1} = w', w_i = w
    t_{i-2} = t
    t_{i-1} = t
    t_i = t
    t_{i+1} = t
    t_{i+2} = t
    t_{i-1} = t', t_i = t
    t_{i-2} = t', t_{i-1} = t
    t_i = t', t_{i+1} = t
    t_{i+1} = t', t_{i+2} = t
    t_{i-2} = t'', t_{i-1} = t', t_i = t
    t_{i-1} = t'', t_i = t', t_{i+1} = t
    t_i = t'', t_{i+1} = t', t_{i+2} = t
    c_{i-1} = c

Table 3: Part of Speech tagging results

    Model     Precision  Recall  F_{β=1}
    CRF       69.40      69.40   69.40
    TnT       78.94      78.94   78.94
    TnT+TBL   80.74      80.74   80.74

4 Experiments and Results

We use the limited category dataset supplied for the NLPAI shared task[2] for training our PoS tagger and chunker. This is referred to as the "training data" unless explicitly stated otherwise. We report the precision, recall and F1 scores for each of these tasks on the testing set. The reported results were derived from a modified CONLL evaluation script[3] for the same task.

Table 4: Part Of Speech tagging with error correction

    Tag      Precision  Recall    F_{β=1}
    CC        95.54%     88.13%    91.69
    INTF      41.67%    100.00%    58.82
    JJ        46.63%     56.85%    51.23
    JVB       54.72%     38.67%    45.31
    NEG       97.83%     80.36%    88.24
    NLOC      74.47%     71.43%    72.92
    NN        70.79%     82.85%    76.35
    NNC       18.89%     19.10%    18.99
    NNP       62.99%     30.53%    41.13
    NNPC      46.34%     19.39%    27.34
    NVB       40.20%     45.05%    42.49
    PREP      95.91%     95.43%    95.67
    PRP       91.09%     96.35%    93.65
    QF        82.14%     70.77%    76.03
    QFN       87.96%     93.14%    90.48
    QW        89.47%    100.00%    94.44
    RB        63.38%     72.58%    67.67
    RBVB       0.00%      0.00%     0.00
    RP        86.45%     93.71%    89.93
    SYM       99.07%     98.53%    98.80
    UH       100.00%    100.00%   100.00
    VAUX      90.62%     87.09%    88.82
    VFM       82.21%     83.14%    82.67
    VJJ       25.00%      6.67%    10.53
    VNN       75.61%     88.57%    81.58
    VRB       79.41%     87.10%    83.08
    VV         0.00%      0.00%     0.00
    Overall   80.74%     80.74%    80.74

4.1 Part of Speech Tagging

We tried the Part of Speech tagging task using Conditional Random Fields (CRFs) (Lafferty et al., 2001), with w_{i-1}, w_{i-1}w_i, w_{i+1}, and w_i w_{i+1} as features for the current word w_i (also sketched at the end of this section). In addition to CRFs we also tried Brants' TnT tagger, which uses Hidden Markov Models (HMMs). From Table 3 it is clear that TnT outperforms the CRF on the current PoS tagging task. This is probably due to the large number of output labels for the task (26 PoS tags) and the relatively small amount of training data.

Part of Speech tagging for the final system was performed as follows. First we split the training

[2] http://ltrc.iiit.net/nlpai_contest06/
[3] http://www.cnts.ua.ac.be/conll2000/chunking/conlleval.txt
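To make the templates in Table 2 concrete, below is a minimal sketch of a feature extractor that instantiates them. This is not the authors' code: the function name chunk_features, the "<PAD>" convention at sentence boundaries, and the string encoding of features are assumptions for illustration only.

    def chunk_features(words, tags, i):
        """Instantiate the Table 2 chunking templates at position i.

        words -- the observation sequence (w in the paper)
        tags  -- the PoS tag sequence (t in the paper)
        """
        # Hypothetical helpers: look up with padding at sentence boundaries.
        def w(j):
            return words[i + j] if 0 <= i + j < len(words) else "<PAD>"

        def t(j):
            return tags[i + j] if 0 <= i + j < len(tags) else "<PAD>"

        feats = []
        # Word unigrams w_{i-2} .. w_{i+2} and tag unigrams t_{i-2} .. t_{i+2}.
        for j in range(-2, 3):
            feats.append(f"w[{j}]={w(j)}")
            feats.append(f"t[{j}]={t(j)}")
        # Word bigrams: (w_{i-1}, w_i) and (w_i, w_{i+1}).
        feats.append(f"w[-1]|w[0]={w(-1)}|{w(0)}")
        feats.append(f"w[0]|w[+1]={w(0)}|{w(1)}")
        # Tag bigrams: the four adjacent pairs inside the +/-2 window.
        for j in (-2, -1, 0, 1):
            feats.append(f"t[{j}]|t[{j + 1}]={t(j)}|{t(j + 1)}")
        # Tag trigrams: the three adjacent triples inside the window.
        for j in (-2, -1, 0):
            feats.append(f"t[{j}]|t[{j + 1}]|t[{j + 2}]={t(j)}|{t(j + 1)}|{t(j + 2)}")
        # The remaining template, c_{i-1} = c, involves the previous chunk
        # label; in a linear-chain CRF it is realised by the edge features
        # f_k(y_{i-1}, y_i, X) rather than by the observation extractor.
        return feats

Per-token feature lists of this form are what crfsuite-style CRF toolkits consume, one list per position of the sequence.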

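The PoS-tagger features of Section 4.1 are a much smaller template set over words alone. A sketch under the same assumptions (illustrative name pos_features, "<PAD>" padding):

    def pos_features(words, i):
        """Instantiate the Section 4.1 PoS templates for the word w_i."""
        def w(j):
            return words[i + j] if 0 <= i + j < len(words) else "<PAD>"

        return [
            f"w[-1]={w(-1)}",              # w_{i-1}
            f"w[-1]|w[0]={w(-1)}|{w(0)}",  # w_{i-1} w_i
            f"w[+1]={w(1)}",               # w_{i+1}
            f"w[0]|w[+1]={w(0)}|{w(1)}",   # w_i w_{i+1}
        ]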