A naÃ¯ve Bayes Classifier for Web Document Summarie...

More documents

Recommendations

Info

482 M. S. Pera & Y.-K. NgFig. 5. Accuracy ratios for CorSum and CorSum-SF generated summaries.classification process. Furthermore, on the average MNB performs classification atleast as fast as using the summaries generated by CorSum(-SF) than the LSA orTop-N summaries as shown in Figure 4.In Figure 5, we show the classification accuracy using MNB on CorSum(CorSum-SF, respectively) generated summaries for each natural class in20NG, and the corresponding labeled classes are (1) sci.electronics, (2)comp.sys.mac.hardware, (3) soc.religion.christian, (4) comp.windows.x, (5)comp.sys.ibm.pc.hardware,(6) comp.graphics, (7) misc.forsale,(8) rec.motorcycles,(9) comp.os.ms-windows.misc, (10) rec.sport.hockey, (11) talk.politics.misc, (12)alt.atheism, (13) sci.crypt, (14) talk.politics.guns, (15) rec.sport.baseball, (16)sci.space, (17) talk.politics.mideast, (18) sci.med, (19) rec.autos, and (20)talk.religion.misc. Regardless of the natural class considered, the accuracy achievedby MNB on CorSum-SF generated summaries outperforms the accuracy achievedon CorSum generated summaries.Figure 6 shows the number of false positives, which is the number of documentsassignedtoaclasswhentheyshouldnotbe,andfalse negatives,whichisthenumberof documents that were not assigned to a class to which they belong. We observethat except for classes 4, 5, 6, and 8, the average number of false positives foreach of the remaining classes in 20NG is 30, which constitutes approximately 12%of the classified news articles. The same applies to the number of false negatives.Except for classes 1, 11, 14, 16, and 18, the averagenumber of mislabeled articles is33, which constitutes approximately 13% of the articles used for the classificationpurpose. The overall average number of false positives and false negatives is 41, anaverage of 23%, per class.Figure 7 shows the number of false positives, and number of false negatives,for classifying the summarized documents generated by CorSum-SF in the 20NGdataset into the corresponding pre-defined classes. We observe that compared with
Classifying Summaries of Web Documents 483Fig. 6. False positives and false negatives of classifying CorSum generated summaries of eachclass in the 20NG dataset.Fig. 7. False positives and false negatives of classifying CorSum-SF generated summaries ofeach class in the 20NG dataset.CorSum, even though CorSum-SF yields the same average number of false positives,its averagenumber of false negatives is reduced, i.e., from 41 to 31, an averageof 25%.4.5. ImplementationCorSum(-SF) was implemented on an Intel Dual Core workstation with dual 2.66GHzprocessors,3GBRAMsize,andaharddiskof300GBrunningunderWindowsXP. The empirical studies conducted to assess the performance of CorSum(-SF),
Page 1 and 2: International Journal on Artificial
Page 3 and 4: Classifying Summaries of Web Docume
Page 17: Classifying Summaries of Web Docume

A naÃ¯ve Bayes Classifier for Web Document Summarie...

Create successful ePaper yourself

Delete template?

Save as template?