13.07.2015 Views

text classification using a concept of super- vised learning algorithm

text classification using a concept of super- vised learning algorithm

text classification using a concept of super- vised learning algorithm

SHOW MORE
SHOW LESS
  • No tags were found...

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

International Journal <strong>of</strong> Advanced Computer Technology (IJACT)ISSN:2319-7900tecture, organization,logic,simulatorcomputer, architecture,6time,processorcomputer, architecture,2hard-ware, processor,complexcomputer, architecture,2hard-ware, processor,s<strong>of</strong>twarecomputer, architecture,2simula-tor, hardware,corecomputer, architecture,10systemcomputer, architecture,3cache,memorycomputer, architecture,10system,hardwarecomputer, architecture,9applica-tion, systemcomputer, architecture,16proces-sor, systemcomputer, architecture,4struc-ture, processorcomputer, architecture,2key,multiprocessor,systemcomputer, architecture,2key,simulation, simulatorcomputer, architecture,8memo-ry, processorcomputer, architecture,2hard-ware, system,s<strong>of</strong>twarecomputer, archi- 2tecture, system,digital, complexcomputer, architecture,system,3digitalvision, computer,image,4graphicscomputer,4graphics, image,colorcomputer,2graphics, design,display, artcomputer,3graphics, image,artcomputer,4graphics, image,designcomputer,12graphics, designcomputer,29graphics, imagegraphics, interactive12computer,14graphics, displaycomputer,4graphics, lightcomputer,7graphics, picturecomputer,5graphics, linegraphics, point,2vectorcomputer,2graphics, video,linedata, dynamic 16data, database 3data, bit, array 2Data, <strong>algorithm</strong> 27data, <strong>algorithm</strong>,4querydata, error, <strong>algorithm</strong>4data, knowledge 7data, queue, 258INTERNATIONAL JOURNAL OF ADVANCED COMPUTER TECHNOLOGY | VOLUME 2, NUMBER 2


International Journal <strong>of</strong> Advanced Computer Technology (IJACT)ISSN:2319-7900chemistry 778 9259 9259 9259 9259environment,chemistry,chemical0.0277780.0092590.0092590.0092590.009259environment,chemistrychemistry,chemical,moleculechemistry,paperchemistry,oxidechemistry,environment,paperarea, techniquequantum,paperenvironment,paper,science,laboratorytechnology,chemical,chemistryarea, environment,chemistryorganic,chemical,chemistrygas, plasma,chemistrycomputer,architecture,hardwarecomputer,architecture,organization,logic, simulatorcomputer,architecture,time,0.0277780.0277780.2685190.0925930.0740740.0370370.0462960.0277780.0555560.0370370.0462960.0094340.0094340.0094340.0094340.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.2358490.0283020.0660380.0283020.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0094340.0094340.0094340.0094340.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0094340.0094340.0094340.0094340.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0094340.0094340.0094340.009434processorcomputer,architecture,hardware,processor,complexcomputer,architecture,hardware,processor,s<strong>of</strong>twarecomputer,architecture,simulator,hardware,corecomputer,architecture,systemcomputer,architecture,cache,memorycomputer,architecture,system,hardwarecomputer,architecture,application,systemcomputer,architecture,processor,systemcomputer,architecture,structure,processor0.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0283020.0283020.1037740.0377360.1037740.094340.1603770.047170.0283020.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0094340.0094340.009434computer, 0.009 0.02 0.00 0.00 0.00architecture,434 8302 9434 9434 9434key,multiprocessor,systemcomputer, 0.009 0.08 0.00 0.00 0.0060INTERNATIONAL JOURNAL OF ADVANCED COMPUTER TECHNOLOGY | VOLUME 2, NUMBER 2


International Journal <strong>of</strong> Advanced Computer Technology (IJACT)ISSN:2319-7900architecture,key,simulation,simulatorcomputer,architecture,memory,processorcomputer,architecture,hardware,system,s<strong>of</strong>twarecomputer,architecture,system,digital,complexcomputer,architecture,system,digitalvision,computer,image,graphicscomputer,graphics,image, colorcomputer,graphics,design, display,artcomputer,graphics,image, artcomputer,graphics,image, designcomputer,graphics,designcomputer,graphics,image434 4906 9434 9434 94340.0094340.0094340.0094340.0098040.0098040.0098040.0098040.0098040.0098040.0098040.0098040.0283020.0283020.0377360.0098040.0098040.0098040.0098040.0098040.0098040.0098040.0098040.0094340.0094340.0094340.049020.049020.0294120.0392160.049020.1274510.2941180.1274510.0094340.0094340.0094340.0098040.0098040.0098040.0098040.0098040.0098040.0098040.0098040.0094340.0094340.0094340.0098040.0098040.0098040.0098040.0098040.0098040.0098040.009804graphics, 0.009 0.00 0.14 0.00 0.00interactive 804 9804 7059 9804 9804computer, 0.009 0.00 0.04 0.00 0.00graphics,displaycomputer,graphics,lightcomputer,graphics,picturecomputer,graphics,linegraphics,point, vectorcomputer,graphics,video, linedata, dynamicdata, databasedata, bit,arrayData, <strong>algorithm</strong>data, <strong>algorithm</strong>,querydata, error,<strong>algorithm</strong>data, knowledgedata, queue,dynamicobject, data,programdata, method,treedata, tree,binarydata, program,effectivedata, analysis,complexity,<strong>algorithm</strong><strong>algorithm</strong>,data, efficientdata, tree,<strong>algorithm</strong>804 9804 902 9804 98040.0098040.0098040.0098040.0098040.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0098040.0098040.0098040.0098040.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0784310.0588240.0294120.0294120.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0098040.0098040.0098040.0098040.1634620.0384620.0288460.2692310.0480770.0480770.0769230.0288460.0384620.0288460.0769230.0288460.0384620.0480770.0673080.0288460.0098040.0098040.0098040.0098040.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.0096150.009615TEXT CLASSIFICATION USING A CONCEPT OF SUPERVISED LEARNING ALGORITHM61


International Journal <strong>of</strong> Advanced Computer Technology (IJACT)ISSN:2319-7900data, object,informationserver, internetinternet,service,userservice,semanticsemantic,domainsemantic,languageuser, s<strong>of</strong>twareXML, serviceserver, remoteinternet,domainsurvey,s<strong>of</strong>twareframework,semanticservice,semantic,frameworkservice,prototypes<strong>of</strong>tware,serviceuser, interne,s<strong>of</strong>twareservice,methodsemantic,internets<strong>of</strong>tware,servers<strong>of</strong>tware,server,Apachesource, internet0.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0092590.0555560.0462960.1388890.0648150.0740740.0740740.0370370.0462960.0370370.0277780.0833330.0462960.0370370.0648150.0648150.0462960.0462960.0740740.0277780.0277780.027778Comparative StudyIn this section we have tried to represent comparativepresentations in different point <strong>of</strong> views. We studiedthree thesis papers for the comparison purpose.Association Rule and Naïve Bayes ClassifierThe following results are found <strong>using</strong> the same data setsfor both Association Rule with Naive Bayes Classifierand proposed method. The result shows that proposedapproach work well <strong>using</strong> only 50% Training data.Table 5 Comparison <strong>of</strong> Association Rulewith Naïve Bayes Classifier and HybridMethod% <strong>of</strong>TrainingData% <strong>of</strong> AccuracyAssociationRule withNaive BayesClassifierHybridMethod10 40 3120 17 3630 42 5940 60 6750 32 81Here we derive the table about accuracy regardingdifferent test data sets and to see the Table 4.4 the lastpage <strong>of</strong> this paper.Figure 2 % <strong>of</strong> Training Data vs % <strong>of</strong> AccuracyLimitations and Future Work62INTERNATIONAL JOURNAL OF ADVANCED COMPUTER TECHNOLOGY | VOLUME 2, NUMBER 2


International Journal <strong>of</strong> Advanced Computer Technology (IJACT)ISSN:2319-7900As we have observed in this method better accuracyis found with increasing confidence up to 0.78. The<strong>algorithm</strong> will be more effective if the training set is setin such a way that it generates more sets. That is trainingset with all the different sections <strong>of</strong> total data cangive more dependable result. In this the computationaltime is too much <strong>using</strong> Apriori <strong>algorithm</strong>. Moreover, ifFrequent Pattern (FP) Growth tree could be formedtime would be shorter enough [7]. Though the experimentalresults are quite encouraging, it would be betterif we work with larger data sets with more classes.ConclusionHere, this paper presented an efficient and simpletechnique for <strong>text</strong> <strong>classification</strong>. The existing techniquesrequire more training data sets as well as the computationaltime <strong>of</strong> these techniques is also large. In contrastto the existing <strong>algorithm</strong>s, the proposed hybrid <strong>algorithm</strong>requires less training data and less computationaltime. Despite the randomly chosen training set weachieved 92% accuracy for 50% training data. Still theexperimental results are quite encouraging, it would bebetter if we work with larger data sets with more classes.AcknowledgmentsThe authors are thankful to IJACT Journal for thesupport to develop this document.References[1] S. M. Kamruzzaman, Farhana Haider, Ahmed RyadhHasan “Text Classification Using Data Minig”, ICTM2005.[2] Qasem A. Al-Radaideh, Eman Al Nagi “Using DataMining Techniques to Build a Classification Model forPredicting Employees Performance”, InternationalJournal <strong>of</strong> Advanced Computer Science and Applications(IJACSA),Vol. 3, No. 2, 2012[3] XindongWu, Vipin Kumar, J. Ross Quinlan, JoydeepGhosh, Qiang Yang, Hiroshi Motoda, Ge<strong>of</strong>frey J.McLachlan, Angus Ng, Bing Liu, Philip S. Yu, Zhi-HuaZhou, Michael Steinbach, David J. Hand, Dan Steinberg“Top 10 <strong>algorithm</strong>s in Data Mining”, Knowl Inf Syst(2008) 14:1–37 DOI 10.1007/s10115-007-0114-2[4] Thair Nu Phyu “Survey <strong>of</strong> Classification Techniquesin Data Mining”, Proceedings <strong>of</strong> the International MultiConference<strong>of</strong> Engineers and Computer Scientists 2009Vol I IMECS 2009, March 18 - 20, 2009, Hong Kong[5] Erhard Rahm, Hong Hai Do, “Data Cleaning: Problemsand Current Approaches”, IEEE Data Eng.Bull. 23(4):3--13 (2000), University <strong>of</strong> Leipzig, Germanyhttp://dbs.uni-leipzig.de[6] Chowdhury M<strong>of</strong>izur Rahman and Ferdous AhmedSohel and Parvez Naushad and S M Kamruzzaman, “Text Classification <strong>using</strong> the Concept <strong>of</strong> AssociationRule <strong>of</strong> Data Mining”, CoRR (2010)http://arxiv.org/ftp/arxiv/papers/1009/1009.4582.pdf[7]http://www.iasri.res.in/ebook/win_school_aa/notes/Data_Preprocessing.pdf (Last visited 24 Jan 2013)[8] Jiawei Han and Micheline Kamber, “Data Mining:Concepts and Techniques”, Second Edition Elsevier Inc.[9] Agarwal R., Mannila H., Srikant R., Toivonan H.,Verkamo, "A Fast Discovery <strong>of</strong> Association Rules,"Advances in Knowledge Discovery and Data Mining,1996.[10] Bing Liu, Wynne Hsu, Yiming Ma, "IntegratingClassification and Association Rule Mining," In Proceedings<strong>of</strong> the Fourth International Conference onKnowledge Discovery and Data Mining (KDD-98, PlenaryPresentation), New York, USA, 1998.[11] Canasai Kruengkrai , Chuleerat Jaruskulchai, "AParallel Learning Algorithm for Text Classification," TheEighth ACM SIGKDD International Conference onKnowledge Discovery and Data Mining (KDD-2002),Canada, July 2002.[12] Kamruzzaman S. M, Farhana Haider, "A HybridLearning Algoritm for Text Classification", Accepted forthe Publication <strong>of</strong> the Proceedings <strong>of</strong> the Third InternationalConference on Electrical and Computer Engineering,Going to be held at Dhaka, on December 28-30,2004.[13] Lewis, D., and Ringuette, M., "A Comparison <strong>of</strong>Two Learning Algorithms for Text Categorization," InThird Annual Symposium on Document Analysis andInformation Retrieval, pp. 81-93, 1994.[14] McCallum, A., and Nigam, K., "A Comparison <strong>of</strong>63TEXT CLASSIFICATION USING A CONCEPT OF SUPERVISED LEARNING ALGORITHM


International Journal <strong>of</strong> Advanced Computer Technology (IJACT)ISSN:2319-7900Events Models for Naïve Bayes Text Classification,"Papers from the AAAI Workshop, pp. 41-48, 1998.BiographiesKEYUR J PATEL received the B.E. degree in InformationTechnology from the University <strong>of</strong> HemchndracharyaNorth Gujarat, Patan, Gujarat, in 2011, theM.Tech. persuing in Information Technology from theUniversity <strong>of</strong> Ganpat, Kherva, Mehsana, Gujarat, Hisresearch areas include data mining and its complexityissues.KETAN J SARVAKAR received B.E. degree andM.Tech degree. He currently works in Ganpat University,Kherva,Mehsana, Gujarat as Assistant Pr<strong>of</strong>essor. Hisresearch areas include data mining and wireless sensor.TEJAS J PATEL received the B.E. degree in InformationTechnology from the University <strong>of</strong> HemchndracharyaNorth Gujarat, Patan, Gujarat, in 2011, theM.Tech. persuing in Information Technology from theUniversity <strong>of</strong> Ganpat, Kherva, Mehsana, Gujarat, Hisresearch areas include data mining, image processingand its complexity issues.64INTERNATIONAL JOURNAL OF ADVANCED COMPUTER TECHNOLOGY | VOLUME 2, NUMBER 2

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!