International Journal of Computer Science and Management Research Vol 2 Issue 2 February 2013
ISSN 2278-733X
FUZZY CLUSTERING IN WEB TEXT MINING AND ITS APPLICATION IN IEEE ABSTRACT CLASSIFICATION

Rahul R. Papalkar #1, Gajendrasingh Chandel *2
# Department of Information Technology, SSICT, Sehore, (M.P.) India
* Department of Information Technology, SSICT, Sehore, (M.P.) India

Abstract
Text Mining, a branch of computer science [1], is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. Text Mining is seen as an increasingly important tool by modern business to transform data into business intelligence, giving an informational advantage. Web text retrieval refers to text retrieval techniques applied to Web resources and literature available on the Web. The volume of published Web research, and therefore the underlying Web knowledge base, is expanding at an increasing rate. Web text retrieval is a way to aid researchers in coping with information overload. By discovering predictive relationships between different pieces of extracted data, data-mining algorithms can be used to improve the accuracy of information extraction. However, textual variation due to typos, abbreviations, and other sources can prevent the productive discovery and utilization of hard-matching rules. Recent methods of soft clustering can exploit predictive relationships in textual data. This paper presents a technique for using a soft-clustering Text Mining algorithm to increase the accuracy of Web text extraction. Experimental results demonstrate that this approach improves text extraction more effectively than hard keyword-matching rules.

Keywords—Fuzzy clustering, Text Mining, Web mining, document clustering
I. INTRODUCTION
Searching for the documents most similar to a given one is crucial in text mining because it is the basic process of many techniques such as classification and information retrieval. Two of the major issues that text mining faces are the large number of documents, millions in modest cases, and the very high dimensionality of the feature space. Text documents are usually represented as vectors, where each dimension corresponds to a term and the value reflects its importance in the document. There are many approaches to finding the exact vicinity of an object. However, they suffer from the curse of dimensionality; that is, their performance drastically decreases as the number of dimensions grows. This problem prevents their application in text mining. To avoid the curse of dimensionality, a variety of methods based on inexact searching have been proposed.
In [1, 2] a probabilistic technique with good performance was presented. This solution uses some elements of the training set as pivots or permutants. Basically, the permutants are used to predict proximity between elements and to reduce the number of real distance evaluations at query time. Although this method performs well when searching for proximities over documents, it introduces an overhead at search time, due to the need to perform a sequential search over the permutants [1] or to use an auxiliary structure to avoid it [2]. This overhead increases as the space dimension or the size of the dataset grows. In this paper we introduce improvements to our access method, presented in [3], for indexing collections of objects representing a very high-dimensional space. For indexing, this method uses a combination of a graph structure and pivots (used as entry points), and a very fast search algorithm that uses distance- or similarity-based measures in order to obtain the k-nearest neighbors (knn) of novel query objects. In this paper, we introduce a new fast way to generate the connected graph and a pruning rule to improve searches. Although the time required to generate the index structure grows with the size of the collection of objects used, this process is carried out only once (offline) and does not affect the query process.
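As a concrete baseline for the discussion above, the sketch below shows TF-IDF document vectors and brute-force k-nearest-neighbour search in Python. This is illustrative code, not the indexing method of [1-3]: the function names and the tiny corpus are invented for the example, and the point of the pivot/graph index described here is precisely to avoid the linear scan that `knn` performs.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    # One sparse vector (dict term -> weight) per document:
    # term frequency times inverse document frequency.
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({t: (tf[t] / len(toks)) * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn(query, vecs, k):
    # Brute-force knn: rank every document by similarity to the query.
    # This linear scan is the cost that index structures try to avoid.
    order = sorted(range(len(vecs)), key=lambda i: cosine(query, vecs[i]),
                   reverse=True)
    return order[:k]
```
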
Rahul R.Papalkar et.al. www.ijcsmr.org
II. LITERATURE SURVEY
Current Text Mining tools operate on structured data, the kind of data that resides in large relational databases, whereas data in multimedia databases are semi-structured or unstructured. Compared with text mining, multimedia mining reaches much higher complexity, resulting from: a) the huge volume of data; b) the variability and heterogeneity of the multimedia data (e.g. diversity of sensors, time or conditions of acquisition, etc.); and c) the subjective meaning of multimedia content [6].
Unstructured data

Unstructured data is simply a bit stream. Examples include pixel-level representations for images, video, and audio, and character-level representations for text. Substantial processing and interpretation are required to extract semantics from unstructured data [7]. This kind of data is not broken down into smaller logical structures and is not typically interpreted by the database management system.
Architectures for Multimedia Text Mining

Various architectures are being examined to design and develop a multimedia Text Mining system. The first architecture works as follows: extract data or metadata from the unstructured database, store the extracted data in a structured database, and apply Text Mining tools on the structured database [8]. This is illustrated in Figure 2.1.
Figure 2.1 Converting unstructured data to structured data for mining
Figure 2.2 presents the multimedia mining process [18]. Data collection is the starting point of a learning system, as the quality of the raw data determines the overall achievable performance. Then, the goal of data pre-processing is to discover important features from the raw data. Data pre-processing includes data cleaning, normalization, transformation, feature selection, etc. Learning can be straightforward if informative features can be identified at the pre-processing stage. The detailed procedure depends highly on the nature of the raw data and the problem's domain. In some cases, prior knowledge can be extremely valuable.
Figure 2.2 Multimedia Mining Process
For many systems, this stage is still primarily conducted by domain experts. The product of data pre-processing is the training set. Given a training set, a learning model has to be chosen to learn from it. It must be mentioned that the steps of multimedia mining are often iterative; the analyst can also jump back and forth between major tasks in order to improve the results [6].
Figure 2.3 Architecture of applying multimedia mining to different multimedia types
Figure 2.3 presents an architecture for applying multimedia mining to different multimedia types [5]. Here the main stages of the Text Mining process are (1) domain understanding; (2) data selection; (3) cleaning and preprocessing; (4) discovering patterns; (5) interpretation; and (6) reporting and using discovered knowledge. The domain understanding stage requires learning how the results of data mining will be used, so as to gather all relevant prior knowledge before mining.
III. METHODOLOGY
This process is done in three steps: information retrieval, information extraction, and text mining. A primary reason for using Text Mining on web text is to assist in the analysis of collections of the available web text. Web data is vulnerable to collinearity because of unknown interrelations. The analysis in this paper is augmented by an experiment-based approach. Before Text Mining algorithms can be used, a target data set must be assembled. As Text Mining can only uncover patterns already present in the data, the target dataset must be large enough to contain these patterns. Pre-processing is essential to analyze multivariate datasets before clustering or text mining. The target set is then cleaned: cleaning removes observations with noise and missing data. The web data available to us is first put into a data warehouse. Before putting the data in the data warehouse, a keyword extraction algorithm is used to find the keywords in the full text. This keyword extraction uses a partial parser to extract entity names; the parser uses linguistic rules and statistical disambiguation to achieve greater precision. The data is then organized into clusters. Clustering is the task of discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data. The clusters are created, based on the keywords extracted from our web text, using the fuzzy c-means algorithm. Fuzzy C-Means (FCM) is one of the most widely used soft clustering algorithms; it is a variant of the standard k-means algorithm that uses a soft membership function.
FCM is based on minimization of the objective function $F_m(U, C)$:

$$F_m(U, C) = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{m} \, d^2(x_i, c_j)$$

FCM computes the memberships $u_{ij}$ and the cluster centers $c_j$ by:

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{d(x_i, c_j)}{d(x_i, c_k)} \right)^{2/(m-1)}}, \qquad c_j = \frac{\sum_{i=1}^{n} u_{ij}^{m} \, x_i}{\sum_{i=1}^{n} u_{ij}^{m}}$$

where $m$, the fuzzification factor, is a weighting exponent on each fuzzy membership and is any real number greater than 1; $u_{ij}$ is the degree of membership of $x_i$ in cluster $j$; $x_i$ is the $i$-th d-dimensional measured datum; $c_j$ is the center of cluster $j$; and $d^2(x_i, c_j)$ is a distance measure between object $x_i$ and cluster center $c_j$, with $d(x_i, c_j) = \|x_i - c_j\|$ for any norm $\|\cdot\|$ expressing the similarity between the measured data and the center.
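The two update equations above can be sketched as a minimal fuzzy c-means implementation that alternates the membership and center updates until the membership matrix stabilizes. This is an illustrative NumPy version under stated assumptions, not the authors' code: the function name, the random initialization, and the stopping tolerance are choices made for the example.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal FCM: X is an (n, d) data matrix, c the number of clusters,
    m > 1 the fuzzification factor. Returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per point
    for _ in range(max_iter):
        Um = U ** m
        # Center update: c_j = sum_i u_ij^m x_i / sum_i u_ij^m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distances d(x_i, c_j) under the Euclidean norm
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                  # guard against divide-by-zero
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:      # stop when memberships settle
            U = U_new
            break
        U = U_new
    return centers, U
```

On two well-separated groups of points, the highest membership of each point recovers its group, while points between the groups receive intermediate memberships, which is the "soft" behavior that distinguishes FCM from k-means.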
Proposed Algorithm
1. Read the input string.
2. Read the input search path.
3. Cluster the input string as per fuzzy c-means clustering.
4. Read files from the selected path with the specified extension.
5. Convert the selected file into text-readable format.
6. Search the input-string cluster in the file and store the result in the output cluster directory.
7. Repeat steps 5 and 6 until all files are scanned; otherwise go to step 8.
8. Stop.
The proposed algorithm is responsible for extracting the keywords present in each full-text web article and storing these keywords in a relation. Then the actual work of the algorithm begins: it starts clustering the keywords. The algorithm initially picks some of the extracted keywords and groups the full-text articles based on them, so that each cluster contains only those articles which contain that keyword. It then uses fuzzy c-means clustering to combine clusters on some similarity measure: two clusters are combined if their similarity measure is greater than or equal to a specified threshold value. The proposed algorithm repeats this process until no more changes are made to the clusters. Finally, it stores all the clusters in a directory.
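The paper does not specify which similarity measure is used when combining clusters. Assuming, for illustration, Jaccard similarity over the sets of article ids in each cluster, the threshold-based merging loop with its "repeat until no more changes" stopping rule could be sketched as:

```python
def jaccard(a, b):
    # Jaccard similarity between two sets of article ids.
    return len(a & b) / len(a | b) if a | b else 0.0

def merge_clusters(clusters, threshold=0.5):
    """Repeatedly merge the first pair of clusters whose similarity meets
    the threshold; stop when no pair qualifies (no more changes)."""
    clusters = [set(c) for c in clusters]
    changed = True
    while changed:
        changed = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if jaccard(clusters[i], clusters[j]) >= threshold:
                    clusters[i] |= clusters.pop(j)   # absorb cluster j into i
                    changed = True
                    break
            if changed:
                break
    return clusters
```
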
Our motive here is to extract all the full-text articles which may be relevant to the user providing the search string; to this end, out of all the clusters, the cluster with the largest number of articles is our target.
IV. EXPERIMENTAL RESULTS
The experiments were performed on a test application developed in ASP.NET 3.0. The database contains all the article entries, populated manually from IEEE abstracts. The search was performed using the traditional keyword-based search algorithm and compared with the proposed algorithm. A set of search results is shown in Table 4.1. Given the same data for text extraction, the proposed algorithm retrieved approximately 89% more relevant search results than keyword-based searching.
Input String       | Keyword Based Search | Proposed Method
-------------------|----------------------|----------------
Fuzzy logic        | 46                   | 85
Neural Network     | 43                   | 89
Image mining       | 49                   | 94
Signal Processing  | 36                   | 96

Table 4.1 Comparative study of keyword-based search and the proposed method (number of matching elements found)
V. CONCLUSION
Extraction of text from the web is an essential operation. Although many text extraction methods have been developed, this paper presents a novel technique that employs keyword-based article clustering to further enhance the text extraction process. The development of the proposed algorithm is of practical significance; however, it is challenging to design a unified approach of text extraction that retrieves the relevant text articles more efficiently. The proposed algorithm, using a data mining algorithm, extracts text with contextual completeness in overall, individual, and collective forms, making it able to significantly enhance text extraction from web literature.
REFERENCES
[1] Clifton, Christopher (2010). "Encyclopedia Britannica: Definition of Data Mining". Retrieved 2010-12-09.
[2] Han, J., & Kamber, M. Data Mining: Concepts and Techniques. CA: Morgan Kaufmann, 2001.
[3] Badgett, R. G. How to search for and evaluate medical evidence. Seminars in Medical Practice 1999, 2:8-14, 28.
[4] Richardson, J. Building CAM databases: the challenges ahead. J Altern Complement Med 2002, 8:7-8.
[5] Kantardzic, Mehmed (2003). Data Mining: Concepts, Models, Methods, and Algorithms. John Wiley & Sons. ISBN 0471228524. OCLC 50055336.
[6] Miller, H. and Han, J. (eds.), 2001. Geographic Data Mining and Knowledge Discovery. London: Taylor & Francis.
[7] Manu Aery, Naveen Ramamurthy, and Y. Alp Aslandogan. Topic identification of textual data. Technical report, The University of Texas at Arlington, 2003.
[8] Pavel Berkhin. Survey of clustering data mining techniques. Technical report, Accrue Software, San Jose, CA, 2002.
[9] Cecil Chua, Roger H. L. Chiang, and Ee-Peng Lim. An integrated data mining system to automate discovery of measures of association. In Proceedings of the 33rd Hawaii International Conference on System Sciences, 2000.
[10] George Forman. An extensive empirical study of feature selection metrics for text classification. J. Mach. Learn. Res., 3:1289-1305, 2003.
[11] Rayid Ghani. Combining labeled and unlabeled data for text classification with a large number of categories. In IEEE Conference on Data Mining, 2001.
[12] George Karypis and Eui-Hong Han. Concept indexing: A fast dimensionality reduction algorithm with applications to document retrieval and categorization. Technical report TR-00-0016, University of Minnesota, 2000.
[13] Jerome Moore, Eui-Hong Han, Daniel Boley, Maria Gini, Robert Gross, Kyle Hastings, George Karypis, Vipin Kumar, and Bamshad Mobasher. Web page categorization and feature selection using association rule and principal component clustering. In 7th Workshop on Information Technologies and Systems, 1997.
[14] Sam Scott and Stan Matwin. Text classification using WordNet hypernyms. In Proceedings of the COLING/ACL Workshop on Usage of WordNet in Natural Language Processing Systems, Montreal, 1998.
[15] Michael Steinbach, George Karypis, and Vipin Kumar. A comparison of document clustering techniques. In KDD Workshop on Text Mining, 2000.
[16] Andreas Weingessel, Martin Natter, and Kurt Hornik. Using independent component analysis for feature extraction and multivariate data projection, 1998.
[17] Robert Nisbet (2006). Data Mining Tools: Which One is Best for CRM? Part 1. Information Management Special Reports, January 2006.
[18] Dominique Haughton, Joel Deichmann, Abdolreza Eshghi, Selin Sayek, Nicholas Teebagy, and Heikki Topi (2003). A Review of Software Packages for Data Mining. The American Statistician, Vol. 57, No. 4, pp. 290-309.
[19] R. Agrawal et al. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining, pp. 307-328. MIT Press, 1996.
[20] Kumar, V. (2011). An Empirical Study of the Applications of Data Mining Techniques in Higher Education. International Journal of Advanced Computer Science and Applications (IJACSA), 2(3), 80-84.
[21] Jadhav, R. J. (2011). Churn Prediction in Telecommunication Using Data Mining Technology. International Journal of Advanced Computer Science and Applications (IJACSA), 2(2), 17-19.
[22] Devi, S. N. (2011). A Study on Feature Selection Techniques in Bio-Informatics. International Journal of Advanced Computer Science and Applications (IJACSA), 2(1), 138-144.