
International Journal of Computer Science and Management Research Vol 2 Issue 2 February 2013
ISSN 2278-733X

FUZZY CLUSTERING IN WEB TEXT MINING AND ITS APPLICATION IN IEEE ABSTRACT CLASSIFICATION

Rahul R. Papalkar #1, Gajendrasingh Chandel *2
# Department of Information Technology, SSICT, Sehore, (M.P.) India
* Department of Information Technology, SSICT, Sehore, (M.P.) India

Abstract

Text mining, a branch of computer science [1], is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. Modern businesses increasingly see text mining as an important tool for transforming data into business intelligence and gaining an informational advantage. Web text retrieval refers to text retrieval techniques applied to Web resources and to literature available on the Web. The volume of published Web research, and therefore the underlying Web knowledge base, is expanding at an increasing rate, and Web text retrieval helps researchers cope with the resulting information overload. By discovering predictive relationships between different pieces of extracted data, data-mining algorithms can improve the accuracy of information extraction. However, textual variation due to typos, abbreviations, and other sources can prevent the productive discovery and use of hard-matching rules. Recent soft clustering methods can exploit predictive relationships in textual data. This paper presents a technique that uses a soft clustering text mining algorithm to increase the accuracy of Web text extraction. Experimental results demonstrate that this approach improves text extraction more effectively than hard keyword-matching rules.

Keywords: Fuzzy cluster, Text Mining, Web mining, document clustering

I. INTRODUCTION

Searching for the documents most similar to a given one is crucial in text mining because it is the basic step of many techniques, such as classification and information retrieval. Two of the major issues that text mining faces are the large number of documents, millions even in modest cases, and the very high dimensionality of the feature space. Text documents are usually represented as vectors, where each dimension corresponds to a term and the value reflects the term's importance in the document. There are many approaches to finding the exact vicinity of an object. However, they suffer from the curse of dimensionality: their performance decreases drastically as the number of dimensions grows, which prevents their application in text mining. To avoid the curse of dimensionality, a variety of methods based on inexact searching have been proposed. In [1, 2] a probabilistic technique with good performance was presented. This solution uses some elements of the training set as pivots, or permutants. The permutants are used to predict proximity between elements and to reduce the number of real distance evaluations at query time. Although this method performs well when searching for proximities over documents, it introduces an overhead at search time, because it must either perform a sequential search over the permutants [1] or use an auxiliary structure to avoid it [2]. This overhead increases as the space dimension or the size of the dataset grows. In this paper we introduce improvements to our access method, presented in [3], for indexing collections of objects in a very high-dimensional space. For indexing, this method combines a graph structure with pivots (used as entry points) and a very fast search algorithm that uses distance- or similarity-based measures to obtain the k-nearest neighbors (kNN) of novel query objects. We introduce a new, fast way to generate the connected graph and a pruning rule to improve searches. Although the time required to generate the index structure grows with the size of the collection, this process is carried out only once (offline) and does not affect the query process.
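To make the vector representation and the nearest-neighbor query concrete, the following is a minimal sketch of an exhaustive kNN search over term-frequency vectors. It is not the graph-and-pivot index described above; the function names, the whitespace tokenizer, and the choice of cosine similarity are assumptions made only for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a document as a sparse term-frequency vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn(query, documents, k):
    """Return the indices of the k documents most similar to the query.

    Every document is compared with the query, so the cost grows
    linearly with the collection size; an index avoids exactly this.
    """
    q = term_vector(query)
    scored = [(cosine_similarity(q, term_vector(d)), i)
              for i, d in enumerate(documents)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]
```

An exhaustive scan like this is the baseline that pivot-based and graph-based indexes try to beat by skipping most of the distance evaluations.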


Rahul R. Papalkar et al. www.ijcsmr.org



II. LITERATURE SURVEY

Current text mining tools operate on structured data, the kind of data that resides in large relational databases, whereas data in multimedia databases are semi-structured or unstructured. Compared with text mining, multimedia mining reaches much higher complexity, resulting from: a) the huge volume of data; b) the variability and heterogeneity of the multimedia data (e.g. diversity of sensors, time or conditions of acquisition, etc.); and c) the subjectivity of the meaning of multimedia content [6].

Unstructured data

Unstructured data is simply a bit stream. Examples include the pixel-level representation of images, video, and audio, and the character-level representation of text. Substantial processing and interpretation are required to extract semantics from unstructured data [7]. This kind of data is not broken down into smaller logical structures and is not typically interpreted by the database.

Architectures for Multimedia Text Mining

Various architectures are being examined for designing and developing a multimedia text mining system. The first architecture works as follows: extract data or metadata from the unstructured database, store the extracted data in a structured database, and apply text mining tools to the structured database [8]. This is illustrated in Figure 2.1.

Figure 2.1 Converting unstructured data to structured data for mining

Figure 2.1 presents the architecture for applying multimedia mining to different multimedia types [18]. Data collection is the starting point of a learning system, as the quality of the raw data determines the overall achievable performance. Then, the goal of data pre-processing is to discover important features from the raw data. Data pre-processing includes data cleaning, normalization, transformation, feature selection, etc. Learning can be straightforward if informative features can be identified at the pre-processing stage. The detailed procedure depends highly on the nature of the raw data and on the problem's domain. In some cases, prior knowledge can be extremely valuable.

Figure 2.2 Multimedia Mining Process

For many systems, this stage is still primarily conducted by domain experts. The product of data pre-processing is the training set. Given a training set, a learning model has to be chosen to learn from it. It must be mentioned that the steps of multimedia mining are often iterative. The analyst can also jump back and forth between major tasks in order to improve the results [6].
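As an illustration of how pre-processing turns raw text into a training set, the sketch below chains cleaning, normalization, and a simple feature selection step. The stop-word list, the document-frequency cutoff `min_df`, and all function names are assumptions chosen for the example, not details given in this paper.

```python
import re
from collections import Counter

# Illustrative stop-word list only; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for"}


def clean(text):
    """Cleaning and normalization: drop markup, lowercase, keep alphabetic tokens."""
    text = re.sub(r"<[^>]+>", " ", text)          # strip HTML-like tags
    tokens = re.findall(r"[a-z]+", text.lower())  # normalize case, drop punctuation
    return [t for t in tokens if t not in STOP_WORDS]


def select_features(token_lists, min_df=2):
    """Feature selection: keep terms that occur in at least min_df documents."""
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return sorted(term for term, count in doc_freq.items() if count >= min_df)


def build_training_set(raw_docs, min_df=2):
    """Return the selected vocabulary and one term-count row per document."""
    token_lists = [clean(doc) for doc in raw_docs]
    vocabulary = select_features(token_lists, min_df)
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows
```

The document-frequency cutoff is only one possible selection criterion; it is used here because it needs no labels.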

Figure 2.3 Architecture for applying multimedia mining to different multimedia types

Figure 2.3 presents the architecture for applying multimedia mining to different multimedia types [5]. Here the main stages of the text mining process are (1) domain understanding; (2) data selection; (3) cleaning and preprocessing; (4) discovering patterns; (5) interpretation; and (6) reporting and using discovered knowledge. The domain understanding stage requires learning how the results of data mining will be used, so as to gather all relevant prior knowledge before mining.

II. METHODOLOGY

This process is done in three steps: information retrieval, information extraction, and text mining. A primary reason for applying text mining to web text is to assist in the analysis of collections of the available web text. Web data is vulnerable to collinearity because of unknown interrelations. The analysis in this paper is augmented by an experiment-based approach. Before text mining algorithms can be used, a target data set must be assembled. As text mining can only uncover patterns already present in the data, the target data set must be large enough to contain these patterns. Pre-processing is essential for analyzing multivariate data sets before clustering or text mining. The target set is then cleaned; cleaning removes observations with noise and missing data. The web data available to us is first put into a data warehouse. Before the data is put into the data warehouse, a keyword extraction algorithm is used to find the keywords in the full text. This keyword extraction uses a partial parser to extract entity names. The parser uses linguistic rules and statistical disambiguation to achieve greater precision. The data is then organized into clusters. Clustering is the task of discovering groups and structures in the data that are in some way "similar", without using known structures in the data. The clusters are created from the keywords extracted from our web text, using the fuzzy c-means (FCM) algorithm, one of the most widely used soft clustering algorithms. It is a variant of the standard k-means algorithm that uses a soft membership function.

FCM is based on minimization of the objective function F_m(u, c):

FCM computes the membership u_ij and the cluster centers c_j by:
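In the standard formulation of FCM, which matches the symbols defined here, the objective function and the two update rules can be written as follows. The number of data points N and the number of clusters C are names introduced for this sketch.

```latex
\[
F_m(u, c) = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - c_j \rVert^{2}
\]

\[
u_{ij} = \frac{1}{\displaystyle\sum_{k=1}^{C}
  \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}},
\qquad
c_j = \frac{\displaystyle\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\displaystyle\sum_{i=1}^{N} u_{ij}^{m}}
\]
```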

where m, the fuzzification factor, is a weighting exponent on each fuzzy membership and may be any real number greater than 1; u_ij is the degree of membership of x_i in cluster j; x_i is the i-th d-dimensional measured data point; c_j is the d-dimensional center of cluster j; d²(x_k, c_i) is a distance measure between object x_k and cluster center c_i; and ||*|| is any norm expressing the similarity between a measured data point and the center.
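A minimal sketch of the FCM iteration, assuming Euclidean distance, random initial memberships, and a stopping rule based on the largest membership change, may help to make the roles of m, u_ij, and c_j concrete. The function name, the parameters `max_iter`, `tol`, and `seed`, and the handling of a point that coincides with a center are assumptions of this example.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Alternate the center and membership updates until memberships stabilize."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: c_j = sum_i u_ij^m x_i / sum_i u_ij^m
        centers = []
        for j in range(n_clusters):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            centers.append([sum(w * p[d] for w, p in zip(weights, points)) / total
                            for d in range(dim)])

        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        exponent = 2.0 / (m - 1.0)
        new_u = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5 for c in centers]
            if any(dist == 0.0 for dist in dists):
                # A point lying exactly on a center belongs fully to that center.
                hits = [1.0 if dist == 0.0 else 0.0 for dist in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((dj / dk) ** exponent for dk in dists)
                              for dj in dists])

        change = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break
    return u, centers
```

Each row of the returned membership matrix sums to 1, so a document can belong partly to several clusters, which is what distinguishes this from hard k-means assignment.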

Proposed Algorithm

1. Read the input string.
2. Read the input search path.
3. Cluster the input string using fuzzy c-means clustering.
4. Read the files with the specified extension from the selected path.
5. Convert the selected file into a text-readable format.
6. Search the file for the input-string cluster and store the result in the output cluster directory.
7. Repeat steps 5 and 6 until all files are scanned; then go to step 8.
8. Stop.
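The file-scanning part of these steps can be sketched as below. This is an assumption-laden illustration rather than the authors' implementation: step 3 is reduced to a plain list of search terms, files are assumed to be readable as text, a match means a case-insensitive substring hit, and `scan_files` and its parameters are hypothetical names.

```python
import shutil
from pathlib import Path


def scan_files(search_terms, search_path, extension, output_dir):
    """Copy every file in search_path that mentions any of the search terms.

    search_terms stands in for the clustered input string (steps 1 and 3).
    """
    terms = [t.lower() for t in search_terms]
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    matched = []
    # Step 4: read the files with the specified extension from the path.
    for path in sorted(Path(search_path).glob(f"*{extension}")):
        if not path.is_file():
            continue
        # Step 5: obtain the file content as text.
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        # Step 6: search for the terms and store hits in the output directory.
        if any(term in text for term in terms):
            shutil.copy(path, out / path.name)
            matched.append(path.name)
    # Steps 7 and 8: the loop ends once every file has been scanned.
    return matched
```

Formats that are not plain text would need a real conversion step where step 5 is marked.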

The proposed algorithm first extracts the keywords present in the full-text web articles and stores these keywords in a relation. The actual work of the algorithm then begins with the clustering of keywords. The algorithm initially picks some of the extracted keywords and groups the full-text articles based on them, so that each cluster contains only those articles that contain the corresponding keyword. It then uses fuzzy c-means clustering to combine clusters according to a similarity measure: two clusters are combined if their similarity is greater than or equal to a specified threshold value. The algorithm repeats this process until no more changes are made to the clusters. Finally, it stores all the clusters in a directory.
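The group-then-merge loop described here can be sketched as follows. The similarity measure is not specified in the text, so this example swaps in the Jaccard overlap between the article sets of two clusters and merges clusters outright; it is a simplified stand-in, not the fuzzy c-means step itself, and all names are hypothetical.

```python
def jaccard(a, b):
    """Share of articles that two clusters have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_keyword(articles, keywords):
    """Build one initial cluster per keyword: the ids of articles containing it."""
    clusters = []
    for keyword in keywords:
        members = {article_id for article_id, text in articles.items()
                   if keyword.lower() in text.lower()}
        if members:
            clusters.append(members)
    return clusters


def merge_clusters(clusters, threshold):
    """Merge pairs whose similarity reaches the threshold until nothing changes."""
    clusters = [set(c) for c in clusters]
    changed = True
    while changed:
        changed = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if jaccard(clusters[i], clusters[j]) >= threshold:
                    clusters[i] |= clusters[j]
                    del clusters[j]
                    changed = True
                    break
            if changed:
                break
    return clusters
```

Restarting the pairwise scan after every merge keeps the loop simple and guarantees termination, because each merge reduces the number of clusters by one.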

Here, our aim is to extract all the full-text articles that may be relevant to the user's search string; to this end, among all clusters, the cluster with the largest number of articles is our target.

III. EXPERIMENT RESULT

The experiments were performed on a test application developed in ASP.Net 3.0. The database contains all the article entries populated manually from the. The search was performed using the traditional keyword-based search algorithm and compared with the proposed algorithm. A snapshot of a set of search results is shown in Table 4.1. Given the same data for text extraction, the proposed algorithm appears to retrieve approximately 89% more relevant search results than keyword-based searching.

Input String         List of Matching Elements Found
                     Keyword Based Search    Proposed Method
Fuzzy logic          46                      85
Neural Network       43                      89
Image mining         49                      94
Signal Processing    36                      96

Table 4.1 Comparative study of keyword based search & proposed method

[Bar chart: list of matching elements found for each input string, keyword based search versus proposed method; vertical axis from 0 to 120]

IV. CONCLUSION

Extraction of text from the web is an essential operation. Although many text extraction methods have been developed, this paper presents a novel technique that employs keyword-based article clustering to further enhance the text extraction process. The development of the proposed algorithm is of practical significance; however, it is challenging to design a unified approach to text extraction that retrieves the relevant text articles more efficiently. The proposed algorithm, using a data mining algorithm, seems to extract the text with contextual completeness in overall, individual, and collective forms, making it able to significantly enhance the text extraction process from web literature.

REFERENCES

[1] Clifton, Christopher (2010). "Encyclopedia Britannica: Definition of Data Mining". Retrieved 2010-12-09.

[2] Han, J., & Kamber, M., Data Mining Concepts and Techniques. CA: Morgan Kaufmann, 2001.

[3] Badgett RG: How to search for and evaluate medical evidence. Seminars in Medical Practice 1999, 2:8-14, 28.

[4] Richardson J: Building CAM databases: the challenges ahead. J Altern Complement Med 2002, 8:7-8.

[5] Kantardzic, Mehmed (2003). Data Mining: Concepts, Models, Methods, and Algorithms. John Wiley & Sons. ISBN 0471228524. OCLC 50055336.

[6] Miller, H. and Han, J., (eds.), 2001, Geographic Data Mining and Knowledge Discovery, (London: Taylor & Francis).

[7] Manu Aery, Naveen Ramamurthy, and Y. Alp Aslandogan. Topic identification of textual data. Technical report, The University of Texas at Arlington, 2003.

[8] Pavel Berkhin. Survey of clustering data mining techniques. Technical report, Accrue Software, San Jose, CA, 2002.

[9] Cecil Chua, Roger H.L. Chiang, and Ee-Peng Lim. An integrated data mining system to automate discovery of measures of association. In Proceedings of the 33rd Hawaii International Conference on System Sciences, 2000.

[10] George Forman. An extensive empirical study of feature selection metrics for text classification. J. Mach. Learn. Res., 3:1289-1305, 2003.

[11] Rayid Ghani. Combining labeled and unlabeled data for text classification with a large number of categories. In IEEE Conference on Data Mining, 2001.

[12] George Karypis and Eui-Hong Han. Concept indexing: A fast dimensionality reduction algorithm with applications to document retrieval and categorization. Technical report TR-00-0016, University of Minnesota, 2000.

[13] Jerome Moore, Eui-Hong Han, Daniel Boley, Maria Gini, Robert Gross, Kyle Hastings, George Karypis, Vipin Kumar, and Bamshad Mobasher. Web page categorization and feature selection using association rule and principal component clustering. In 7th Workshop on Information Technologies and Systems, 1997.

[14] Sam Scott and Stan Matwin. Text classification using wordnet hypernyms. In Proceedings of the COLING/ACL Workshop on Usage of WordNet in Natural Language Processing Systems, Montreal, 1998.

[15] Michael Steinbach, George Karypis, and Vipin Kumar. A comparison of document clustering techniques. In KDD Workshop on Text Mining, 2000.

[16] Andreas Weingessel, Martin Natter, and Kurt Hornik. Using independent component analysis for feature extraction and multivariate data projection, 1998.

[17] Robert Nisbet (2006) Data Mining Tools: Which One is Best for CRM? Part 1, Information Management Special Reports, January 2006.

[18] Dominique Haughton, Joel Deichmann, Abdolreza Eshghi, Selin Sayek, Nicholas Teebagy, & Heikki Topi (2003) A Review of Software Packages for Data Mining, The American Statistician, Vol. 57, No. 4, pp. 290–309.

[19] R. Agrawal et al., Fast discovery of association rules, in Advances in knowledge discovery and data mining, pp. 307–328, MIT Press, 1996.

[20] Kumar, V. (2011). An Empirical Study of the Applications of Data Mining Techniques in Higher Education. International Journal of Advanced Computer Science and Applications - IJACSA, 2(3), 80-84.

[21] Jadhav, R. J. (2011). Churn Prediction in Telecommunication Using Data Mining Technology. International Journal of Advanced Computer Science and Applications - IJACSA, 2(2), 17-19.

[22] Devi, S. N. (2011). A study on Feature Selection Techniques in Bio-Informatics. International Journal of Advanced Computer Science and Applications - IJACSA, 2(1), 138-144.
