
Global Journal of Science, Engineering and Technology (ISSN: 2322-2441)
Issue 2, 2012, pp. 52-57
© GJSET Publishing, 2012.
http://www.gjset.org

An Improved XCSR Classifier System for Data Mining with Limited Training Samples

1 MASOUD SHARIAT PANAHI, 2 NAVID MOSHTAGHI YAZDANI
1 Associate Professor, Faculty of Mechanical Engineering; 2 Master of Science Student in Mechatronics
1 University of Tehran, 2 University of Tehran Kish International Campus
1 Tehran, IRAN; 2 Kish Island, IRAN
1 mshariatp@ut.ac.ir, 2 navid.moshtaghi@ut.ac.ir

Abstract: - The extended classifier system with continuous variables (XCSR), recognized as one of the most successful learning agents for solving data mining problems in partially observable environments, updates its rules by means of a genetic algorithm, according to the accuracy of the reward prediction received from the evaluation environment. Each rule in this system carries three attributes: prediction, prediction error and fitness; its accuracy is determined by how close its prediction is to the reward received from the environment. Under the common approach for training XCSR in data mining applications, i.e. supervised learning, only the fitness of a rule that responds correctly to a training sample is increased. This means that the chance of each rule to survive and to participate in the production of new rules depends directly on how it responds to the training data, and determining this chance realistically requires a large number of training samples. Since the number of training samples in real applications is limited, and acquiring more data takes considerable time and expense (for running empirical tests and the like), using XCSR in such applications is often hard to justify. In this study a new method is presented for improving the performance and increasing the convergence rate of XCSR with limited training data. The new method is based on the multiple use of the existing samples: after the existing training data have been presented to the system and the attributes of its rules updated, those data are transformed by a crossover operator into new, meaningful samples, which improves the generalization power of the rules constituting XCSR by presenting new information about the underlying pattern and environment. The behavior of the suggested method is demonstrated by implementing it on several benchmark problems, and its efficiency is evaluated by comparing the obtained results with the results reported for these benchmarks in other articles.

Key-Words: - Classifier System, XCSR, Data Mining, Genetic Algorithm, Learning Agents


1 Introduction

Machine learning refers to a wide range of supervised and unsupervised learning algorithms that aim to avoid exhaustive search of the data in data mining tasks, replacing this time-consuming kind of search with intelligent methods that find the patterns present in the data and thus make it possible to classify the data or model its behavior. Over the last two decades many methods have been presented in the field of data mining, applying various supervised, unsupervised or reinforcement learning algorithms to such goals as pattern recognition and classification. Among the most successful of these methods are the classifier systems.

In their general form, classifier systems comprise a set of rules in "if-then" format, each of which represents a potential solution to the target problem. This rule set is gradually evaluated using a reinforcement learning mechanism and updated at specified intervals by a genetic algorithm. Through this gradual evolution the system learns the behavior of the environment, and in the application phase it presents suitable answers to the queries posed by the user.

The first classifier system was suggested by Holland in 1976 under the title of Learning Classifier System (LCS). In this system the value of each rule was measured by an index called strength. The strength of a rule was increased in proportion to its correct responses to the training examples, within the framework of reinforcement learning, and at specified intervals an evolutionary search algorithm (generally a genetic algorithm) was responsible for producing new rules and removing inefficient ones. At the end of the training phase this rule set was relatively able to present acceptable solutions when encountering new queries. Yet the successful performance of LCS depended on the selection of suitable values for the controlling parameters of the system, which mainly rested on the experience of its designer.

After LCS was introduced, other types of classifier systems were proposed, among which we can mention the Extended Classifier System (XCS). Before the introduction of XCS in 1995 the ability of these systems to reach suitable answers was very limited, but from that time on they gradually developed into more intelligent and accurate agents, and it is now believed that XCS and its improved versions can solve complicated problems with no need to tune the parameters. With the introduction of the classifier system with continuous variables (XCSR), some inherent weaknesses of binary classifier systems, such as the inability to represent specified intervals of variable values, were largely resolved, and nowadays these systems are recognized as among the most successful learning agents for solving data mining problems in partially observable environments.

Under the common approach for training XCSR, only the fitness of a rule that responds correctly to a training sample is increased. This means that the chance of each rule to avoid removal and to participate in the production of new rules depends directly on how it responds to the training data, and determining this chance realistically requires a large number of training samples. Since the number of training samples in real applications is limited and increasing it is rarely simple, using XCSR in such applications is usually hard to justify in terms of time and computational expense.

In the remainder of this study a new method is presented for improving the performance and increasing the convergence rate of XCSR with limited training data.

2 The Suggested Method

In the suggested method, the limited set of training data is first applied in the usual way to update the attributes of the rules, consisting of prediction, prediction error and fitness. This is done by means of the following relations:

Updating the prediction and the prediction error:

If exp_i < 1/β then  P_i ← P_i + (R − P_i)/exp_i,  ε_i ← ε_i + (|R − P_i| − ε_i)/exp_i  (1)

If exp_i ≥ 1/β then  P_i ← P_i + β(R − P_i),  ε_i ← ε_i + β(|R − P_i| − ε_i)  (2)

Updating the fitness:

If ε_i < ε_0 then  k_i = 1  (3)

If ε_i ≥ ε_0 then  k_i = β(ε_i/ε_0)^(−γ)  (4)

f_i ← f_i + β[(k_i / ∑_j k_j) − f_i]  (5)

In these relations, β is the learning rate, γ is the exponent governing the rule's accuracy, ε is the prediction error, exp is the rule's experience, P is the rule's prediction, R is the reward received from the environment, k is the rule's accuracy and f is its fitness. The index i denotes the number of the rule in the rule set.
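A minimal sketch of these updates in Python may clarify the bookkeeping. The Rule structure and the values chosen for β, ε_0 and γ are illustrative assumptions, not taken from the paper:

```python
BETA = 0.2     # learning rate (beta) -- illustrative value
EPS0 = 10.0    # error threshold (epsilon_0) -- illustrative value
GAMMA = 5.0    # accuracy exponent (gamma) -- illustrative value

class Rule:
    """Bare-bones rule record: prediction, error, fitness, experience."""
    def __init__(self, p=10.0, eps=0.0, f=0.01):
        self.p, self.eps, self.f, self.exp = p, eps, f, 0

def update_rules(action_set, reward):
    """Apply eqs (1)-(2) to prediction and error, then eqs (3)-(5) to fitness."""
    for r in action_set:
        r.exp += 1
        if r.exp < 1.0 / BETA:      # young rule: running average, eq (1)
            r.p += (reward - r.p) / r.exp
            r.eps += (abs(reward - r.p) - r.eps) / r.exp
        else:                       # experienced rule: fixed step, eq (2)
            r.p += BETA * (reward - r.p)
            r.eps += BETA * (abs(reward - r.p) - r.eps)
    # accuracy k_i of each rule, eqs (3)-(4)
    ks = [1.0 if r.eps < EPS0 else BETA * (r.eps / EPS0) ** (-GAMMA)
          for r in action_set]
    total = sum(ks)
    for r, k in zip(action_set, ks):
        r.f += BETA * (k / total - r.f)   # relative accuracy drives fitness, eq (5)
```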

In the next phase, to develop variety in the data set, several couples are selected as parents, by the method of "remainder stochastic selection", from among the fields that make up the condition part of the existing data, and the condition part of a new sample is created by applying intermediate crossover to the fields of the parents. In this method the value of each conditional variable is obtained from the following relation:

a_i = α·a_i^F + (1 − α)·a_i^M  (6)

in which a_i is the value of conditional variable i in the new sample, a_i^F is its value in the first parent (father), a_i^M is its value in the second parent (mother), and α is the parents' participation coefficient, which is determined adaptively.

The action part of the new sample is produced by a nonlinear mapping, built from the existing data, from the space of the conditional variables to the space of actions.
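The following sketch illustrates these two steps under stated assumptions: the selection weights and the fixed α are placeholders, since the paper determines α adaptively and does not spell out how the parent samples are weighted:

```python
import random

def remainder_stochastic_selection(samples, weights, n):
    """Select n parents: the integer part of each expected count
    deterministically, the fractional remainders by roulette wheel."""
    total = sum(weights)
    expected = [w / total * n for w in weights]
    selected = []
    for s, e in zip(samples, expected):
        selected.extend([s] * int(e))               # deterministic copies
    remainders = [e - int(e) for e in expected]
    while len(selected) < n:                        # roulette on the remainders
        selected.append(random.choices(samples, weights=remainders)[0])
    return selected

def intermediate_crossover(father, mother, alpha=0.5):
    """Eq. (6): a_i = alpha * a_i^F + (1 - alpha) * a_i^M, field by field."""
    return [alpha * f + (1 - alpha) * m for f, m in zip(father, mother)]
```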

Diversifying the existing data continues, aided by the completed data, until the learning stop condition is satisfied (for example, when the percentage of correct system answers on the test data reaches a pre-determined threshold).

In the next chapter some common algorithms for supervised learning are briefly described; in chapter 4 the results obtained from the suggested method are compared with the results of these methods on several benchmark examples.

3 Some Common Methods for Supervised Learning

In this chapter two approaches to supervised learning (learning rules from training data) are described: the "separate and conquer" method and the decision tree method. In the separate-and-conquer method one rule is learnt in each phase, the data covered by that rule are then separated out, and the procedure is repeated for the remaining samples. The k-NN, or k-nearest-neighbor, algorithm is one of the instance-based learning algorithms: in the learning phase it merely stores the training samples, and to determine the class of a data sample it calculates the distance between that sample and the stored training samples. The most common criterion for calculating this distance is the Euclidean norm, although criteria like the Manhattan and Minkowski distances are also used for this purpose.

After calculating the distances, a majority vote is held among the k training samples nearest to the current test sample, and the majority label among them is assigned to the test sample; k is a parameter determined by the user. Such algorithms are called lazy algorithms because they do no particular work in the learning phase and merely store the training samples.
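A minimal k-NN classifier of the kind described above might look as follows, using the Euclidean norm:

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, x, k=5):
    """Classify x by a majority vote among its k nearest training samples."""
    dists = np.linalg.norm(train_X - x, axis=1)    # Euclidean distance to every sample
    nearest = np.argsort(dists)[:k]                # indices of the k closest samples
    votes = Counter(train_y[i] for i in nearest)   # majority vote among their labels
    return votes.most_common(1)[0][0]
```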

The C4.5 algorithm is one of the most popular decision tree algorithms [3]. It was presented in 1993 and is an extended version of the ID3 algorithm. Each intermediate node of the tree represents a test on the values of an attribute, and each branch represents one of the permitted values of that attribute. The criterion used to select a suitable attribute for a node is Information Gain, which is biased in favor of attributes with many values; to eliminate this problem the Gain Ratio criterion is applied as well. The ID3 algorithm only supports discrete attributes, while C4.5 manages continuous attributes in addition to those with discrete values [4]. Moreover, handling attributes with unspecified values is another advantage of C4.5 over ID3. To avoid the overfitting phenomenon in the generated classification model, tree pruning techniques are used. Overfitting happens when the accuracy of the generated classification model is very high on the training data but does not reach a comparably high accuracy on the test data; in other words, the model is fitted too closely to the training data, and this close fit does not necessarily lead to higher classification accuracy on the test set. There are two main techniques for pruning a tree. In the first, called pre-pruning, the growth of the tree along some paths is stopped before the tree is complete, while in the other, called post-pruning, the tree is first grown completely and then some of its subtrees are replaced with leaf nodes. The generated tree can also be converted into an equivalent set of classification rules, and the rules can then be pruned by omitting some of their preconditions.
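For illustration, the Information Gain and Gain Ratio criteria mentioned above can be computed as in the following sketch (written for discrete attributes, as in ID3):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Parent entropy minus the weighted entropy of the splits on attribute attr."""
    splits = {}
    for row, y in zip(rows, labels):
        splits.setdefault(row[attr], []).append(y)
    weighted = sum(len(ys) / len(labels) * entropy(ys) for ys in splits.values())
    return entropy(labels) - weighted

def gain_ratio(rows, labels, attr):
    """Dividing by the split information corrects the bias toward
    attributes with many values."""
    n = len(rows)
    counts = Counter(row[attr] for row in rows)
    split_info = -sum(c / n * math.log2(c / n) for c in counts.values())
    return information_gain(rows, labels, attr) / split_info if split_info else 0.0
```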

4 Comparing the Results of the Suggested Algorithm with Other Supervised Learning Methods

4.1 Heart Disease Diagnosis Problem

Heart diseases are among the conditions whose diagnosis is difficult, so systems are needed that help diagnose them correctly. Several such systems have been recommended, most of which are based on techniques that can correctly discover and generalize relations among the data.

To apply the improved XCSR classifier method, data is needed first. Hospital systems are among those that deal with a large amount of data and can provide such information. The UCI machine learning repository provides facilities through which databases in each research field are accessible and applicable to intelligent systems [5].

The heart disease diagnosis database at UCI relates to the Cleveland area and includes information about the disease, the patients and their vital signs [1]. It stores records of 303 patients, each with 76 characteristics; not all 76 are applied, however, and only the 14 useful ones, described below, are used.

1. Age
2. Gender: 1 = male; 0 = female
3. Chest pain type: 1 = typical angina; 2 = atypical angina; 3 = non-anginal pain; 4 = asymptomatic
4. Resting blood pressure (mmHg, on admission to the hospital)
5. Serum cholesterol (mg/dl)
6. Fasting blood sugar (FBS): 1 = higher than 120 mg/dl; 0 = lower than 120 mg/dl
7. Resting electrocardiographic results: 0 = normal; 1 = ST-T wave abnormality (T wave inversion or ST deviation of more than 0.05 mV); 2 = left ventricular hypertrophy
8. Maximum heart rate (MHR)
9. Exercise induced angina: 1 = yes; 0 = no


10. ST depression induced by exercise relative to rest
11. Slope of the peak exercise ST segment: 1 = upsloping; 2 = flat; 3 = downsloping
12. Number of major vessels colored by fluoroscopy (0-3)
13. Thal: 3 = normal; 6 = fixed defect; 7 = reversible defect

14. Heart disease diagnosis (angiographic disease status): 0 = diameter narrowing of less than 50%; 1 = diameter narrowing of more than 50%

Characteristic No. 14 is used as the diagnosis characteristic [1,2]. The UCI database contains the records of 303 patients; we select 253 observations from it as training examples, present them to the system for training, and use the other 50 observations to test the trained rules.
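A minimal sketch of this data preparation, assuming the comma-separated processed.cleveland.data file from the UCI repository with missing values marked by '?' (the paper does not state how missing values were treated, nor whether the 253/50 split was random):

```python
import csv
import random

def load_cleveland(path):
    """Read the Cleveland records, drop those with missing values,
    and binarize the diagnosis attribute (characteristic No. 14)."""
    rows = []
    with open(path) as fh:
        for rec in csv.reader(fh):
            if not rec or '?' in rec:
                continue                     # assumption: skip incomplete records
            *features, target = (float(v) for v in rec)
            rows.append((features, 0 if target == 0 else 1))  # 1 = disease present
    return rows

def split(rows, n_train=253, n_test=50, seed=0):
    """The paper's 253/50 train/test split; the seeded shuffle is an assumption."""
    random.Random(seed).shuffle(rows)
    return rows[:n_train], rows[n_train:n_train + n_test]
```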

After preparing the data, we first apply the limited training examples to the generated rules, which updates the rules' parameters; the genetic mechanism also helps us generate new rules. At the end of the training phase, the spectrum of heart conditions presented in the form of test data can be predicted using these trained rules.

In the computer code written to perform the suggested method, the results obtained on the test examples are stored in a vector called answer; this vector has the same dimensions as the vector of correct answers for the test data (learning-answer) and is compared against it. The results of this comparison are shown in Table 1:

Table 1: Results of the suggested algorithm on the heart patients' characteristics from the UCI database

                                    Diagnosed as having    Diagnosed as not having
                                    heart disease          heart disease
Correct diagnoses (Answer 1)        32                     12
Incorrect diagnoses (Answer 2)      4                      2

As seen in Table 1, 32 patients who have heart disease were diagnosed as such by the suggested system, and 12 patients were correctly diagnosed as not having heart disease; 2 patients were wrongly diagnosed as not having heart disease, and 4 patients were wrongly diagnosed as having it. In total, the improved classifier system (XCSR) reaches a 12% error, which is acceptable for such systems and represents a 9% improvement in comparison with the standard XCSR.
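The 12% figure follows directly from the counts in Table 1, since 6 of the 50 test cases are misclassified:

```python
correct = 32 + 12    # Answer 1: correct diagnoses (with / without disease)
wrong = 4 + 2        # Answer 2: incorrect diagnoses
error_rate = wrong / (correct + wrong)    # 6 / 50
print(f"test error: {error_rate:.0%}")    # -> test error: 12%
```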

4.2 Diabetes Disease Diagnosis Problem

Hospital systems are among those that deal with a large amount of data and can provide such information. The diabetes disease diagnosis database belongs to the National Institute of Diabetes and Digestive and Kidney Diseases and consists of information about the disease, the patients and their vital signs. It holds records of 768 patients, each with 9 characteristics, listed in Table 2.

Table 2: Diabetes Data Set

Number of pregnancies
Plasma glucose concentration
Diastolic blood pressure (mmHg)
Triceps skin fold thickness
2-hour serum insulin
Body mass index
Diabetes pedigree function
Age
Class variable

After applying the suggested method, the results obtained from the improved XCSR algorithm in the test phase were compared with four other algorithms, as shown in Table 3.


Table 3: Results (classification accuracy, %) obtained by the suggested algorithm and previous classification algorithms

K-star    C4.5    SVM     AD Tree    XCS      Suggested Method
70.2      71.3    77.8    73.1       87.19    89.67

5 Conclusion

In this study a new method was presented for improving the performance and increasing the convergence rate of XCSR with limited training data. To show the efficiency of the suggested system on data mining problems, two benchmark problems, "Diabetes Diagnosis" and "Heart Disease Diagnosis", were studied with its aid. Comparing the results obtained from the suggested method with those of several other machine learning algorithms shows the advantage of this method over the said methods.

References:

[1] Saeidi, Vahid; Yousefian, Alireza; Shahrabi, Jamal, "Heart Diseases Diagnosis by means of Data Mining Techniques", The 5th Iran Data Mining Conference, 2011.
[2] http://archive.ics.uci.edu/ml/datasets/Heart+Disease
[3] Quinlan, J.R., "C4.5: Programs for Machine Learning", Morgan Kaufmann, San Mateo, CA, 1993.
[4] Sharpe, P.K.; Glover, R.P., "Efficient GA based technique for classification", Applied Intelligence 11, 277-284, 1999.
[5] Blake, C.; Keogh, E.; Merz, C.J., "UCI Repository of Machine Learning Databases", University of California.

