
Global Journal of Science, Engineering and Technology (ISSN: 2322-2441)
Issue 2, 2012, pp. 52-57
© GJSET Publishing, 2012.
http://www.gjset.org

An Improved XCSR Classifier System for Data Mining with Limited Training Samples

1 MASOUD SHARIAT PANAHI, 2 NAVID MOSHTAGHI YAZDANI
1 Associate Professor, Faculty of Mechanical Engineering; 2 Master of Science Student in Mechatronics
1 University of Tehran, 2 University of Tehran Kish International Campus
1 Tehran, IRAN; 2 Kish Island, IRAN
1 mshariatp@ut.ac.ir, 2 navid.moshtaghi@ut.ac.ir

Abstract: - The extended classifier system with continuous variables (XCSR), recognized as one of the most successful learning agents for solving data mining problems in partially observable environments, updates its rules by means of a genetic algorithm, according to the accuracy of the reward prediction received from the evaluation environment. Each rule in this system carries three attributes: prediction, prediction error and fitness; its accuracy is determined by how close its prediction is to the reward received from the environment. Under the common approach for training XCSR in data mining applications, i.e. supervised learning, only the fitness of a rule that responds correctly to a training sample is increased. This means that the chance of each rule to survive and to participate in the production of new rules depends directly on how it responds to the training data, and determining this chance realistically requires a large number of training samples. Since the number of training samples in real applications is limited, and acquiring more data takes considerable time and expense (for running empirical tests and the like), using XCSR in such applications is often hard to justify. In this study a new method is presented for improving the performance and increasing the convergence rate of XCSR with limited training data. The new method is based on the multiple use of the existing samples: after the existing training data have been presented to the system and the attributes of its rules updated, those data are transformed by a crossover operator into new, meaningful samples, which improves the generalization power of the rules constituting XCSR by presenting new information about the underlying pattern and environment. The behavior of the suggested method is demonstrated by implementing it on several benchmark problems, and its efficiency is evaluated by comparing the obtained results with the results reported for these benchmarks in other articles.

Key-Words: - Classifier System, XCSR, Data Mining, Genetic Algorithm, Learning Agents


1 Introduction

Machine learning refers to a wide range of supervised and unsupervised learning algorithms that aim to avoid exhaustive search of the data in data mining tasks, replacing this time-consuming kind of search with intelligent methods that find the patterns present in the data and thus make it possible to classify the data or model its behavior. Over the last two decades many methods have been presented in the field of data mining, applying various supervised, unsupervised or reinforcement learning algorithms to such goals as pattern recognition and classification. Among the most successful of these methods are the classifier systems.

In their general form, classifier systems comprise a set of rules in "if-then" format, each of which represents a potential solution to the target problem. This rule set is gradually evaluated using a reinforcement learning mechanism and updated at specified intervals by a genetic algorithm. Through this gradual evolution the system learns the behavior of the environment, and in the application phase it presents suitable answers to the queries posed by the user.

The first classifier system was suggested by Holland in 1976 under the title of Learning Classifier System (LCS). In this system the value of each rule was measured by an index called strength. The strength of a rule was increased in proportion to its correct responses to the training examples, within the framework of reinforcement learning, and at specified intervals an evolutionary search algorithm (generally a genetic algorithm) was responsible for producing new rules and removing inefficient ones. At the end of the training phase this rule set was relatively able to present acceptable solutions when encountering new queries. Yet the successful performance of LCS depended on the selection of suitable values for the controlling parameters of the system, which mainly rested on the experience of its designer.

After LCS was introduced, other types of classifier systems were proposed, among which we can mention the Extended Classifier System (XCS). Before the introduction of XCS in 1995 the ability of these systems to reach suitable answers was very limited, but from that time on they gradually developed into more intelligent and accurate agents, and it is now believed that XCS and its improved versions can solve complicated problems with no need to tune the parameters. With the introduction of the classifier system with continuous variables (XCSR), some inherent weaknesses of binary classifier systems, such as the inability to represent specified intervals of variable values, were largely resolved, and nowadays these systems are recognized as among the most successful learning agents for solving data mining problems in partially observable environments.

Under the common approach for training XCSR, only the fitness of a rule that responds correctly to a training sample is increased. This means that the chance of each rule to avoid removal and to participate in the production of new rules depends directly on how it responds to the training data, and determining this chance realistically requires a large number of training samples. Since the number of training samples in real applications is limited and increasing it is rarely simple, using XCSR in such applications is usually hard to justify in terms of time and computational expense.

In the remainder of this study a new method is presented for improving the performance and increasing the convergence rate of XCSR with limited training data.

2 The Suggested Method

In the suggested method, the limited set of training data is first applied in the usual way to update the attributes of the rules, consisting of prediction, prediction error and fitness. This is done by means of the following relations:

Updating the prediction and the prediction error:

If exp_i < 1/β then  P_i ← P_i + (R − P_i)/exp_i,  ε_i ← ε_i + (|R − P_i| − ε_i)/exp_i  (1)

If exp_i ≥ 1/β then  P_i ← P_i + β(R − P_i),  ε_i ← ε_i + β(|R − P_i| − ε_i)  (2)

Updating the fitness:

If ε_i < ε_0 then  k_i = 1  (3)

If ε_i ≥ ε_0 then  k_i = β(ε_i/ε_0)^(−γ)  (4)

f_i ← f_i + β[(k_i / ∑_j k_j) − f_i]  (5)

In these relations, β is the learning rate, γ is the exponent governing the rule's accuracy, ε is the prediction error, exp is the rule's experience, P is the rule's prediction, R is the reward received from the environment, k is the rule's accuracy and f is its fitness. The index i denotes the number of the rule in the rule set.
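A minimal sketch of these updates in Python may clarify the bookkeeping. The Rule structure and the values chosen for β, ε_0 and γ are illustrative assumptions, not taken from the paper:

```python
BETA = 0.2     # learning rate (beta) -- illustrative value
EPS0 = 10.0    # error threshold (epsilon_0) -- illustrative value
GAMMA = 5.0    # accuracy exponent (gamma) -- illustrative value

class Rule:
    """Bare-bones rule record: prediction, error, fitness, experience."""
    def __init__(self, p=10.0, eps=0.0, f=0.01):
        self.p, self.eps, self.f, self.exp = p, eps, f, 0

def update_rules(action_set, reward):
    """Apply eqs (1)-(2) to prediction and error, then eqs (3)-(5) to fitness."""
    for r in action_set:
        r.exp += 1
        if r.exp < 1.0 / BETA:      # young rule: running average, eq (1)
            r.p += (reward - r.p) / r.exp
            r.eps += (abs(reward - r.p) - r.eps) / r.exp
        else:                       # experienced rule: fixed step, eq (2)
            r.p += BETA * (reward - r.p)
            r.eps += BETA * (abs(reward - r.p) - r.eps)
    # accuracy k_i of each rule, eqs (3)-(4)
    ks = [1.0 if r.eps < EPS0 else BETA * (r.eps / EPS0) ** (-GAMMA)
          for r in action_set]
    total = sum(ks)
    for r, k in zip(action_set, ks):
        r.f += BETA * (k / total - r.f)   # relative accuracy drives fitness, eq (5)
```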

In the next phase, to develop variety in the data set, several couples are selected as parents, by the method of "remainder stochastic selection", from among the fields that make up the condition part of the existing data, and the condition part of a new sample is created by applying intermediate crossover to the fields of the parents. In this method the value of each conditional variable is obtained from the following relation:

a_i = α·a_i^F + (1 − α)·a_i^M  (6)

in which a_i is the value of conditional variable i in the new sample, a_i^F is its value in the first parent (father), a_i^M is its value in the second parent (mother), and α is the parents' participation coefficient, which is determined adaptively.

The action part of the new sample is produced by a nonlinear mapping, built from the existing data, from the space of the conditional variables to the space of actions.
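The following sketch illustrates these two steps under stated assumptions: the selection weights and the fixed α are placeholders, since the paper determines α adaptively and does not spell out how the parent samples are weighted:

```python
import random

def remainder_stochastic_selection(samples, weights, n):
    """Select n parents: the integer part of each expected count
    deterministically, the fractional remainders by roulette wheel."""
    total = sum(weights)
    expected = [w / total * n for w in weights]
    selected = []
    for s, e in zip(samples, expected):
        selected.extend([s] * int(e))               # deterministic copies
    remainders = [e - int(e) for e in expected]
    while len(selected) < n:                        # roulette on the remainders
        selected.append(random.choices(samples, weights=remainders)[0])
    return selected

def intermediate_crossover(father, mother, alpha=0.5):
    """Eq. (6): a_i = alpha * a_i^F + (1 - alpha) * a_i^M, field by field."""
    return [alpha * f + (1 - alpha) * m for f, m in zip(father, mother)]
```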

Diversifying the existing data continues, aided by the completed data, until the learning stop condition is satisfied (for example, when the percentage of correct system answers on the test data reaches a pre-determined threshold).

In the next chapter some common algorithms for supervised learning are briefly described; in chapter 4 the results obtained from the suggested method are compared with the results of these methods on several benchmark examples.

3 Some Common Methods for Supervised Learning

In this chapter two approaches to supervised learning (learning rules from training data) are described: the "separate and conquer" method and the decision tree method. In the separate-and-conquer method one rule is learnt in each phase, the data covered by that rule are then separated out, and the procedure is repeated for the remaining samples. The k-NN, or k-nearest-neighbor, algorithm is one of the instance-based learning algorithms: in the learning phase it merely stores the training samples, and to determine the class of a data sample it calculates the distance between that sample and the stored training samples. The most common criterion for calculating this distance is the Euclidean norm, although criteria like the Manhattan and Minkowski distances are also used for this purpose.

After calculating the distances, a majority vote is held among the k training samples nearest to the current test sample, and the majority label among them is assigned to the test sample; k is a parameter determined by the user. Such algorithms are called lazy algorithms because they do no particular work in the learning phase and merely store the training samples.
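A minimal k-NN classifier of the kind described above might look as follows, using the Euclidean norm:

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, x, k=5):
    """Classify x by a majority vote among its k nearest training samples."""
    dists = np.linalg.norm(train_X - x, axis=1)    # Euclidean distance to every sample
    nearest = np.argsort(dists)[:k]                # indices of the k closest samples
    votes = Counter(train_y[i] for i in nearest)   # majority vote among their labels
    return votes.most_common(1)[0][0]
```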

The C4.5 algorithm is one of the most popular decision tree algorithms [3]. It was presented in 1993 and is an extended version of the ID3 algorithm. Each intermediate node of the tree represents a test on the values of an attribute, and each branch represents one of the permitted values of that attribute. The criterion used to select a suitable attribute for a node is Information Gain, which is biased in favor of attributes with many values; to eliminate this problem the Gain Ratio criterion is applied as well. The ID3 algorithm only supports discrete attributes, while C4.5 manages continuous attributes in addition to those with discrete values [4]. Moreover, handling attributes with unspecified values is another advantage of C4.5 over ID3. To avoid the overfitting phenomenon in the generated classification model, tree pruning techniques are used. Overfitting happens when the accuracy of the generated classification model is very high on the training data but does not reach a comparably high accuracy on the test data; in other words, the model is fitted too closely to the training data, and this close fit does not necessarily lead to higher classification accuracy on the test set. There are two main techniques for pruning a tree. In the first, called pre-pruning, the growth of the tree along some paths is stopped before the tree is complete, while in the other, called post-pruning, the tree is first grown completely and then some of its subtrees are replaced with leaf nodes. The generated tree can also be converted into an equivalent set of classification rules, and the rules can then be pruned by omitting some of their preconditions.
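For illustration, the Information Gain and Gain Ratio criteria mentioned above can be computed as in the following sketch (written for discrete attributes, as in ID3):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Parent entropy minus the weighted entropy of the splits on attribute attr."""
    splits = {}
    for row, y in zip(rows, labels):
        splits.setdefault(row[attr], []).append(y)
    weighted = sum(len(ys) / len(labels) * entropy(ys) for ys in splits.values())
    return entropy(labels) - weighted

def gain_ratio(rows, labels, attr):
    """Dividing by the split information corrects the bias toward
    attributes with many values."""
    n = len(rows)
    counts = Counter(row[attr] for row in rows)
    split_info = -sum(c / n * math.log2(c / n) for c in counts.values())
    return information_gain(rows, labels, attr) / split_info if split_info else 0.0
```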

4 Comparing the Results of the Suggested Algorithm with Other Supervised Learning Methods

4.1 Heart Disease Diagnosis Problem

Heart diseases are among the conditions whose diagnosis is difficult, so systems are needed that help diagnose them correctly. Several such systems have been recommended, most of which are based on techniques that can correctly discover and generalize relations among the data.

To apply the improved XCSR classifier method, data is needed first. Hospital systems are among those that deal with a large amount of data and can provide such information. The UCI machine learning repository provides facilities through which databases in each research field are accessible and applicable to intelligent systems [5].

The heart disease diagnosis database at UCI relates to the Cleveland area and includes information about the disease, the patients and their vital signs [1]. It stores records of 303 patients, each with 76 characteristics; not all 76 are applied, however, and only the 14 useful ones, described below, are used.

1. Age
2. Gender: 1 = male; 0 = female
3. Chest pain type: 1 = typical angina; 2 = atypical angina; 3 = non-anginal pain; 4 = asymptomatic
4. Resting blood pressure (mmHg, on admission to the hospital)
5. Serum cholesterol (mg/dl)
6. Fasting blood sugar (FBS): 1 = higher than 120 mg/dl; 0 = lower than 120 mg/dl
7. Resting electrocardiographic results: 0 = normal; 1 = ST-T wave abnormality (T wave inversion or ST deviation of more than 0.05 mV); 2 = left ventricular hypertrophy
8. Maximum heart rate (MHR)
9. Exercise induced angina: 1 = yes; 0 = no


10. ST depression induced by exercise relative to rest
11. Slope of the peak exercise ST segment: 1 = upsloping; 2 = flat; 3 = downsloping
12. Number of major vessels colored by fluoroscopy (0-3)
13. Thal: 3 = normal; 6 = fixed defect; 7 = reversible defect

14. Heart disease diagnosis (angiographic disease status): 0 = diameter narrowing of less than 50%; 1 = diameter narrowing of more than 50%

Characteristic No. 14 is used as the diagnosis characteristic [1,2]. The UCI database contains the records of 303 patients; we select 253 observations from it as training examples, present them to the system for training, and use the other 50 observations to test the trained rules.
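A minimal sketch of this data preparation, assuming the comma-separated processed.cleveland.data file from the UCI repository with missing values marked by '?' (the paper does not state how missing values were treated, nor whether the 253/50 split was random):

```python
import csv
import random

def load_cleveland(path):
    """Read the Cleveland records, drop those with missing values,
    and binarize the diagnosis attribute (characteristic No. 14)."""
    rows = []
    with open(path) as fh:
        for rec in csv.reader(fh):
            if not rec or '?' in rec:
                continue                     # assumption: skip incomplete records
            *features, target = (float(v) for v in rec)
            rows.append((features, 0 if target == 0 else 1))  # 1 = disease present
    return rows

def split(rows, n_train=253, n_test=50, seed=0):
    """The paper's 253/50 train/test split; the seeded shuffle is an assumption."""
    random.Random(seed).shuffle(rows)
    return rows[:n_train], rows[n_train:n_train + n_test]
```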

After preparing the data, we first apply the limited training examples to the generated rules, which updates the rules' parameters; the genetic mechanism also helps us generate new rules. At the end of the training phase, the spectrum of heart conditions presented in the form of test data can be predicted using these trained rules.

In the computer code written to perform the suggested method, the results obtained on the test examples are stored in a vector called answer; this vector has the same dimensions as the vector of correct answers for the test data (learning-answer) and is compared against it. The results of this comparison are shown in Table 1:

Table 1: Results of the suggested algorithm on the heart patients' characteristics from the UCI database

                                    Diagnosed as having    Diagnosed as not having
                                    heart disease          heart disease
Correct diagnoses (Answer 1)        32                     12
Incorrect diagnoses (Answer 2)      4                      2

As seen in Table 1, 32 patients who have heart disease were diagnosed as such by the suggested system, and 12 patients were correctly diagnosed as not having heart disease; 2 patients were wrongly diagnosed as not having heart disease, and 4 patients were wrongly diagnosed as having it. In total, the improved classifier system (XCSR) reaches a 12% error, which is acceptable for such systems and represents a 9% improvement in comparison with the standard XCSR.
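The 12% figure follows directly from the counts in Table 1, since 6 of the 50 test cases are misclassified:

```python
correct = 32 + 12    # Answer 1: correct diagnoses (with / without disease)
wrong = 4 + 2        # Answer 2: incorrect diagnoses
error_rate = wrong / (correct + wrong)    # 6 / 50
print(f"test error: {error_rate:.0%}")    # -> test error: 12%
```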

4.2 Diabetes Disease Diagnosis Problem

Hospital systems are among those that deal with a large amount of data and can provide such information. The diabetes disease diagnosis database belongs to the National Institute of Diabetes and Digestive and Kidney Diseases and consists of information about the disease, the patients and their vital signs. It holds records of 768 patients, each with 9 characteristics, listed in Table 2.

Table 2: Diabetes Data Set

Number of pregnancies
Plasma glucose concentration
Diastolic blood pressure (mmHg)
Triceps skin fold thickness
2-hour serum insulin
Body mass index
Diabetes pedigree function
Age
Class variable

After applying the suggested method, the results obtained from the improved XCSR algorithm in the test phase were compared with four other algorithms, as shown in Table 3.


Table 3: Results (classification accuracy, %) obtained by the suggested algorithm and previous classification algorithms

K-star    C4.5    SVM     AD Tree    XCS      Suggested Method
70.2      71.3    77.8    73.1       87.19    89.67

5 Conclusion

In this study a new method was presented for improving the performance and increasing the convergence rate of XCSR with limited training data. To show the efficiency of the suggested system on data mining problems, two benchmark problems, "Diabetes Diagnosis" and "Heart Disease Diagnosis", were studied with its aid. Comparing the results obtained from the suggested method with those of several other machine learning algorithms shows the advantage of this method over the said methods.

References:

[1] Saeidi, Vahid; Yousefian, Alireza; Shahrabi, Jamal, "Heart Diseases Diagnosis by means of Data Mining Techniques", The 5th Iran Data Mining Conference, 2011.
[2] http://archive.ics.uci.edu/ml/datasets/Heart+Disease
[3] Quinlan, J.R., "C4.5: Programs for Machine Learning", Morgan Kaufmann, San Mateo, CA, 1993.
[4] Sharpe, P.K.; Glover, R.P., "Efficient GA based technique for classification", Applied Intelligence 11, 277-284, 1999.
[5] Blake, C.; Keogh, E.; Merz, C.J., "UCI Repository of Machine Learning Databases", University of California.

