
building FNNs that satisfy different requirements regarding classification rates.

C. FNN and MOO: Optimization Process

A multi-objective optimization process is used here in such a way that the classification rate for a single class is a single optimization objective. Therefore, the number of objectives is equal to the number of classes. As the result of such an optimization process, a set of solutions (data models) is obtained. These models constitute a Pareto surface.
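To illustrate what the Pareto surface amounts to, the sketch below filters candidate models by their per-class classification rates and keeps only the non-dominated ones; the candidate rate vectors and the function names are illustrative assumptions, not part of the paper.

```python
# Minimal sketch: extracting the non-dominated (Pareto) set of FNN models,
# where each model is scored by one classification rate per class (maximized).

def dominates(u, v):
    """True if rate vector u is at least as good as v on every class
    and strictly better on at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(models):
    """Keep only models whose per-class rates are not dominated by any other model."""
    return [m for m in models
            if not any(dominates(other, m) for other in models if other is not m)]

# Hypothetical per-class classification rates for a 3-class problem.
candidates = [(0.92, 0.40, 0.55), (0.80, 0.81, 0.60),
              (0.70, 0.85, 0.90), (0.65, 0.30, 0.50)]
print(pareto_front(candidates))   # the last candidate is dominated and dropped
```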

Representation of Network Structures: An important aspect of applying evolutionary-based optimization to construct FNNs is mapping the structure of an FNN onto a chromosome string. Two aspects are important here: the selection of attributes that constitute inputs to AND/OR nodes, and the adjustment of connection weights. Therefore, a chromosome is built out of two segments:
- the segment that is “responsible” for the selection of attributes that become inputs to the neurons; let us call it the variable indexes part;
- the segment that determines the connection weights; let us call it the connection weights part.

Each position of the variable indexes part is an integer that identifies an attribute's index. The values of these integers are in the range from 0 to n-1 (for n-dimensional data). The length of this segment is determined by the number of classes c in the dataset (equal to the number of OR/AND nodes) and by the maximum number of inputs to a single OR/AND node, i. The overall length of the variable indexes part is c*i. The connection weights part is a string of floating point numbers. Each number represents a single connection weight and is in the range from 0 to 1. The number of connection weights for the FNN is (2*i + 2*c)*c.

As can be seen, the structure of the FNN is partially fixed: it contains as many OR/AND nodes as there are different classes. This means that each output of the FNN is associated with a single class.
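The layout of such a chromosome can be sketched as follows; this is only a random initializer under the segment lengths given above, and the function and parameter names are our own assumptions rather than the authors' implementation.

```python
import random

def random_chromosome(n, c, i, rng=random):
    """One chromosome for an FNN over n-dimensional data with c classes
    (one OR/AND node per class) and at most i inputs per node.

    variable indexes part  : c * i integers in [0, n-1] (attribute indices)
    connection weights part: (2*i + 2*c) * c floats in [0, 1]
    """
    variable_indexes = [rng.randrange(n) for _ in range(c * i)]
    connection_weights = [rng.random() for _ in range((2 * i + 2 * c) * c)]
    return variable_indexes, connection_weights

# Hypothetical sizes: 8 attributes, 3 classes, at most 4 inputs per node.
var_part, weight_part = random_chromosome(n=8, c=3, i=4)
assert len(var_part) == 12 and len(weight_part) == 42
```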

Optimization Objective Functions: The objective of the optimization process is to construct FNNs that provide the best classification rate for a single class. This means that the selection of the objective function that governs the optimization process is important.

For any two-category classification process, a confusion (error) matrix can be built, Table I. This matrix summarizes the classification capabilities of a model: the values a (true positives) and d (true negatives) represent proper classifications, while b (false positives) and c (false negatives) represent misclassifications. Based on this matrix, a number of different measures can be calculated. The measures that are used in our fitness functions are:

accuracy = (a + d) / (a + b + c + d)

which defines the ratio of all correctly classified data points to all data points,

sensitivity (recall) = a / (a + c)

which represents the fraction of actual positive data points classified as positive data points,

specificity = d / (b + d)

which represents the fraction of actual negative data points classified as negative data points, and

precision = a / (a + b)

which is the ratio of actual positive data points classified as positive to all data points classified as positive. It can be said that sensitivity and specificity are counterparts of each other: sensitivity is defined for positive data points, while specificity is its equivalent for negative data points. Another observation relates sensitivity (also called recall) to precision: higher sensitivity means that almost all of the positive data points will be included in the classification results; at the same time, however, some negative data points can be predicted as positive ones, which leads to low values of precision.

Table I. Confusion matrix

                      actual POSITIVE    actual NEGATIVE
predicted POSITIVE           a                  b
predicted NEGATIVE           c                  d
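A minimal sketch of the four measures computed directly from the entries of Table I; the function name and the example counts are assumed for illustration, and zero denominators are not handled.

```python
def confusion_measures(a, b, c, d):
    """Measures from Table I: a = true positives, b = false positives,
    c = false negatives, d = true negatives."""
    accuracy    = (a + d) / (a + b + c + d)
    sensitivity = a / (a + c)   # recall: fraction of actual positives found
    specificity = d / (b + d)   # fraction of actual negatives found
    precision   = a / (a + b)   # fraction of predicted positives that are correct
    return accuracy, sensitivity, specificity, precision

print(confusion_measures(a=40, b=10, c=5, d=45))  # illustrative counts only
```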

Of course, accuracy is a very important measure: it indicates that the classifier is able to properly classify both positive and negative data points. However, in the case of a large imbalance in the number of data points that belong to each class, the accuracy measure is not able to ensure a high classification rate for each class. If 90 per cent of the data points belong to the negative class and only 10 per cent to the positive class, then high accuracy can be achieved by correct classification of the negative class alone. At the same time, this can lead to large misclassification (large b and c) of the positive class. Therefore, there is a need to use other measures that “take care” of large b and c.
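To make the imbalance argument concrete, the counts below (assumed for illustration) describe a classifier that labels every point negative on such a 90/10 split: accuracy reaches 0.90 while sensitivity for the positive class is 0.

```python
# 90 negative and 10 positive data points, every point predicted as negative:
a, b, c, d = 0, 0, 10, 90                 # all positives are missed
accuracy    = (a + d) / (a + b + c + d)   # 0.90 - looks good despite the failure
sensitivity = a / (a + c)                 # 0.00 - the positive class is never found
print(accuracy, sensitivity)
```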

As the result of these investigations, the fitness functions used here are constructed from accuracy, sensitivity, specificity and precision. Two fitness functions are defined (a code sketch of both is given after the definitions):
- one that uses the product of sensitivity, specificity and accuracy (FF_SSA):
  F_SSA = Sensitivity * Specificity * Accuracy
- one that uses the product of accuracy, recall and precision (FF_APR):
  F_APR = Accuracy * Recall * Precision
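A small sketch of both fitness functions in terms of the Table I entries; the function names, the guard against an empty set of positive predictions, and the example counts are assumptions made for illustration.

```python
def f_ssa(a, b, c, d):
    """F_SSA = Sensitivity * Specificity * Accuracy."""
    accuracy    = (a + d) / (a + b + c + d)
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return sensitivity * specificity * accuracy

def f_apr(a, b, c, d):
    """F_APR = Accuracy * Recall * Precision (recall = sensitivity)."""
    accuracy  = (a + d) / (a + b + c + d)
    recall    = a / (a + c)
    precision = a / (a + b) if (a + b) > 0 else 0.0  # no positive predictions at all
    return accuracy * recall * precision

print(f_ssa(40, 10, 5, 45), f_apr(40, 10, 5, 45))  # illustrative counts only
```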

Each of the fitness functions represents a different way of suppressing b (false positives): F_SSA does it in the reference

