SEKE 2012 Proceedings - Knowledge Systems Institute


to d (true negatives), while F_APR in the reference to a (true<br />

positives).<br />

The fitness functions presented above are defined for a<br />

binary (two-category) classification problem. In order to<br />

focus on a single class, the confusion matrix is specially<br />

processed during evaluation of the models. This<br />

processing collapses a multi-class confusion matrix<br />

into a two-class matrix. More precisely, the multi-class<br />

matrix is transformed into a number (equal to the<br />

number of classes) of matrices. Each of these matrices is<br />

related to two classes: a class of interest, say class A,<br />

and the other class, say class non-A, obtained by<br />

fusion of all other classes. Using these two classes, all<br />

classification rates (accuracy, sensitivity, specificity and<br />

precision) are calculated.<br />
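
The collapsing step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the row/column orientation of the matrix (rows are actual classes, columns are predicted classes), the function names, and the example counts are all assumptions made for the sketch.

```python
from typing import Dict, List


def collapse_to_binary(cm: List[List[int]], k: int) -> Dict[str, int]:
    """Collapse a multi-class confusion matrix into class k versus non-k.

    Assumption: cm[i][j] counts points whose actual class is i and whose
    predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]                               # actual k, predicted k
    fn = sum(cm[k]) - tp                        # actual k, predicted non-k
    fp = sum(cm[i][k] for i in range(n)) - tp   # actual non-k, predicted k
    tn = total - tp - fn - fp                   # actual non-k, predicted non-k
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn}


def rates(b: Dict[str, int]) -> Dict[str, float]:
    """Compute the four classification rates from a two-class matrix."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    total = b["tp"] + b["tn"] + b["fp"] + b["fn"]
    return {
        "accuracy": ratio(b["tp"] + b["tn"], total),
        "sensitivity": ratio(b["tp"], b["tp"] + b["fn"]),
        "specificity": ratio(b["tn"], b["tn"] + b["fp"]),
        "precision": ratio(b["tp"], b["tp"] + b["fp"]),
    }


# Made-up three-class confusion matrix, used only for illustration.
cm = [
    [5, 2, 3],
    [1, 6, 3],
    [2, 2, 26],
]

# One two-class matrix (and one set of rates) per class of interest.
per_class = [rates(collapse_to_binary(cm, k)) for k in range(len(cm))]
```

Each entry of `per_class` then holds the rates for one class of interest against the fusion of all other classes.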

As explained earlier, each model is evaluated<br />

by a set of objectives, that is, a set of fitness functions, each<br />

of which is related to a single class. Each model is<br />

“measured” by a set of fitness functions:<br />

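[set of fitness functions: equation not recoverable from the source]<br />

or<br />

[alternative set of fitness functions: equation not recoverable from the source]<br />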

The proposed method is used to construct FNNs; therefore,<br />

one more thing needs to be explained: how the<br />

classification is determined based on the fuzzy output. To<br />

cope with that, a method with a Threshold Value<br />

(Section II) is used here. If the output of the model associated<br />

with one of the classes is above the threshold, then it<br />

indicates that a given data point belongs to this class. If it<br />

happens that two outputs are above the threshold, the<br />

model indicates that the point should belong to both classes.<br />

This result is treated as a misclassification.<br />
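
The thresholding rule above can be sketched as follows. The function name and the threshold value of 0.5 in the usage are hypothetical (the actual value is defined in Section II, which is not reproduced here). Returning `None` when no output exceeds the threshold is also an assumption, since the text only covers the case of two outputs above it.

```python
from typing import Optional, Sequence


def classify(outputs: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the single class whose output is above the threshold.

    Returns None when more than one output is above the threshold (treated
    as a misclassification in the text) or when none is (an assumption).
    """
    above = [i for i, y in enumerate(outputs) if y > threshold]
    return above[0] if len(above) == 1 else None


# Hypothetical fuzzy outputs of three per-class models for three data points.
single = classify([0.2, 0.8, 0.1], threshold=0.5)     # exactly one output above
ambiguous = classify([0.7, 0.8, 0.1], threshold=0.5)  # two outputs above
nothing = classify([0.1, 0.2, 0.3], threshold=0.5)    # no output above
```

A caller would count `None` as an incorrect prediction when filling the confusion matrix.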

D. FNN and MOO: Pruning and Merging Processes<br />

The Pareto-based optimization process generates a<br />

number of models that form a Pareto surface. Such an<br />

optimization process is repeated with different fitness<br />

functions and more models are generated. All these models<br />

are used in the process of knowledge (rule) extraction.<br />

Before the rule extraction takes place, the models are<br />

pruned. The purpose of pruning is to reduce the complexity of<br />

the models by decreasing the number of input variables to<br />

AND/OR neurons.<br />

Model pruning is carried out by changing the thresholds<br />

for OR and AND neurons. For an AND neuron, an input<br />

variable has a significant influence on the final output when<br />

the weight connecting this variable to the neuron is close to 0. For<br />

an OR neuron, an input has a stronger effect on the output if<br />

its weight is closer to 1. So, for an AND neuron, the inputs<br />

with weight values close to 1 will not contribute much to the<br />

output. The same happens for an OR neuron if the weight<br />

values are close to 0. If we set the threshold values<br />

close to 1 for AND neurons, and close to 0 for OR neurons,<br />

we can find inputs that are not very useful. By changing the<br />

threshold values, we can cut off some non-significant<br />

inputs and simplify the model, possibly with little change to the<br />

accuracy of the model. In our experiments, we change the<br />

thresholds for AND and OR neurons separately, while<br />

ensuring that the output accuracy does not drop by more than 5%.<br />
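
The pruning idea above can be sketched as follows, under several assumptions that are not stated in this section: the AND neuron uses the product t-norm over probabilistic sums of inputs and weights, the OR neuron uses the probabilistic sum over products, and the 5% limit is read as an absolute drop of 0.05 in accuracy. All function names and the data are hypothetical.

```python
from typing import Callable, List, Sequence


def and_neuron(x: Sequence[float], w: Sequence[float]) -> float:
    """AND neuron: product over inputs of the probabilistic sum (x_i OR w_i)."""
    out = 1.0
    for xi, wi in zip(x, w):
        out *= xi + wi - xi * wi
    return out


def or_neuron(x: Sequence[float], w: Sequence[float]) -> float:
    """OR neuron: probabilistic sum over inputs of the product (x_i AND w_i)."""
    out = 0.0
    for xi, wi in zip(x, w):
        term = xi * wi
        out = out + term - out * term
    return out


def prune_and(w: Sequence[float], threshold: float) -> List[float]:
    """Disconnect AND inputs with weight at or above the threshold (weight 1)."""
    return [1.0 if wi >= threshold else wi for wi in w]


def prune_or(w: Sequence[float], threshold: float) -> List[float]:
    """Disconnect OR inputs with weight at or below the threshold (weight 0)."""
    return [0.0 if wi <= threshold else wi for wi in w]


def prune_within_tolerance(
    weights: Sequence[float],
    thresholds: Sequence[float],
    prune: Callable[[Sequence[float], float], List[float]],
    accuracy: Callable[[Sequence[float]], float],
    max_drop: float = 0.05,
) -> List[float]:
    """Try each threshold in turn and keep the last pruning whose accuracy
    stays within max_drop of the unpruned accuracy."""
    baseline = accuracy(weights)
    best = list(weights)
    for t in thresholds:
        candidate = prune(weights, t)
        if baseline - accuracy(candidate) <= max_drop:
            best = candidate
    return best


# Made-up data for a single two-input AND neuron: (inputs, expected label).
data = [([0.9, 0.2], 1), ([0.2, 0.8], 0), ([0.8, 0.1], 1), ([0.1, 0.9], 0)]


def accuracy(w: Sequence[float]) -> float:
    """Fraction of points whose thresholded neuron output matches the label."""
    hits = sum((and_neuron(x, w) > 0.5) == bool(label) for x, label in data)
    return hits / len(data)


# Thresholds ordered from mild to aggressive pruning for an AND neuron.
pruned = prune_within_tolerance(
    [0.05, 0.92], [0.99, 0.95, 0.9, 0.05], prune_and, accuracy
)
```

In this toy run the second input (weight 0.92) is disconnected at threshold 0.9 without loss of accuracy, while the most aggressive threshold is rejected because it would disconnect both inputs.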

After we trim the models for each class, we extract rules,<br />

select some of them, and combine the selected rules<br />

into a single model for classification of all classes. In the<br />

selection process, we compare the outputs of all rule sets<br />

(models¹) constructed for each class. The rule set that<br />

provides the best classification rate for a single class<br />

becomes part of the final model. The process is<br />

performed for each class.<br />
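
The selection step above amounts to picking, for every class, the candidate rule set with the best classification rate. A minimal sketch follows; the rule-set names and rates are made up, and which classification rate is compared is left to the caller.

```python
from typing import Dict, List, Tuple


def select_rule_sets(
    candidates: Dict[str, List[Tuple[str, float]]]
) -> Dict[str, str]:
    """For each class, return the name of the rule set with the highest rate.

    candidates maps a class name to (rule_set_name, classification_rate) pairs.
    """
    return {
        cls: max(rule_sets, key=lambda pair: pair[1])[0]
        for cls, rule_sets in candidates.items()
    }


# Hypothetical candidates: pruned models built for each class.
candidates = {
    "class1": [("model_a", 0.61), ("model_b", 0.72)],
    "class2": [("model_c", 0.58), ("model_d", 0.55)],
    "class3": [("model_e", 0.90)],
}

final_model = select_rule_sets(candidates)
```

The resulting mapping names one rule set per class; together these rule sets form the combined model.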


An experiment has been performed to generate the software<br />

data required to illustrate the proposed approach to<br />

constructing FNN models. In the experiment, objects of the<br />

system EvIdent [12] have been independently analyzed by<br />

three software architects and ranked according to their<br />

quality attributes: complexity, maintainability and usability.<br />

Quantitative software measures of these objects have been<br />

compiled.<br />

A. Software System Description<br />

EvIdent is a user-friendly, algorithm-rich, graphical<br />

environment for the detection, investigation, and<br />

visualization of novelty and/or change in a set of images as<br />

they evolve in time or frequency. It is written in Java and<br />

C++ and based on VIStA, an application-programming<br />

interface (API) developed at the National Research Council.<br />

The VIStA API is written in Java and offers a generalized<br />

data model, an extensible algorithm framework, and a suite<br />

of graphical user interface constructs.<br />

B. Data Set<br />

Only Java-based EvIdent/VIStA objects have been used<br />

here. For each of the 366 software objects, three<br />

programmers, named ‘A’, ‘D’ and ‘V’, were asked to<br />

independently rank the objects’ maintainability, complexity and<br />

usability from 1 (very poor) to 5 (very good). At the same<br />

time, a set of 64 software metrics was calculated for each<br />

object. As a result, the collected data set consists of 366<br />

data points, each represented by a set of 64 software metrics and<br />

the values assigned to it by the three programmers.<br />

For the purpose of the experiments presented in the<br />

paper, we have combined objects ranked 1 and 2 into<br />

class1, have relabeled objects ranked 3 as class2, and<br />

have combined objects ranked 4 and 5 into class3.<br />

Despite this, the objects are very unevenly distributed<br />

among the three classes. All three programmers have<br />

identified most of the objects as belonging to class3.<br />

Using the “standard” approach to constructing models, that is, the best<br />

overall classification rate, the models would “concentrate”<br />

on class3 and ignore class1 and class2. However, in<br />

software engineering applications the most<br />

important classes are class1 and class2, together with the rules generated for<br />

them. Objects of these classes need to be recognized and better<br />

understood.<br />
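
The regrouping of ranks into three classes can be sketched as a simple lookup. The sketch assumes that class3 gathers ranks 4 and 5, which is what the 1-to-5 scale leaves once ranks 1, 2 and 3 have been assigned. The example ranks are made up and do not come from the EvIdent data set.

```python
from collections import Counter
from typing import Iterable

# Assumed mapping from a programmer's rank (1..5) to the class label.
RANK_TO_CLASS = {
    1: "class1",
    2: "class1",
    3: "class2",
    4: "class3",
    5: "class3",
}


def class_distribution(ranks: Iterable[int]) -> Counter:
    """Count how many objects fall into each class after regrouping."""
    return Counter(RANK_TO_CLASS[r] for r in ranks)


# Made-up ranks for seven objects, used only to show the call.
distribution = class_distribution([1, 2, 3, 4, 5, 5, 4])
```

Inspecting such a distribution before training makes the class imbalance explicit.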

¹ In the case of an FNN, the terms “rule set” and “model” are interchangeable.

The FNN is de facto a set of rules.

