27.03.2014 Views

SEKE 2012 Proceedings - Knowledge Systems Institute


to d (true negatives), while FAPR in the reference to a (true negatives).

The fitness functions presented above are defined for a binary (two-category) classification problem. In order to focus on a single class, the confusion matrix is specially processed during the evaluation of models. This processing collapses a multi-class confusion matrix into a two-class matrix. In such a case, the multi-class matrix is transformed into a number of matrices equal to the number of classes. Each of these matrices relates to two classes: a class of interest, say class A, and the other class, say class non-A, obtained by fusing all other classes. Using these two classes, all classification rates (accuracy, sensitivity, specificity and precision) are calculated.
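
The collapsing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the matrix layout (rows are true classes, columns are predicted classes), the function names and the example counts are all assumptions made for this sketch.

```python
def collapse_confusion(matrix, k):
    """Collapse an n-class confusion matrix into class k vs. non-k counts.

    Assumed layout: matrix[i][j] is the number of points whose true class
    is i and whose predicted class is j.
    Returns (tp, fn, fp, tn) with class k treated as the positive class.
    """
    n = len(matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k][j] for j in range(n) if j != k)
    fp = sum(matrix[i][k] for i in range(n) if i != k)
    tn = sum(matrix[i][j] for i in range(n) for j in range(n)
             if i != k and j != k)
    return tp, fn, fp, tn


def rates(tp, fn, fp, tn):
    """Compute the four classification rates from two-class counts."""
    def ratio(num, den):
        # Guard against empty classes; returning 0.0 is a choice of this sketch.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fn + fp + tn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


# Hypothetical 3-class confusion matrix (illustrative counts only).
example = [
    [5, 2, 3],
    [1, 6, 3],
    [4, 2, 74],
]

# One two-class evaluation per class, as many as there are classes.
per_class = [rates(*collapse_confusion(example, k)) for k in range(len(example))]
```

Each entry of `per_class` then holds the rates for one class of interest against the fusion of all remaining classes.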

As explained earlier, each model is evaluated by a set of objectives: a set of fitness functions, each of which is related to a single class. Each model is “measured” by a set of fitness functions:

[equation not preserved in the source]

or

[equation not preserved in the source]

The proposed method is used to construct FNNs; therefore, one more thing needs to be explained: how the classification is determined based on the fuzzy output. To cope with that, a method with a Threshold Value (Section II) is used here. If the output of the model associated with one of the classes is above the threshold, it indicates that a given data point belongs to this class. If it happens that two outputs are above the threshold, the model indicates that the point belongs to both classes. This result is treated as a misclassification.
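
The threshold rule above can be sketched as a small decision function. The threshold value of 0.5, the function name and the class labels are assumptions of this sketch. The text does not say what happens when no output exceeds the threshold; here that case is also reported as no decision.

```python
def classify(outputs, threshold=0.5):
    """Map fuzzy per-class outputs to a single class label.

    outputs: dict of class label -> model output in [0, 1].
    Returns the label if exactly one output is above the threshold.
    Returns None when two or more outputs are above the threshold
    (treated as a misclassification in the text) or when none is
    (an assumption of this sketch).
    """
    above = [label for label, value in outputs.items() if value > threshold]
    return above[0] if len(above) == 1 else None


# Hypothetical outputs for three per-class models.
clear_case = classify({"class1": 0.8, "class2": 0.2, "class3": 0.1})
ambiguous_case = classify({"class1": 0.8, "class2": 0.7, "class3": 0.1})
```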

D. FNN and MOO: Pruning and Merging Processes

The Pareto-based optimization process generates a number of models that form a Pareto surface. Such an optimization process is repeated with different fitness functions, and more models are generated. All these models are used in the process of knowledge (rule) extraction. Before rule extraction takes place, the models are pruned. The purpose of pruning is to reduce the complexity of the models by decreasing the number of input variables to AND/OR neurons.

Model pruning is carried out by changing the thresholds for OR and AND neurons. For an AND neuron, an input variable has a significant influence on the final output when the weight connecting this variable to the neuron is close to 0. For an OR neuron, an input has a stronger effect on the output when the weight is closer to 1. So, for an AND neuron, inputs with weight values close to 1 do not contribute much to the output. The same happens for an OR neuron if the weight values are close to 0. If we set the threshold values close to 1 for AND neurons and close to 0 for OR neurons, we can find inputs that are not very useful. By changing the threshold values, we can cut off some non-significant inputs and simplify the model, possibly with little change to its accuracy. In our experiments, we change the thresholds for AND and OR neurons separately, while ensuring that the output accuracy does not drop by more than 5%.
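
The pruning idea above can be illustrated with a small sketch. It assumes the product t-norm and the probabilistic-sum s-norm, which this section does not specify. It also reads the 5% limit as an absolute drop in accuracy. All function names and numeric values are hypothetical.

```python
from functools import reduce


def t_norm(a, b):
    # Product t-norm (assumed; the section does not name the norms used).
    return a * b


def s_norm(a, b):
    # Probabilistic sum s-norm (assumed).
    return a + b - a * b


def and_neuron(inputs, weights):
    # y = T_i (x_i S w_i): a weight near 1 makes the term near 1,
    # which is neutral for the t-norm, so that input barely matters.
    return reduce(t_norm, (s_norm(x, w) for x, w in zip(inputs, weights)), 1.0)


def or_neuron(inputs, weights):
    # y = S_i (x_i T w_i): a weight near 0 makes the term near 0,
    # which is neutral for the s-norm, so that input barely matters.
    return reduce(s_norm, (t_norm(x, w) for x, w in zip(inputs, weights)), 0.0)


def kept_inputs_and(weights, threshold):
    # Keep inputs whose weight is below the (close to 1) threshold.
    return [i for i, w in enumerate(weights) if w < threshold]


def kept_inputs_or(weights, threshold):
    # Keep inputs whose weight is above the (close to 0) threshold.
    return [i for i, w in enumerate(weights) if w > threshold]


def pick_threshold(candidates, accuracy_at, baseline, max_drop=0.05):
    """Return the first candidate whose accuracy drop stays within max_drop.

    candidates should be ordered from most to least aggressive pruning;
    accuracy_at is a caller-supplied function (hypothetical here).
    """
    for threshold in candidates:
        if baseline - accuracy_at(threshold) <= max_drop:
            return threshold
    return None


# Hypothetical AND neuron with one nearly irrelevant input (weight 0.97).
x = [0.8, 0.3, 0.6]
w_and = [0.1, 0.97, 0.2]
keep = kept_inputs_and(w_and, threshold=0.9)
full = and_neuron(x, w_and)
pruned = and_neuron([x[i] for i in keep], [w_and[i] for i in keep])
```

In this example the pruned neuron drops the second input and its output stays close to that of the full neuron, which is the effect the pruning step relies on.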

After we trim the models for each class, we extract rules, select some of them, and combine the selected rules into a single model for the classification of all classes. In the selection process, we compare the outputs of all rule sets (models¹) constructed for each class. The rule set that provides the best classification rate for a single class becomes part of the final model. The process is performed for each class.
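
The selection step can be sketched as picking, for each class, the rule set with the highest classification rate. The section does not state which rate is compared, so the single score per rule set used here is an assumption, as are the identifiers and values.

```python
def select_rule_sets(rates_by_class):
    """Pick the best-scoring rule set for every class.

    rates_by_class: dict of class label -> dict of rule set id -> rate.
    Returns a dict of class label -> id of the selected rule set.
    """
    return {
        label: max(candidates, key=candidates.get)
        for label, candidates in rates_by_class.items()
    }


# Hypothetical rates for pruned rule sets (illustrative values only).
candidate_rates = {
    "class1": {"rs_a": 0.62, "rs_b": 0.71},
    "class2": {"rs_c": 0.55, "rs_d": 0.49},
    "class3": {"rs_e": 0.88, "rs_f": 0.90},
}
final_model = select_rule_sets(candidate_rates)
```

The selected rule sets, one per class, together form the combined model.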

V. DATA DESCRIPTION

An experiment has been performed to generate the software data required to illustrate the proposed approach to constructing FNN models. In the experiment, objects of the system EvIdent [12] have been independently analyzed by three software architects and ranked according to their quality attributes: complexity, maintainability and usability. Quantitative software measures of these objects have been compiled.

A. Software System Description

EvIdent is a user-friendly, algorithm-rich, graphical environment for the detection, investigation, and visualization of novelty and/or change in a set of images as they evolve in time or frequency. It is written in Java and C++ and based on VIStA, an application programming interface (API) developed at the National Research Council. The VIStA API is written in Java and offers a generalized data model, an extensible algorithm framework, and a suite of graphical user interface constructs.

B. Data Set

Only Java-based EvIdent/VIStA objects have been used here. For each of the 366 software objects, three programmers, named ‘A’, ‘D’ and ‘V’, were asked to independently rank the object's maintainability, complexity and usability from 1 (very poor) to 5 (very good). At the same time, a set of 64 software metrics was calculated for each object. As a result, the collected data set consists of 366 data points, each represented by a set of 64 software metrics and the values assigned to it by the three programmers.

For the purpose of the experiments presented in the paper, we have combined objects ranked 1 and 2 into class1, renamed objects ranked 3 as class2, and combined objects ranked 4 and 5 into class3. Despite this, the objects are very unevenly distributed among the three classes. All three programmers have identified most of the objects as belonging to class3. Using the “standard” approach to constructing models (best overall classification rate), the models would “concentrate” on class3 and ignore class1 and class2. However, in software engineering applications, class1 and class2, and the rules generated for them, are the most important. Objects of these classes need to be recognized and better understood.
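
The regrouping of ranks into classes can be written as a simple lookup. This sketch assumes that ranks 4 and 5 form class3, since rank 3 is already assigned to class2 and the scale runs from 1 to 5. The sample ranks are hypothetical and are not taken from the EvIdent data.

```python
from collections import Counter

# Rank (1 = very poor ... 5 = very good) to class label.
# Assumption: class3 covers ranks 4 and 5.
RANK_TO_CLASS = {1: "class1", 2: "class1", 3: "class2", 4: "class3", 5: "class3"}


def class_distribution(ranks):
    """Count how many objects fall into each class for one programmer."""
    return Counter(RANK_TO_CLASS[rank] for rank in ranks)


# Hypothetical ranks assigned by one programmer to seven objects.
sample_ranks = [1, 2, 3, 4, 5, 5, 4]
distribution = class_distribution(sample_ranks)
```

Counting the classes this way makes an uneven distribution, such as the dominance of class3 noted above, easy to see before any model is built.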

¹ In the case of an FNN, the terms “rule set” and “model” are interchangeable. The FNN is de facto a set of rules.
