

valued dataset. Different types of syntactic functions are applied to the dataset to generate the class labels. Since it is a fully controlled scenario, varying numbers of relevant, irrelevant and redundant features are placed in the dataset. However, that simulated dataset does not reflect a real longitudinal dataset from clinical trials, and the functions used in that work do not take any particular attribute into account when deciding the outcome, which does not match what is usually seen in real-world biomedical datasets.

In this research, on the other hand, several feature selection algorithms are applied to a simulation-based dataset with longitudinal trials in order to compare the performance of the algorithms under different noise levels, different missing-value levels, and combinations of both. Multicollinearity is added to the dataset to identify the algorithms that remain most robust when dependencies between attributes are in question. The same attribute selection techniques are also applied to the combination of the longitudinal datasets. A simulated dataset is chosen because it gives full control over the data.
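As a rough illustration, a controlled dataset of this kind could be simulated along the following lines; the sizes, rates and variable names are illustrative assumptions only, not the generator actually used in this study.

# Minimal sketch of a controlled simulation with noise, missing values and
# multicollinearity; all sizes and rates below are made-up assumptions.
import numpy as np

rng = np.random.default_rng(42)
n_subjects, n_features = 500, 10

# Relevant features drive the class label; the rest are irrelevant noise.
X = rng.normal(size=(n_subjects, n_features))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Multicollinearity: make feature 2 a near-linear copy of feature 0.
X[:, 2] = X[:, 0] + rng.normal(scale=0.05, size=n_subjects)

# Noise level: flip a fraction of the class labels.
noise_rate = 0.10
flip = rng.random(n_subjects) < noise_rate
y[flip] = 1 - y[flip]

# Missing-value level: blank out a fraction of the cells.
missing_rate = 0.05
mask = rng.random(X.shape) < missing_rate
X[mask] = np.nan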

This paper is organized as follows. Section 2 explains the different attribute selection techniques. Section 3 outlines how the database is created and how the rules are embedded into the dataset. Section 4 describes the programming language used to embed the rules and the data mining tool used to run the feature selection algorithms. Section 6 and Section 7 present the results.

2. Attribute selection techniques

Attribute selection techniques fall into three categories: filter, wrapper and embedded methods. In filter methods, feature selection takes place before any learning algorithm: the undesirable attributes are filtered out before the classification step, and all of the training data is used (Hall 2003). In embedded methods, the learning algorithm has its own feature selection algorithm embedded in it (Molina 2002); the J48 decision tree classification algorithm is a common example. In wrapper methods, on the other hand, the feature selection algorithm uses the learning algorithm as a subroutine (John 1994).
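To make the distinction concrete, the sketch below shows the wrapper idea: the learning algorithm (here a decision tree, standing in for any classifier) is called as a subroutine to score candidate feature subsets. The function name, the forward-selection loop and the use of scikit-learn are illustrative assumptions, not part of this study.

# Wrapper-style forward selection: the learner itself scores each candidate
# subset. X is assumed to be a complete numeric array, y the class labels.
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def wrapper_forward_selection(X, y, n_select=3):
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_select:
        # Cross-validated accuracy of the learner on each candidate subset.
        scores = {f: cross_val_score(DecisionTreeClassifier(),
                                     X[:, selected + [f]], y, cv=5).mean()
                  for f in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected

A filter method, by contrast, would rank attributes with a criterion such as information gain before any classifier is trained.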

Five different attribute selection methods are applied to the dataset: wrapper methods (correlation-based feature selection and Relief), a filter method (information gain) and an embedded method (J48 decision-tree-based feature selection). The results are compared in terms of sensitivity and specificity.

2.1 Correlation based feature selection

Correlation-based feature selection evaluates the dependencies between attributes and eliminates those that are correlated with each other. Irrelevant and redundant data have to be removed as far as possible, so that after feature selection the remaining attributes are highly correlated with the class and uncorrelated with each other.

Equation (1) (Ghiselli 1964) formalizes this:

Merit_S = k * r_fc / sqrt(k + k(k-1) * r_ff)    (1)

The feature subset S contains k features, where r_fc is the average feature-to-class correlation and r_ff is the average feature-to-feature correlation. For a good feature selection algorithm, the merit has to be maximized.
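For illustration, the merit of equation (1) can be computed directly; the correlation values below are made-up numbers, not results from the simulated dataset.

# Illustrative computation of the CFS merit in equation (1).
import numpy as np

def cfs_merit(r_fc, r_ff, k):
    """Merit of a k-feature subset from the mean feature-class correlation
    r_fc and the mean feature-feature correlation r_ff."""
    return (k * r_fc) / np.sqrt(k + k * (k - 1) * r_ff)

# Three features that correlate well with the class (0.6 on average) but
# little with each other (0.1) score higher than a redundant subset (0.8).
print(cfs_merit(0.6, 0.1, 3))   # ~0.95
print(cfs_merit(0.6, 0.8, 3))   # ~0.64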

Symmetrical uncertainty can be evaluated as follows, where H(X) and H(Y) are the marginal entropies:

Symmetrical uncertainty = 2.0 * [H(X) + H(Y) - H(X,Y)] / [H(X) + H(Y)]    (2)
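A small sketch of how equation (2) can be computed for two discrete variables is given below; the helper names are illustrative assumptions.

# Symmetrical uncertainty from empirical entropies of discrete variables.
import numpy as np
from collections import Counter

def entropy(values):
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def symmetrical_uncertainty(x, y):
    h_x, h_y = entropy(x), entropy(y)
    h_xy = entropy(list(zip(x, y)))      # joint entropy H(X, Y)
    gain = h_x + h_y - h_xy              # information gain (mutual information)
    return 2.0 * gain / (h_x + h_y) if (h_x + h_y) > 0 else 0.0

# Identical variables give SU = 1; independent ones approach 0.
print(symmetrical_uncertainty([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0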

2.2 Consistency based feature selection

The consistency of the class is evaluated by first enumerating all the different combinations of attribute values. For each combination, an inconsistency count is obtained by subtracting the cardinality of its majority class from the number of occurrences of that combination (Hall 1998).
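The sketch below illustrates this inconsistency count on a toy example; the function name and data are assumptions for illustration only.

# Inconsistency rate of a candidate feature subset: for each distinct
# attribute-value combination, occurrences minus the size of its majority
# class; lower totals indicate a more consistent subset.
from collections import Counter, defaultdict

def inconsistency_rate(rows, labels):
    """rows: attribute-value tuples for the candidate subset,
    labels: class label per row."""
    groups = defaultdict(Counter)
    for pattern, label in zip(rows, labels):
        groups[pattern][label] += 1
    inconsistent = sum(sum(c.values()) - max(c.values())
                       for c in groups.values())
    return inconsistent / len(rows)

# Two rows share the pattern (1, 0) but disagree on the class: one inconsistency.
rows = [(1, 0), (1, 0), (0, 1)]
labels = [0, 1, 1]
print(inconsistency_rate(rows, labels))   # ~0.33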

