29.04.2013

DOCTORAL THESIS - La Salle

Appendix A. Experimental setup

A.2.1 Unimodal data sets

A total of twelve unimodal data sets have been used in the experimental sections of this thesis. Unless noted otherwise, they have been obtained from two classic public repositories for the data mining and machine learning research communities, namely the UCI Knowledge Discovery in Databases Archive (Hettich and Bay, 1999) and the UCI Machine Learning Repository (Asuncion and Newman, 1999). In the following paragraphs, we briefly describe each data set; their most relevant characteristics are summarized in table A.2 for quick reference.

1. Zoo: the goal of this data set is to learn to classify animals into seven classes given 17 binary attributes representing features such as the presence of hair, feathers, a backbone or teeth, or whether the animal is aquatic or airborne, among others. The data set contains 101 objects (i.e., animals).

2. Iris: a classic data set in machine learning and pattern recognition. It contains 150 objects (instances of Iris plants) represented by four real-valued features measuring the length and width of their petals and sepals. The goal is to classify the objects into one of three classes of Iris plants; one class is linearly separable from the other two, which are not linearly separable from each other.

3. Wine: the goal of this data set is to determine the origin of wines by means of chemical analysis. It contains 178 wine samples that must be assigned to one of three classes based on their content of 13 constituents, such as alcohol, malic acid or magnesium, represented as real-valued features.

4. Glass: in this data set, 214 glass instances are represented by 10 real-valued attributes corresponding to their content of chemical elements such as aluminium, sodium and calcium. The goal is to classify the objects into one of six predefined categories (types of glass).

5. Ionosphere: this data set contains 351 radar returns from the ionosphere, classified as good or bad depending on whether or not they show evidence of some type of structure in the ionosphere. Each radar return is described by 34 real-valued features based on autocorrelation.

6. WDBC: its full name is the Wisconsin Diagnostic Breast Cancer data set. It contains 569 objects (breast mass images) represented by 32 real-valued features describing characteristics of the cell nuclei present in each image (radius, texture, perimeter, etc.). The goal is to classify these objects according to their diagnosis (malignant or benign).

7. Balance: this data set was generated to model the results of psychological experiments. Each of the 625 objects is assigned to one of three classes, according to whether the balance scale tips to the right, tips to the left, or is balanced. The integer-valued attributes are the left weight, the left distance, the right weight and the right distance.

8. Mfeat: its full name is the Multiple Features data set, because it represents the objects it contains (handwritten numerals from 0 to 9) using several different kinds of real-valued features, such as Fourier coefficients, profile correlations, Karhunen-Loève coefficients, pixel averages, Zernike moments and morphological attributes.
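Since the Balance data set (item 7) is synthetic, its entire contents can be regenerated from a simple rule. The following is a minimal Python sketch that assumes the rule and value ranges given in the UCI documentation of the Balance Scale data set, which this appendix does not state. Under that assumption, each of the four attributes takes the integer values 1 to 5 (hence 5^4 = 625 objects), and the scale tips towards the side with the larger product of weight and distance, or is balanced when both products are equal. The function names `balance_class` and `generate_balance` are hypothetical.

```python
from collections import Counter
from itertools import product

# Assumption (from the UCI Balance Scale documentation, not from this appendix):
# every attribute takes the integer values 1..5.
VALUES = range(1, 6)


def balance_class(left_weight, left_distance, right_weight, right_distance):
    """Return 'L', 'R' or 'B' for a scale tipping left, tipping right, or balanced.

    Assumed rule: compare the torque (weight * distance) on each side.
    """
    left_torque = left_weight * left_distance
    right_torque = right_weight * right_distance
    if left_torque > right_torque:
        return "L"
    if right_torque > left_torque:
        return "R"
    return "B"


def generate_balance():
    """Enumerate every attribute combination, giving 5**4 = 625 labelled objects.

    Attribute order follows the text: left weight, left distance,
    right weight, right distance; the class label is appended last.
    """
    return [
        (lw, ld, rw, rd, balance_class(lw, ld, rw, rd))
        for lw, ld, rw, rd in product(VALUES, repeat=4)
    ]


if __name__ == "__main__":
    data = generate_balance()
    counts = Counter(row[-1] for row in data)
    print(len(data))                             # 625
    print(counts["L"], counts["R"], counts["B"])  # 288 288 49
```

Under these assumptions, the enumeration yields 288 objects tipping to the left, 288 tipping to the right and only 49 balanced ones, so the three classes are far from equally represented.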
