29.04.2013 Views

TESI DOCTORAL - La Salle


1.1. Knowledge discovery and data mining<br />

[Figure 1.3 diagram: data mining methods divide into verification (goodness of fit, hypothesis testing, analysis of variance) and discovery; discovery divides into prediction (classification, regression) and description (clustering, summarization).]<br />

Figure 1.3: Taxonomy of data mining methods (adapted from Maimon and Rokach, 2005).<br />

Next, depending on the goals of the KD process, a suitable data mining task must<br />

be chosen. According to the taxonomy presented in figure 1.3, data mining methods can<br />

be classified into two main groups: verification-oriented and discovery-oriented (Fayyad,<br />

Piatetsky-Shapiro, and Smyth, 1996). In verification-oriented DM, the role of the system<br />

is to evaluate a hypothesis proposed by the user, and this task is usually accomplished<br />

by means of traditional statistical methods. In discovery-oriented data mining, the goal of<br />

the system is to discover useful patterns in the data. In this work, we focus on this latter<br />

family of DM tasks.<br />
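As an illustration of the verification-oriented case, the following minimal sketch evaluates a user-proposed hypothesis with a goodness-of-fit test (Pearson's chi-square statistic). It is not taken from this work: the die-roll counts, the function name `chi_square_statistic`, and the choice of a 5% significance level are assumptions made only for the example.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic for observed vs. expected counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 120 rolls of a die; the user's hypothesis is "the die is fair".
observed = [18, 22, 21, 19, 17, 23]
expected = [20.0] * 6

statistic = chi_square_statistic(observed, expected)

# Critical value of the chi-square distribution for 5 degrees of freedom
# at a significance level of 0.05 (standard table value).
CRITICAL_5DF_005 = 11.070

# The hypothesis is rejected only if the statistic exceeds the critical value.
reject_fair_die = statistic > CRITICAL_5DF_005
```

Here the system only checks a hypothesis supplied by the user; it does not search the data for patterns, which is what distinguishes it from the discovery-oriented methods discussed next.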

Among discovery-oriented data mining methods, one can distinguish between prediction<br />

and description DM tasks. In prediction tasks, the goal of the system is to build<br />

a behavioral model from the data, whereas in description tasks, the system aims to find<br />

human-understandable patterns that facilitate knowledge extraction from data.<br />

According to figure 1.3, there exist two main prediction DM tasks: classification and<br />

regression. In classification problems, the goal is to learn a mapping between the categories<br />

in a known taxonomic scheme and a set of pre-classified objects, so that any unseen object<br />

can be categorized into one of these predefined classes. The aim of regression tasks is to<br />

learn a function that maps unseen data objects to a real-valued prediction variable.<br />
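The difference between the two prediction tasks can be sketched with two deliberately simple learners: a 1-nearest-neighbor classifier, which assigns an unseen object to one of the predefined classes, and an ordinary least-squares line, which maps an unseen object to a real value. Both techniques, the function names, and the toy data are assumptions chosen for illustration and are not the methods studied in this work.

```python
def nearest_neighbor_classify(train, query):
    """Return the label of the pre-classified object closest to `query` (1-NN)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(train, key=lambda item: sq_dist(item[0], query))
    return label


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Classification: hypothetical pre-classified objects with categories "A" and "B".
labeled = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((5.0, 5.1), "B"),
    ((4.8, 5.3), "B"),
]
predicted_class = nearest_neighbor_classify(labeled, (4.5, 4.9))

# Regression: hypothetical (x, y) pairs; predict a real value for the unseen x = 4.0.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
predicted_value = slope * 4.0 + intercept
```

In both cases a model is learned from labeled examples; only the type of the predicted variable differs (a category versus a real number).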

As mentioned above, description-oriented DM methods focus on finding understandable<br />

representations of the underlying structure of the data (Maimon and Rokach, 2005). One of<br />

the most common descriptive DM tasks is clustering, which consists of identifying a finite<br />

set of categories to describe the data with no previous knowledge (i.e. deriving a taxonomy<br />

solely from the data). Another description-oriented data mining task is summarization,<br />

whose aim is to find a compact description for a subset of data. To do so, summarization<br />

techniques often make use of multivariate visualization methods.<br />
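The two descriptive tasks can be sketched as follows: a small one-dimensional k-means routine derives categories from unlabeled values, and a per-cluster summary gives a compact description of each resulting subset. K-means is used here only as one well-known clustering technique; the function names, the initial centers, and the data are hypothetical, and the numeric summary stands in for the visualization methods mentioned above.

```python
import statistics


def assign(values, centers):
    """Group each value with its nearest center."""
    clusters = [[] for _ in centers]
    for v in values:
        idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        clusters[idx].append(v)
    return clusters


def k_means_1d(values, centers, iterations=10):
    """Plain k-means on scalar values; returns final centers and clusters."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = assign(values, centers)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, assign(values, centers)


def summarize(cluster):
    """Compact description of a subset of the data."""
    return {
        "size": len(cluster),
        "mean": statistics.mean(cluster),
        "min": min(cluster),
        "max": max(cluster),
    }


# Hypothetical unlabeled data; no categories are known in advance.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, clusters = k_means_1d(data, centers=[0.0, 6.0])
summaries = [summarize(c) for c in clusters]
```

Unlike the prediction tasks, no pre-classified objects are supplied: the grouping is derived solely from the data, and the summaries describe each group rather than predict anything.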

Once the data mining task that fits the goals of the KD process is identified, the<br />

specific data mining algorithm to be applied must be selected. This selection<br />

must take into account not only which models and parameters are the most appropriate<br />

from an algorithmic viewpoint, but also the desired level of accuracy, utility, and intel-<br />
