29.04.2013 Views

TESI DOCTORAL - La Salle


1.1. Knowledge discovery and data mining

Figure 1.2: Schematic diagram of the steps involved in the KD process (extracted from Fayyad, 1996).

It is a commonplace that potentially useful and beneficial information patterns lie in digital data repositories awaiting analysis (Witten and Frank, 2005). However, the concept of analysis and its goals depend heavily on the context in which it is applied (Fayyad, 1996). Typical application scenarios can be as disparate as i) mining records of buyers’ choices to create marketing campaigns adapted to distinct customer profiles (Witten and Frank, 2005), ii) analyzing the credit card transaction histories of bank customers so as to detect possibly fraudulent operations by unauthorized users (Fayyad, 1996), or iii) locating and cataloging geologic objects of interest in remotely sensed images of planets or asteroids (Fayyad, Piatetsky-Shapiro, and Smyth, 1996).

Thus, whether for economic or scientific reasons, there is great interest in replacing (or, at least, augmenting) human analytic capabilities with computer-based means. The field of computer science devoted to the extraction of useful patterns from data has been given different names in the literature, such as information discovery, information harvesting or data archaeology (Fayyad, Piatetsky-Shapiro, and Smyth, 1996), with knowledge discovery² (KD) and data mining (DM) being the two most common.

However, the use of KD and DM as synonymous concepts has been a matter of dispute in the research community (Klosgen, Zytkow, and Zyt, 2002): while some authors deem them equivalent (Witten and Frank, 2005), others use KD to refer to the whole process of extracting knowledge from data and define DM as the central step of that process (Fayyad, Piatetsky-Shapiro, and Smyth, 1996), as depicted in Figure 1.2.

According to this latter standpoint (to which we adhere in this thesis), KD is defined as the ‘non-trivial process of identifying valid, novel, useful and ultimately understandable patterns in data’, whereas DM is ‘the application of specific algorithms for extracting patterns from data’ (Fayyad, Piatetsky-Shapiro, and Smyth, 1996). By ‘extracting patterns from data’ we refer to making any high-level description of a set of data, e.g. fitting a model to the data or finding structure in it (Fayyad, Piatetsky-Shapiro, and Smyth, 1996). Thus, according to this point of view, KD and DM constitute what could be called

2 Although this discipline was originally named KDD, for Knowledge Discovery in Databases (Piatetsky-Shapiro, 1991), in this work we assume that operations are conducted on a flat file extracted from the database, i.e. we remove the second D in KDD and focus on the knowledge discovery process.
