Appendix A. Experimental setup

generated by applying the following well-known feature extraction techniques²: Principal Component Analysis (PCA), Independent Component Analysis (ICA), Non-Negative Matrix Factorization (NMF) and Random Projection (RP). Besides providing diversity as regards data representation, these techniques are also employed for dimensionality reduction purposes. In this work, we denote the reduced dimensionality of the resulting feature space by r, which takes a whole range of values in the interval [3, d]. As a result of each feature extraction procedure, a batch of r × n matrices (X^r_PCA, X^r_ICA, X^r_NMF or X^r_RP) is obtained.
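As an informal illustration of how such a batch could be produced, the sketch below loops r over [3, d] and collects one r × n matrix per value. It uses scikit-learn's NMF and GaussianRandomProjection purely as stand-in implementations; the function name reduced_batch, the toy data and the choice of library are assumptions for illustration, not the code used in the thesis.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.random_projection import GaussianRandomProjection

def reduced_batch(X, reducer_factory):
    """Build the batch of r x n matrices X^r for every r in [3, d],
    given a d x n data matrix X whose columns are the n objects."""
    d, n = X.shape
    batch = {}
    for r in range(3, d + 1):
        reducer = reducer_factory(r)
        # scikit-learn reducers expect objects as rows, hence the transposes
        batch[r] = reducer.fit_transform(X.T).T   # r x n matrix
    return batch

# toy data: d = 10 features, n = 50 objects (non-negative, as NMF requires)
X = np.random.rand(10, 50)
batch_nmf = reduced_batch(X, lambda r: NMF(n_components=r, max_iter=500))
batch_rp = reduced_batch(X, lambda r: GaussianRandomProjection(n_components=r))
```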

The following paragraphs are devoted to a brief description of the main concepts regarding the aforementioned feature extraction techniques.

– Principal Component Analysis, which is one of the most typical feature extraction techniques, is based on projecting the data onto a dimensionally reduced feature space such that i) the newly obtained features are decorrelated, and ii) the variance of the original data is maximally retained. For these reasons, PCA is said to be capable of removing data redundancies while keeping the most relevant information contained in the data. There exist several ways of conducting PCA, from the eigenanalysis of the covariance matrix of X (Jolliffe, 1986) to neural network approaches (Oja, 1992). In this work, PCA is implemented by means of Singular Value Decomposition (SVD), following a similar approach to that of Latent Semantic Analysis (Deerwester et al., 1990). More specifically, given the d × n data matrix X, its SVD is expressed according to equation (A.1).

X = U · Σ · V^T    (A.1)

Matrix Σ contains the singular values of X sorted in decreasing order, and the columns of matrices U and V are the left and right singular vectors of X, respectively. Dimensionality reduction is achieved by retaining the r largest singular values in Σ and the corresponding columns of matrix V, so that the r × n matrix X^r_PCA = Σ_r · V_r^T –where Σ_r and V_r are the reduced versions of the singular values and right singular vectors matrices, respectively– will contain the location of the n objects in the r-dimensional PCA space, where clustering is conducted.
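The SVD-based construction of X^r_PCA just described can be sketched in a few lines of NumPy. This is only a minimal illustration (the function name pca_projection and the toy data are made up here) and, in line with the LSA-style usage above, no mean-centering is applied.

```python
import numpy as np

def pca_projection(X, r):
    """Project the d x n data matrix X onto an r-dimensional PCA space
    via SVD, following X = U . Sigma . V^T (equation A.1)."""
    # thin SVD of the d x n matrix X
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # keep the r largest singular values and right singular vectors
    Sigma_r = np.diag(s[:r])      # r x r
    Vr_t = Vt[:r, :]              # r x n
    # X^r_PCA = Sigma_r . V_r^T locates the n objects in the r-dim space
    return Sigma_r @ Vr_t         # r x n

# toy usage: d = 10 features, n = 50 objects, reduced to r = 3 dimensions
X = np.random.rand(10, 50)
X_pca_3 = pca_projection(X, 3)    # shape (3, 50)
```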

– Independent Component Analysis (ICA) can be regarded as an extension of PCA for non-Gaussian data (Xu and Wunsch II, 2005), in which the projected data components are forced to be statistically independent, a stronger condition than PCA's decorrelation. Being tightly bound to the blind source separation problem (Hyvärinen, Karhunen, and Oja, 2001), the application of ICA for feature extraction usually assumes the existence of a generative model that, in its simplest version, defines the observed data as the result of an unknown linear noiseless combination of r statistically independent latent variables (the so-called independent components). The goal of ICA algorithms is to recover the independent components making no further

² We choose feature extraction over feature selection given its greater ease of application (Jain, Murty, and Flynn, 1999), as our main goal is creating representational diversity, not elaborating on object representations. In informal experiments not reported here, other object representations based on feature selection plus change of basis (Srinivasan, 2002) were also tested but finally discarded as, in general terms, they gave rise to lower quality clustering results.
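Although the description of ICA continues beyond this excerpt, the generative model introduced above can be illustrated with a small, hypothetical sketch. The thesis does not name a specific ICA algorithm in this passage, so scikit-learn's FastICA is used here purely as a stand-in; the function name ica_projection and the toy data are likewise assumptions.

```python
import numpy as np
from sklearn.decomposition import FastICA

def ica_projection(X, r, seed=0):
    """Estimate r independent components from the d x n data matrix X
    (objects as columns) and return the r x n representation X^r_ICA."""
    # FastICA expects objects as rows, so the n objects are transposed
    ica = FastICA(n_components=r, random_state=seed)
    S = ica.fit_transform(X.T)    # n x r matrix of estimated sources
    return S.T                    # r x n, one column per object

# toy usage: d = 10 features, n = 50 objects, reduced to r = 3 components
X = np.random.rand(10, 50)
X_ica_3 = ica_projection(X, 3)    # shape (3, 50)
```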
