
A.3. Data representations

assumption than their statistical independence. The application of ICA in feature extraction is usually preceded by PCA with dimensionality reduction, as this procedure is equivalent to the usual whitening step that simplifies ICA algorithms (Hyvärinen, Karhunen, and Oja, 2001). Applying ICA on the PCA data yields an estimation of the r independent latent variables which generated the observed data:

$$X_{\mathrm{ICA}}^{r} = W X_{\mathrm{PCA}}^{r} \qquad \text{(A.2)}$$

where matrix W is known as the separating matrix. Equation (A.2) can be interpreted as a linear transformation of the data through its projection on the basis vectors contained in the rows of the separating matrix W. In this work, a version of the FastICA algorithm (Hyvärinen, 1999) that maximizes the skewness of the data is employed for obtaining the ICA representation of the data (Kaban and Girolami, 2000).
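To make this concrete, below is a minimal NumPy sketch of a skewness-maximizing FastICA fixed-point iteration applied to PCA-whitened data, in the spirit of equation (A.2); the function name, the g(u) = u² nonlinearity, and the symmetric decorrelation step are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def skew_fastica(X_pca, n_iter=200, seed=0):
    """Sketch of skewness-based FastICA on whitened data.

    X_pca : (r, n) PCA-reduced data, centred and whitened, one
            observation per column.
    Returns the separating matrix W (r, r) and the ICA representation
    W @ X_pca, mirroring X^r_ICA = W X^r_PCA in equation (A.2).
    """
    r, n = X_pca.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((r, r))
    for _ in range(n_iter):
        y = W @ X_pca
        # Fixed-point step for the skewness contrast E[(w^T x)^3]:
        # w <- E[x (w^T x)^2]; the usual E[g'(w^T x)] w correction of
        # FastICA vanishes because the whitened data has zero mean.
        W = (y ** 2) @ X_pca.T / n
        # Symmetric decorrelation keeps the rows of W orthonormal.
        U, _, Vt = np.linalg.svd(W)
        W = U @ Vt
    return W, W @ X_pca
```

Because the input is whitened, keeping the rows of W orthonormal is enough to keep the estimated components uncorrelated across iterations.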

– Non-Negative Matrix Factorization (NMF) (Lee and Seung, 1999) is a feature extraction technique based on linear representations of non-negative data, i.e. NMF can only be applied when the original representation of the data is non-negative. Intuitively, NMF can be interpreted as a linear generative model somewhat similar to that of ICA but subject to non-negativity constraints, as it factorizes the non-negative data matrix X into the approximate product of two non-negative matrices W and H, as defined in equation (A.3). Thus, it can be argued that the data set is generated as a linear combination of the latent non-negative variables contained in matrix H, with the elements of W acting as the combination weights.

$$X \approx WH, \quad \text{where } X_{\mathrm{NMF}}^{r} = H \qquad \text{(A.3)}$$

From a practical viewpoint, i) NMF is usually implemented by means of iterative algorithms which minimize a cost function proportional to the reconstruction error ||X − WH|| (Lee and Seung, 2001), as sketched below, and ii) dimensionality reduction is achieved by setting the respective sizes of the factorization matrices W and H to d × r and r × n when computing the approximate factorization of equation (A.3). In this work, the NMF-based object representation $X_{\mathrm{NMF}}^{r}$ is obtained by applying a mean square reconstruction error minimization algorithm from NMFPACK, a software package for NMF in Matlab (Hoyer, 2004). Besides its use as a feature extraction technique, the view of NMF as a means for obtaining a parts-based description of the data has motivated alternative NMF-based clustering strategies (Xu, Liu, and Gong, 2003; Shahnaz et al., 2004), alongside studies on the theoretical connections between NMF and classic clustering approaches (Ding, He, and Simon, 2005). Compared to PCA and ICA, NMF is advantageous in that the non-negativity of its basis vectors favours their interpretability. On the other hand, the derivation of the NMF representation is usually more computationally demanding.
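As a concrete illustration of the iterative scheme referred to above, the following is a minimal NumPy version of the multiplicative update rules of Lee and Seung (2001) for the squared reconstruction error, with W and H sized d × r and r × n as in equation (A.3); the function name and fixed iteration count are assumptions for illustration and need not match NMFPACK's implementation.

```python
import numpy as np

def nmf(X, r, n_iter=500, eps=1e-9, seed=0):
    """Sketch of NMF via multiplicative updates (Lee and Seung, 2001).

    X : (d, n) non-negative data matrix.
    Returns non-negative W (d, r) and H (r, n) with X ~ W @ H; per
    equation (A.3), H is the r-dimensional representation X^r_NMF.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    W = rng.random((d, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        # Each update monotonically decreases ||X - WH||^2 while
        # preserving non-negativity; eps avoids division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because the updates are purely multiplicative, entries initialized as positive remain positive throughout, so the non-negativity constraint is enforced without explicit projection steps.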

– Random Projection (RP) is a computationally efficient dimensionality reduction technique, proposed as an alternative to feature extraction techniques that become too costly when the dimensionality of the original representation (d) is very high (Kaski, 1998). The rationale behind RP is straightforward: a dimensionality reduction method is effective as long as the distance between the objects in the original space is approximately preserved after the projection.
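A minimal sketch of the projection itself follows, under the common (assumed here) choice of a Gaussian random matrix scaled by 1/sqrt(r), which preserves pairwise distances in expectation by the Johnson-Lindenstrauss argument.

```python
import numpy as np

def random_projection(X, r, seed=0):
    """Sketch of RP: project (d, n) data down to (r, n).

    Entries of R are i.i.d. N(0, 1); the 1/sqrt(r) scaling makes
    E[||R x||^2] = ||x||^2, so distances are preserved on average.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[0]
    R = rng.standard_normal((r, d)) / np.sqrt(r)
    return R @ X
```

Since R is generated independently of the data, RP requires only a single matrix product and none of the eigendecompositions or iterative optimizations of PCA, ICA, and NMF, which is the source of its computational efficiency.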
