
Appendix A. Experimental setup

original feature space is approximately preserved in the reduced r-dimensional space. Allowing for this fact, Kaski proved that this can be achieved using a random linear mapping embodied in an r × d random projection matrix R, whose columns are realizations of independent and identically distributed (i.i.d.) zero-mean normal variables, scaled to have unit length (Fodor, 2002).

$X^{r}_{RP} = RX$ (A.4)
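For illustration, the following is a minimal NumPy sketch of this mapping, assuming the data matrix X stores one d-dimensional object per column, as in equation (A.4); the function name and the toy dimensions are ours:

```python
import numpy as np

def random_projection(X, r, seed=0):
    """Reduce the d x n data matrix X to r dimensions via equation (A.4)."""
    d = X.shape[0]
    rng = np.random.default_rng(seed)
    # The columns of R are i.i.d. zero-mean normal variables...
    R = rng.standard_normal((r, d))
    # ...scaled to have unit length.
    R /= np.linalg.norm(R, axis=0)
    return R @ X  # X_RP^r = R X

# Toy usage: 1000 objects in a 500-dimensional space, reduced to r = 50.
X = np.random.rand(500, 1000)
print(random_projection(X, r=50).shape)  # (50, 1000)
```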

Several experimental studies bear witness to the fact that i) RP takes a fraction of the time required for executing other feature extraction techniques such as PCA or ICA, and ii) clustering results on RP data representations are sometimes comparable to or even better than those obtained using, for instance, PCA, which somehow reinforces the notion of the data representation indeterminacy outlined in section 1.4 (Kaski, 1998; Bingham and Mannila, 2001; Lin and Gunopulos, 2003; Tang et al., 2005).

A.3.2 Multimodal data representations

As regards the representation of the objects of the multimodal data sets described in section A.2.2, two distinct approaches have been followed. Firstly, unimodal representations have been created for each mode separately, applying the same strategies as in the unimodal data sets; thus, we will not expand on this point. Secondly, we have generated truly multimodal data representations by combining both modalities. We elaborate on this latter issue in the following paragraphs.

The simple concatenation of the baseline feature vectors of both modalities (previously normalized to unit length³) gives rise to the multimodal baseline representation, represented on a (d1 + d2)-dimensional attribute space, where d1 and d2 are the dimensionalities of the baseline representation of each modality.
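A minimal sketch of this construction, assuming each modality is stored as a matrix with one object per row (the variable and function names are ours):

```python
import numpy as np

def multimodal_baseline(X1, X2):
    """Fuse an n x d1 and an n x d2 unimodal representation by concatenation."""
    # Normalize each object's subvector to unit length (see footnote 3),
    # so that neither modality is given more importance a priori.
    X1 = X1 / np.linalg.norm(X1, axis=1, keepdims=True)
    X2 = X2 / np.linalg.norm(X2, axis=1, keepdims=True)
    return np.hstack([X1, X2])  # n x (d1 + d2) multimodal baseline
```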

Subsequently, the feature extraction techniques described in section A.3.1 (i.e. PCA, ICA, RP and NMF, the latter only when the data are non-negative) are applied on the multimodal baseline representation, yielding representations of dimensionalities r ∈ [3, d1 + d2].

This procedure, known as early fusion or feature-level fusion in the literature, is a common strategy for creating representations of multimodal data from unimodal representations, and it has been applied in content-based image retrieval (La Cascia, Sethi, and Sclaroff, 1998; Zhao and Grosky, 2002), semantic video analysis (Snoek, Worring, and Smeulders, 2005), human affect recognition (Gunes and Piccardi, 2005) and audiovisual video sequence analysis (Sevillano et al., 2009), as well as multimodal clustering (Benitez and Chang, 2002).
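As a rough illustration of this early fusion pipeline, here is a sketch that applies scikit-learn's PCA to a synthetic fused representation; the library choice, the random data and the specific r values shown are our assumptions, not the thesis's setup:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical multimodal baseline: n objects on a (d1 + d2)-dimensional
# space, built by concatenating unit-normalized unimodal vectors (row-wise).
n, d1, d2 = 200, 64, 32
X_fused = np.random.rand(n, d1 + d2)

# Sweep the target dimensionality r over [3, d1 + d2] as described above
# (only three representative values are shown for brevity).
for r in (3, 10, d1 + d2):
    X_r = PCA(n_components=r).fit_transform(X_fused)
    print(r, X_r.shape)  # (n, r)
```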

A.4 Cluster ensembles

In this section, we briefly describe the cluster ensembles employed in the experimental sections of this thesis. As described in section 2.1, in this work we combine both the homogeneous and heterogeneous approaches for creating cluster ensembles. This means that we

³ An uneven weighting of the concatenated vectors would give more importance to one of the modalities. As it is not clear how to appropriately weight each modality a priori, we forced each subvector to have unit norm so as to avoid any bias.

