29.04.2013 Views

TESI DOCTORAL - La Salle


Chapter 5. Multimedia clustering based on cluster ensembles

Moreover, suppose that our multimedia data collection is composed of $m$ modalities. That is, each of the $n$ objects is simultaneously represented by $m$ disjoint sets of real-valued attributes, one per modality, of sizes $d_1, d_2, \ldots, d_m$, so that $d_1 + d_2 + \ldots + d_m = d$. Thus, the multimodal data set matrix $X$ can be decomposed into $m$ submatrices $X_1, X_2, \ldots, X_m$. Each submatrix $X_i$ (of size $d_i \times n$, $\forall i \in [1,m]$) represents all the objects in the data set according to one of the $m$ modalities it contains (see figure 5.1).
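As a minimal sketch of this decomposition, assuming $X$ is stored as a list of $d$ rows with $n$ values each (one column per object), the submatrices can be obtained by slicing consecutive row blocks. The function name `split_modalities` and the toy values are hypothetical and only serve as illustration.

```python
from itertools import accumulate


def split_modalities(X, sizes):
    """Split a d x n matrix (list of d rows) into m submatrices with d_i rows each."""
    if sum(sizes) != len(X):
        raise ValueError("modality sizes must sum to d, the number of rows of X")
    # Row boundaries of each modality: 0, d_1, d_1 + d_2, ..., d
    bounds = [0, *accumulate(sizes)]
    return [X[lo:hi] for lo, hi in zip(bounds, bounds[1:])]


# Hypothetical toy collection: d = 5 attributes, n = 3 objects, m = 2 modalities.
X = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]
sizes = [2, 3]  # d_1 = 2, d_2 = 3
X_parts = split_modalities(X, sizes)
```

Each element of `X_parts` keeps all $n$ columns, so every object remains represented in every modality.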

Given this scenario, a first subset of the clusterings that will constitute the multimodal cluster ensemble $E$ is generated by applying $f$ mutually crossed diversity factors $df_j$, $\forall j \in [1,f]$, to each¹ submatrix $X_i$, $\forall i \in [1,m]$. If the same set of diversity factors is applied to the $m$ modalities, the number of clusterings generated in this first subset is equal to:

$$l_{1\,\mathrm{mod}} = m\,|df_1|\,|df_2| \cdots |df_f| \tag{5.1}$$

where the $|\cdot|$ operator denotes the cardinality of a set.

Secondly, another subset of clusterings is created by the application of a set of diversity factors (not necessarily equal to the previous one) upon a fused multimodal representation of the data set. This representation can be generated by means of any early feature fusion process, such as the application of a projection-based object representation technique on the $d$-dimensional vectors resulting from the concatenation of the features corresponding to the $m$ modalities (La Cascia, Sethi, and Sclaroff, 1998; Zhao and Grosky, 2002; Benitez and Chang, 2002; Snoek, Worring, and Smeulders, 2005; Gunes and Piccardi, 2005). This second subset of clusterings will be referred to using the symbol $m\,\mathrm{mod}$, as they are obtained upon an object representation that combines the $m$ modalities into a single one.
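The early fusion step described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not fix a particular projection technique, so a Gaussian random projection is used here as a simple stand-in for any projection-based representation. The function names, the target dimensionality `r` and the toy values are hypothetical.

```python
import random


def concatenate_modalities(parts):
    """Stack the m submatrices (each d_i x n) into a single d x n matrix."""
    n = len(parts[0][0])
    if any(len(row) != n for part in parts for row in part):
        raise ValueError("all modalities must describe the same n objects")
    return [row for part in parts for row in part]


def random_projection(X, r, seed=0):
    """Project the d-dimensional columns of X onto r dimensions (returns an r x n matrix).

    Stand-in for a projection-based technique; not the method used in the source.
    """
    rng = random.Random(seed)
    d, n = len(X), len(X[0])
    # r x d Gaussian projection matrix, scaled by 1 / sqrt(r)
    R = [[rng.gauss(0.0, 1.0) / r ** 0.5 for _ in range(d)] for _ in range(r)]
    return [
        [sum(R[i][k] * X[k][j] for k in range(d)) for j in range(n)]
        for i in range(r)
    ]


# Hypothetical toy data: two modalities describing the same n = 3 objects.
X_1 = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]                      # d_1 = 2
X_2 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]     # d_2 = 3
X_fused = random_projection(concatenate_modalities([X_1, X_2]), r=2)
```

The clusterings of the second subset would then be run on `X_fused`, which holds a single fused representation of every object.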

Assuming, for simplicity, that the same set of diversity factors is employed for creating the subsets of unimodal and multimodal clusterings, the number of multimodal partitions created becomes:

$$l_{m\,\mathrm{mod}} = |df_1|\,|df_2| \cdots |df_f| \tag{5.2}$$

Finally, the mere compilation of the unimodal and multimodal partitions constitutes the multimedia cluster ensemble $E$, the size of which is equal to:

$$l = l_{1\,\mathrm{mod}} + l_{m\,\mathrm{mod}} = (m+1)\,|df_1|\,|df_2| \cdots |df_f| \tag{5.3}$$
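The three counts can be checked by enumerating one configuration per clustering. The sketch below assumes that the diversity factors are fully crossed (a Cartesian product) and that the same factors are used for the unimodal and multimodal subsets, as in the equations above. The factor values and the number of modalities are hypothetical placeholders, not the settings used in this work.

```python
from itertools import product
from math import prod


def ensemble_configurations(m, diversity_factors):
    """Return the unimodal and multimodal clustering configurations of the ensemble."""
    # Every combination of one value per diversity factor
    combos = list(product(*diversity_factors.values()))
    # One clustering per modality and per combination
    unimodal = [(f"modality_{i}", *c) for i in range(1, m + 1) for c in combos]
    # One clustering per combination on the fused representation
    multimodal = [("fused", *c) for c in combos]
    return unimodal, multimodal


# Hypothetical diversity factors (placeholder values only).
diversity_factors = {
    "dfA": ["algorithm_a", "algorithm_b"],
    "dfR": ["baseline", "representation_x", "representation_y"],
    "dfD": [10, 20],
}
m = 2  # hypothetical number of modalities

unimodal, multimodal = ensemble_configurations(m, diversity_factors)
l_1mod = len(unimodal)
l_mmod = len(multimodal)
l = l_1mod + l_mmod

# The enumerated sizes agree with the closed-form expression (m + 1) |df_1| ... |df_f|
assert l == (m + 1) * prod(len(v) for v in diversity_factors.values())
```

With these placeholder values, the product of the cardinalities is 2 · 3 · 2 = 12, so the unimodal subset holds 24 configurations, the multimodal subset 12, and the whole ensemble 36.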

As regards the creation of the multimodal cluster ensembles $E$ on the four multimedia data collections employed in this work (see appendix A.2.2 for a description), three diversity factors have been applied: clustering algorithms ($df_A$), object representations ($df_R$) and object representation dimensionalities ($df_D$). In the following paragraphs, a detailed description of these diversity factors and their role in the cluster ensemble creation process is presented.

Starting with the original object features, which constitute the baseline representation, additional object representations are derived by means of feature extraction based on

¹ As these clusterings are created by running multiple clustering processes separately on each modality, we refer to them using the symbol $1\,\mathrm{mod}$, which stands for “one modality”.
