29.04.2013 Views

TESI DOCTORAL - La Salle


Chapter 2<br />

Cluster ensembles and consensus clustering<br />

In our quest for overcoming clustering indeterminacies in a multimodal context, the notions<br />

of cluster ensembles and consensus clustering play a central role. As mentioned at the<br />

end of chapter 1, our strategy for clustering multimodal data in a robust manner is based<br />

on the massive creation of multiple partitions of the target data set and the subsequent<br />

combination of these into a single consensus clustering solution. Therefore, an appropriate<br />

way to start this chapter is by formally defining the two closely related concepts of cluster<br />

ensembles and consensus clustering¹.<br />

To begin with, a cluster ensemble E is defined as the compilation of the outcomes of l<br />

clustering processes. For simplicity, we assume in this work that the l clustering processes<br />

group the data into the same number of clusters, namely k, although this is not a strictly<br />

necessary constraint². Depending on whether the clustering processes are crisp or fuzzy, E<br />

will be a hard or a soft cluster ensemble.<br />

In the former case, E is mathematically defined as an l×n integer-valued matrix compiling<br />

l row label vectors λi (∀i ∈ [1,...,l]) resulting from the respective hard clustering processes<br />

(see equation (2.1)).<br />

\[
E =
\begin{pmatrix}
\lambda_1 \\
\lambda_2 \\
\vdots \\
\lambda_l
\end{pmatrix}
=
\begin{pmatrix}
\lambda_{11} & \lambda_{12} & \cdots & \lambda_{1n} \\
\lambda_{21} & \lambda_{22} & \cdots & \lambda_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\lambda_{l1} & \lambda_{l2} & \cdots & \lambda_{ln}
\end{pmatrix}
\tag{2.1}
\]

where λij ∈{1,...,k} (∀i ∈ [1,...,l], and ∀j ∈ [1,...,n]), i.e. each component of each<br />

1 In some works, the term ‘cluster ensemble’ is used to designate the framework for combining multiple<br />

partitionings obtained from separate clustering runs into a final consensus clustering (Strehl and Ghosh,<br />

2002; Punera and Ghosh, 2007). In this work, however, we stick to the literal meaning of this expression,<br />

and use it to designate the result of gathering several clustering solutions.

2 Since our goal is to combine partitions differing only in the way data are represented and clustered, we<br />

set the number of clusters k to be equal across the l clustering processes. However, combining clustering<br />

solutions with a variable number of clusters is a common practice in the cluster ensembles literature. This<br />

can be useful for clustering complex data sets upon simple individual partitions (Fred and Jain, 2005), or<br />

for discovering the natural number of clusters in the data set (Strehl and Ghosh, 2002), although these<br />

potentialities are not exploited in this work.<br />
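As an illustration of the hard cluster ensemble of equation (2.1), the following minimal Python sketch stacks l label vectors of length n into an l×n integer matrix and checks that every label lies in {1,...,k}. The function name `build_hard_ensemble` and the toy label vectors are assumptions introduced only for this example; they do not come from the thesis.

```python
from typing import List, Sequence


def build_hard_ensemble(labelings: Sequence[Sequence[int]], k: int) -> List[List[int]]:
    """Stack l label vectors (each of length n) into an l x n matrix E.

    Hypothetical helper for illustration: row i plays the role of the
    label vector lambda_i, and entry E[i][j] is the cluster label in
    {1, ..., k} assigned to object j by the i-th clustering process.
    """
    if not labelings:
        raise ValueError("at least one clustering is required")
    n = len(labelings[0])
    ensemble: List[List[int]] = []
    for i, labels in enumerate(labelings, start=1):
        if len(labels) != n:
            raise ValueError(
                f"label vector {i} has length {len(labels)}, expected {n}"
            )
        if any(not (1 <= label <= k) for label in labels):
            raise ValueError(f"label vector {i} has a label outside 1..{k}")
        ensemble.append(list(labels))
    return ensemble


# Toy data (made up): l = 3 clusterings of n = 5 objects into k = 2 clusters.
E = build_hard_ensemble(
    [
        [1, 1, 2, 2, 2],
        [2, 2, 1, 1, 1],  # same partition as the first row, labels swapped
        [1, 2, 2, 2, 1],
    ],
    k=2,
)
l, n = len(E), len(E[0])
print(l, n)
```

The first two rows of the toy ensemble describe the same partition under a relabelling of the clusters, so rows of E should not be compared entry by entry when judging whether two clusterings agree.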
