
Doctoral Thesis - La Salle



Chapter 2. Cluster ensembles and consensus clustering

[Figure 2.1: Scatterplot of an artificially generated two-dimensional toy data set containing n = 9 objects grouped into k = 3 natural clusters. Each object is identified by a numerical label.]

labeling is an integer label identifying the cluster (out of k) to which each of the n objects in the data set is assigned.

For illustration, returning to the toy clustering example of section 1.2.1, equation (2.2) shows a hard cluster ensemble compiled from the outcomes of l = 3 independent runs of the k-means clustering algorithm, with the desired number of clusters set to k = 3, on the two-dimensional data set of n = 9 objects shown in figure 2.1.


$$
E = \begin{pmatrix}
1 & 1 & 1 & 3 & 3 & 3 & 2 & 2 & 2 \\
2 & 2 & 2 & 1 & 1 & 1 & 3 & 3 & 3 \\
2 & 2 & 2 & 3 & 3 & 3 & 1 & 1 & 1
\end{pmatrix}
\tag{2.2}
$$
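Reading equation (2.2) row by row, the three runs group the same columns together (columns 1 to 3, 4 to 6 and 7 to 9) but name the clusters differently, since the label numbers produced by k-means are arbitrary. The minimal Python sketch below stores E as a list of lists (one row per clusterer, one column per object) and checks this with `same_partition`, a hypothetical helper, not part of the thesis, that tests whether two labelings differ only by a one-to-one renaming of cluster labels.

```python
# Hard cluster ensemble of equation (2.2): l = 3 rows (k-means runs),
# n = 9 columns (objects); entry (i, j) is the label the i-th run gives object j.
E = [
    [1, 1, 1, 3, 3, 3, 2, 2, 2],
    [2, 2, 2, 1, 1, 1, 3, 3, 3],
    [2, 2, 2, 3, 3, 3, 1, 1, 1],
]


def same_partition(a, b):
    """Hypothetical helper: True if labelings a and b group the objects
    identically, i.e. they differ only by a bijective renaming of labels."""
    if len(a) != len(b):
        return False
    forward, backward = {}, {}
    for x, y in zip(a, b):
        # setdefault stores the first pairing seen and returns it afterwards,
        # so any conflicting pairing breaks the one-to-one correspondence.
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


l, n = len(E), len(E[0])
print(f"l = {l} labelings, n = {n} objects")
print(all(same_partition(E[0], row) for row in E[1:]))
```

Checking both directions of the mapping matters: a labeling that merges two clusters of another would pass a one-way check but is not the same partition.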

A soft cluster ensemble E, in turn, is defined as the compilation of the outcomes of l fuzzy clustering processes; it is therefore expressed mathematically as a kl × n matrix, as shown in equation (2.3) (Punera and Ghosh, 2007; Sevillano, Alías, and Socoró, 2007b).

$$
E = \begin{pmatrix}
\Lambda_1 \\
\Lambda_2 \\
\vdots \\
\Lambda_l
\end{pmatrix}
\tag{2.3}
$$

where Λ_i is the k × n real-valued clustering matrix resulting from the i-th soft clustering process (for all i ∈ {1, ..., l}).
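Equation (2.3) is a vertical concatenation of the l clustering matrices. The Python sketch below builds a kl × n ensemble in that way. Since no fuzzy c-means memberships are available at this point, it borrows the hard labelings of equation (2.2) and encodes each one as a crisp k × n matrix with 0/1 entries (a hard labeling is the special case of a soft one in which every object has membership 1 in a single cluster). The helpers `stack_soft_ensemble` and `one_hot` are hypothetical names used only for illustration.

```python
def stack_soft_ensemble(lambdas):
    """Hypothetical helper: stack l clustering matrices, each k x n
    (rows = clusters, columns = objects), into one kl x n ensemble."""
    k, n = len(lambdas[0]), len(lambdas[0][0])
    E = []
    for i, lam in enumerate(lambdas, start=1):
        # Every Lambda_i must share the same k x n shape for the stack to be valid.
        if len(lam) != k or any(len(row) != n for row in lam):
            raise ValueError(f"Lambda_{i} is not {k} x {n}")
        E.extend(list(row) for row in lam)
    return E


def one_hot(labeling, k):
    """Hypothetical helper: crisp k x n membership matrix for a hard
    labeling whose labels are 1..k."""
    return [[1.0 if label == c else 0.0 for label in labeling]
            for c in range(1, k + 1)]


# Hard labelings of equation (2.2), used here only as 0/1 memberships.
hard = [
    [1, 1, 1, 3, 3, 3, 2, 2, 2],
    [2, 2, 2, 1, 1, 1, 3, 3, 3],
    [2, 2, 2, 3, 3, 3, 1, 1, 1],
]
k = 3
E = stack_soft_ensemble([one_hot(row, k) for row in hard])
print(len(E), "x", len(E[0]))  # kl x n
```

With real fuzzy clusterers, each Λ_i would instead hold real-valued memberships; the stacking step is unchanged.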

Continuing with the same toy example, equation (2.4) presents a soft cluster ensemble created by collecting the outcomes of l = 3 independent executions of the fuzzy c-means clustering algorithm on the same data set. Since k = 3, the first three rows of E correspond to the membership probability matrix output by the first soft clustering process, the next three rows to the output of the second fuzzy clusterer, and so on.
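The row layout just described means that the membership of object j in cluster c according to the i-th clusterer sits in row (i − 1)k + c of E (counting from 1). The Python sketch below recovers each Λ_i by slicing k consecutive rows and, assuming probabilistic memberships as produced by fuzzy c-means, checks that every column of each block sums to 1. The ensemble values and the `clusterer_block` helper are hypothetical, chosen only to illustrate the indexing.

```python
def clusterer_block(E, i, k):
    """Hypothetical helper: return Lambda_i, the k rows of the soft
    ensemble E produced by the i-th clusterer (i counted from 1)."""
    return E[(i - 1) * k : i * k]


# Hypothetical soft ensemble with l = 2 clusterers, k = 2 clusters and
# n = 3 objects; the values are made up for illustration only.
E = [
    [0.9, 0.2, 0.5],  # clusterer 1, cluster 1
    [0.1, 0.8, 0.5],  # clusterer 1, cluster 2
    [0.3, 0.7, 0.6],  # clusterer 2, cluster 1
    [0.7, 0.3, 0.4],  # clusterer 2, cluster 2
]
k = 2
l = len(E) // k
for i in range(1, l + 1):
    block = clusterer_block(E, i, k)
    # In a probabilistic membership matrix, each object's memberships sum to 1.
    col_sums = [sum(col) for col in zip(*block)]
    print(i, all(abs(s - 1.0) < 1e-9 for s in col_sums))
```

A tolerance is used in the check because sums of floating-point memberships are rarely exactly 1.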

