29.04.2013 Views

TESI DOCTORAL - La Salle


Chapter 6. Voting based consensus functions for soft cluster ensembles

Figure 6.1: Scatterplot of an artificially generated two-dimensional data set containing n = 9 objects, which are represented by coloured symbols and identified by a number. The black star symbols represent the position of the cluster centroids found by the k-means algorithm.

$$
\lambda = [\,2\;\,2\;\,2\;\,1\;\,1\;\,1\;\,3\;\,3\;\,3\,] \tag{6.3}
$$

The results of applying a fuzzy clustering algorithm to this data collection clearly differ depending on how the degree of association between objects and clusters is codified. Usually, the scalar values λᵢⱼ contained in the clustering matrix Λ represent cluster membership probabilities (i.e. the higher the value of λᵢⱼ, the more strongly the j-th object is associated with the i-th cluster). For instance, this is how the well-known fuzzy c-means (FCM) clustering algorithm codifies its clustering results (Höppner, Klawonn, and Kruse, 1999). In fact, if this algorithm is applied to the previously described artificial data set, the clustering matrix presented in equation (6.4) is obtained.

$$
\Lambda = \begin{pmatrix}
0.054 & 0.026 & 0.057 & 0.969 & 0.976 & 0.959 & 0.009 & 0.016 & 0.010 \\
0.921 & 0.932 & 0.905 & 0.025 & 0.019 & 0.030 & 0.014 & 0.055 & 0.017 \\
0.025 & 0.042 & 0.038 & 0.006 & 0.005 & 0.011 & 0.976 & 0.929 & 0.972
\end{pmatrix} \tag{6.4}
$$

It can be observed that any row permutation in Λ would yield an equivalent fuzzy partition. Moreover, notice that Λ can be transformed into a hard clustering solution by simply assigning each object to the cluster with maximum membership probability.
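As a minimal sketch of these two observations, the following Python snippet hardens the matrix of equation (6.4) by taking the row of maximum membership in each column, and checks that a row permutation only renames the clusters. The names `MEMBERSHIP`, `harden_by_max` and `same_partition` are illustrative assumptions, not part of the original work.

```python
# Membership matrix copied from equation (6.4):
# rows are clusters, columns are objects 1..9.
MEMBERSHIP = [
    [0.054, 0.026, 0.057, 0.969, 0.976, 0.959, 0.009, 0.016, 0.010],
    [0.921, 0.932, 0.905, 0.025, 0.019, 0.030, 0.014, 0.055, 0.017],
    [0.025, 0.042, 0.038, 0.006, 0.005, 0.011, 0.976, 0.929, 0.972],
]


def harden_by_max(matrix):
    """Return 1-based labels: for each object (column), the row holding the largest value."""
    labels = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        labels.append(column.index(max(column)) + 1)
    return labels


def same_partition(labels_a, labels_b):
    """True if both labelings group the objects identically, whatever the label names."""
    pairs = set(zip(labels_a, labels_b))
    # The label-to-label mapping is one-to-one exactly when these three counts agree.
    return len(pairs) == len(set(labels_a)) == len(set(labels_b))


labels = harden_by_max(MEMBERSHIP)
print(labels)  # [2, 2, 2, 1, 1, 1, 3, 3, 3], the labeling of equation (6.3)

# Reordering the rows (hypothetical permutation) renames clusters but keeps the grouping.
permuted = [MEMBERSHIP[2], MEMBERSHIP[0], MEMBERSHIP[1]]
permuted_labels = harden_by_max(permuted)
print(permuted_labels)                          # [3, 3, 3, 2, 2, 2, 1, 1, 1]
print(same_partition(labels, permuted_labels))  # True
```

The permuted labeling uses different cluster identifiers, yet it groups objects {1, 2, 3}, {4, 5, 6} and {7, 8, 9} exactly as before, which is the sense in which the two partitions are equivalent.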

However, the degree of association between objects and clusters can also be described in terms of other parameters, such as the distance of each object to the cluster centroids (k-means, for instance, can output this information despite being a hard clustering algorithm). In fact, the object-to-centroid¹ distance matrix obtained after applying k-means to the same toy data set as before is presented in equation (6.5).

$$
\Lambda = \begin{pmatrix}
0.362 & 0.325 & 0.672 & 0.002 & 0.002 & 0.005 & 0.160 & 0.092 & 0.125 \\
0.010 & 0.009 & 0.027 & 0.397 & 0.490 & 0.436 & 0.251 & 0.320 & 0.209 \\
0.170 & 0.202 & 0.445 & 0.090 & 0.125 & 0.162 & 0.002 & 0.005 & 0.002
\end{pmatrix} \tag{6.5}
$$
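Since the entries of equation (6.5) are distances, a small entry denotes a strong association. The sketch below, with the illustrative names `DISTANCES` and `harden_by_min` (assumptions, not taken from the original work), picks for each object the row of minimum distance and recovers the same labeling as equation (6.3).

```python
# Object-to-centroid distance matrix copied from equation (6.5):
# rows are clusters, columns are objects 1..9.
DISTANCES = [
    [0.362, 0.325, 0.672, 0.002, 0.002, 0.005, 0.160, 0.092, 0.125],
    [0.010, 0.009, 0.027, 0.397, 0.490, 0.436, 0.251, 0.320, 0.209],
    [0.170, 0.202, 0.445, 0.090, 0.125, 0.162, 0.002, 0.005, 0.002],
]


def harden_by_min(matrix):
    """Return 1-based labels: for each object (column), the row holding the smallest value."""
    n_clusters = len(matrix)
    n_objects = len(matrix[0])
    return [
        min(range(n_clusters), key=lambda i: matrix[i][j]) + 1
        for j in range(n_objects)
    ]


print(harden_by_min(DISTANCES))  # [2, 2, 2, 1, 1, 1, 3, 3, 3]
```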

In this case, the conversion of Λ into a crisp partition requires assigning every object to the closest (i.e. minimum distance) cluster. Thus, depending on the nature of the soft

¹ The cluster centroids are represented by means of a black star symbol in figure 6.1.
