29.04.2013 Views

TESI DOCTORAL - La Salle

Chapter 6. Voting based consensus functions for soft cluster ensembles

However, if object-to-cluster-centroid distances are inverted and normalized using a
softmax normalization, values that can be interpreted as membership probabilities are
obtained (i.e. the k-means clustering solutions are fuzzified). For the sake of greater
algorithmic diversity, variants of k-means using the Euclidean, city block, cosine and
correlation distance measures have been employed. Thus, the cardinality of the algorithmic
diversity factor is |dfA| = 5. Applying all these clustering algorithms to each
and every one of the distinct object representations, created by the mutually crossed
application of the representational and dimensional diversity factors of each data set,
gives rise to soft cluster ensembles of the sizes l presented in table 6.1. In order to obtain
a representative analysis of the performance of the aforementioned consensus functions,
we have conducted multiple experiments on distinct diversity scenarios. To do so,
besides using the cluster ensemble of size l, we have also generated cluster ensembles
of sizes ⌊l/20⌋, ⌊l/10⌋, ⌊l/5⌋ and ⌊l/2⌋, which are created by randomly picking a subset
of the original cluster ensemble components. For each distinct cluster ensemble, ten
independent runs of each consensus function are executed.
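The fuzzification and ensemble subsampling steps described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the text does not say how the distances are inverted, so negation is assumed here; the `temperature` parameter, the function names and the placeholder ensemble of 100 component ids are hypothetical and do not come from the thesis.

```python
import math
import random


def fuzzify_distances(distances, temperature=1.0):
    """Turn one object's distances to the k centroids into soft memberships.

    Assumption: distances are inverted by negation before the softmax,
    so that the closest centroid receives the largest membership.
    """
    scores = [-d / temperature for d in distances]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def subsample_ensemble(components, divisor, rng):
    """Randomly pick floor(l / divisor) components without replacement."""
    size = len(components) // divisor
    return rng.sample(components, size)


# Distances from one object to three centroids (illustrative values).
memberships = fuzzify_distances([0.5, 2.0, 4.0])

# Placeholder ids standing in for an ensemble of l = 100 soft clusterings.
rng = random.Random(0)
ensemble = list(range(100))
subsets = {d: subsample_ensemble(ensemble, d, rng) for d in (20, 10, 5, 2)}
```

The memberships of each object sum to one, and the four subsets have sizes ⌊l/20⌋, ⌊l/10⌋, ⌊l/5⌋ and ⌊l/2⌋ of the placeholder ensemble.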

– How are results presented? The performances of the nine soft consensus functions
are summarized by means of a quality (φ (NMI) with respect to the ground truth) versus
time complexity (CPU time measured in seconds) diagram that describes, in a compact
manner, the qualities of the consensus functions compared. For each
consensus function, the depicted scatterplot corresponds to the region limited by the
mean ± 2-standard deviation curves corresponding to the two associated magnitudes
(i.e. φ (NMI) and CPU time) computed throughout all the experiments conducted
on each data collection (ten independent experiment runs on each one of the five
cluster ensemble sizes). In order to determine whether the differences between the
compared consensus functions are significant or not, we have conducted a pairwise
comparison (both in CPU time and φ (NMI) terms) among them by applying a paired
t-test, measuring the significance level p at which the null hypothesis (equal means with
possibly unequal variances) is rejected. If the typical 95% confidence interval for true
difference in means is taken as a reference, significance level values of p
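
The pairwise comparison can be illustrated with a small sketch of a paired t-test between two consensus functions. It is an assumption-laden illustration, not the procedure actually used in the experiments: the φ (NMI) values below are made up, the function name is hypothetical, and instead of computing p the sketch compares the t statistic with the tabulated two-sided critical value for 9 degrees of freedom at the 0.05 level. A different number of paired observations requires a different critical value.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) of a paired t-test on samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical")
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Made-up NMI scores of two consensus functions over ten paired runs.
nmi_a = [0.62, 0.65, 0.61, 0.66, 0.63, 0.64, 0.60, 0.67, 0.62, 0.65]
nmi_b = [0.55, 0.58, 0.57, 0.59, 0.54, 0.60, 0.53, 0.61, 0.56, 0.57]

t_stat, dof = paired_t_statistic(nmi_a, nmi_b)

# Two-sided critical value of Student's t for 9 degrees of freedom, alpha = 0.05.
T_CRIT_DF9 = 2.262
significant = abs(t_stat) > T_CRIT_DF9
```

The same function applies unchanged to CPU time measurements, since only the paired differences enter the statistic.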
