
C.4 Computationally optimal RHCA, DHCA and flat consensus comparison

In this section, we compare those random and deterministic hierarchical consensus architectures deemed to be computationally optimal against classic flat consensus in terms of two factors: i) their execution times, and ii) the quality of the consensus clustering solutions they yield. This twofold comparison is intended to determine under which conditions any of the aforementioned consensus architectures outperforms the others, not only in terms of computational efficiency, but also as regards their performance in the construction of robust clustering systems.

This comparison has been conducted across the following eleven unimodal data collections: Iris, Wine, Glass, Ionosphere, WDBC, Balance, MFeat, miniNG, Segmentation, BBC and PenDigits. For each data set, ten independent experiments have been conducted on four diversity scenarios. Each diversity scenario is characterized by the use of cluster ensembles generated by applying a certain number of clustering algorithms, |dfA| ∈ {1, 10, 19, 28}. In each experiment, the CPU time required for executing either the whole hierarchical consensus architecture or flat consensus is measured, and the quality of the consensus clustering solution is evaluated in terms of its normalized mutual information φ(NMI) with respect to the ground truth.
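As an illustration of how the two measurements of a single experiment could be taken, the following minimal sketch assumes a Python environment with scikit-learn; the run_consensus callable is a hypothetical stand-in for either a hierarchical consensus architecture or flat consensus, and this is not the code actually used in the thesis.

```python
# Illustrative sketch only: one experiment yields a CPU time and a phi(NMI) score.
# `run_consensus` is a hypothetical callable standing in for RHCA, DHCA or flat consensus.
import time
from sklearn.metrics import normalized_mutual_info_score

def evaluate_consensus(run_consensus, cluster_ensemble, ground_truth):
    start = time.process_time()                      # CPU time, not wall-clock time
    consensus_labels = run_consensus(cluster_ensemble)
    cpu_time = time.process_time() - start
    # phi(NMI) of the consensus clustering with respect to the ground truth labels
    nmi = normalized_mutual_info_score(ground_truth, consensus_labels)
    return cpu_time, nmi
```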

From a visualization perspective, both the execution times and the φ(NMI) values are presented by means of their respective boxplots, each of which comprises the ten independent experiments conducted on each diversity scenario for each data collection. When comparing boxplots, notice that non-overlapping box notches indicate that the medians of the compared values differ at the 5% significance level, which allows a quick inference of the statistical significance of the results.
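A minimal sketch of how such notched boxplots could be produced is given below; it assumes matplotlib is available, and the function and variable names are illustrative rather than taken from the thesis.

```python
# Illustrative sketch: notched boxplots of the ten measurements per diversity scenario.
# Non-overlapping notches suggest the medians differ at the 5% significance level.
import matplotlib.pyplot as plt

def plot_running_times(times_per_scenario, scenario_labels):
    # times_per_scenario: list of lists, one list of ten measurements per scenario
    fig, ax = plt.subplots()
    ax.boxplot(times_per_scenario, notch=True, labels=scenario_labels)
    ax.set_xlabel("diversity scenario")
    ax.set_ylabel("CPU time (s)")
    plt.show()
```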

C.4.1 Iris data set

In this section, the running time and consensus quality comparison experiments are conducted on the Iris data collection. The four diversity scenarios correspond to cluster ensembles of sizes l = 9, 90, 171 and 252.

Running time comparison

Figure C.29 presents the running times of the allegedly computationally optimal RHCA and DHCA variants (considering their serial implementation) and flat consensus. Owing to the relatively small cluster ensembles on this data set, flat consensus is the fastest option in most cases, regardless of the diversity scenario and the consensus function employed. The only exceptions occur when consensus is built using the MCLA consensus function in the two highest diversity scenarios (in these cases, DHCA turns out to be the most efficient consensus architecture), as this is the only consensus function whose time complexity grows quadratically with the size of the cluster ensemble l. Among the hierarchical consensus architectures, DHCA tends to outperform RHCA in computational terms, except when the ALSAD consensus function is employed, although the differences between RHCA and DHCA are in general minor.
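To see why a superlinearly scaling consensus function such as MCLA is the case in which the hierarchical route pays off, the following back-of-the-envelope sketch compares a flat consensus with a two-stage decomposition for the largest Iris scenario, assuming a purely quadratic cost model and a hypothetical split into 28 mini-ensembles of 9 clusterings; the constants and the actual mini-ensemble sizes of the optimal DHCA are not given here, so the numbers are purely illustrative.

```python
# Back-of-the-envelope illustration (assumed cost model, not the thesis's analysis):
# with a quadratic consensus cost, splitting the ensemble before a final merge is cheaper.
def quadratic_cost(l):
    return l ** 2              # cost proportional to l^2 (constant factor omitted)

l = 252                        # largest Iris diversity scenario
groups = 28                    # hypothetical number of first-stage mini-ensembles
per_group = l // groups        # 9 clusterings per mini-ensemble

flat = quadratic_cost(l)                                        # one big consensus
two_stage = groups * quadratic_cost(per_group) + quadratic_cost(groups)
print(flat, two_stage)         # 63504 vs. 3052 under this assumed quadratic model
```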

The execution times corresponding to flat consensus and the entirely parallel implemen-

