29.04.2013 Views

TESI DOCTORAL - La Salle



C.4. Computationally optimal RHCA, DHCA and flat consensus comparison

C.4.5 WDBC data set

This section presents the running times and consensus clustering solution qualities of the hierarchical and flat consensus architectures on the WDBC data collection. Here, the four diversity scenarios correspond to cluster ensembles of size l = 113, 1130, 2147 and 3164, respectively.
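As a quick arithmetic check on these ensemble sizes, the sketch below divides each size by the smallest one. It uses only the sizes quoted in this appendix (WDBC here, Balance in Section C.4.6); the helper name `multipliers` is hypothetical. The reading that each scenario scales a base ensemble by a growing factor is an assumption drawn from the numbers, not something stated in this section.

```python
# Ensemble sizes quoted in the text for the four diversity scenarios.
wdbc_sizes = [113, 1130, 2147, 3164]
balance_sizes = [7, 70, 133, 196]  # from Section C.4.6


def multipliers(sizes):
    """Return each ensemble size as a multiple of the smallest one.

    Hypothetical helper: it only checks the arithmetic pattern and says
    nothing about how the ensembles were actually built.
    """
    base = sizes[0]
    if any(size % base != 0 for size in sizes):
        raise ValueError("sizes are not integer multiples of the first one")
    return [size // base for size in sizes]


print(multipliers(wdbc_sizes))     # [1, 10, 19, 28]
print(multipliers(balance_sizes))  # [1, 10, 19, 28]
```

Both data sets follow the same 1, 10, 19, 28 pattern, so the scenarios appear to differ only in a common scaling factor applied to a data-set-specific base size.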

Running time comparison

Figure C.41 presents the running time of flat consensus and of the serial implementation of the fastest RHCA and DHCA variants across the four diversity scenarios. On this data set, the relationship between the execution times of these consensus architectures differs somewhat from what was observed in the previous data collections. In particular, flat consensus is a more competitive alternative, being faster or almost as fast as RHCA in all diversity scenarios when consensus is based on the CSPA, EAC, ALSAD and SLSAD clustering combiners. In contrast, DHCA is notably slower than RHCA in most cases. This is due to the large cardinality of the dimensional diversity factor (|dfD| = 28 on this data set), which makes the DHCA stage where consensus is conducted on this diversity factor much more computationally costly than the intermediate consensus processes of RHCA.

Figure C.42 presents the execution times of the computationally optimal parallel RHCA and DHCA variants and of flat consensus. Two trends are observed in these boxplots. First, hierarchical architectures are faster than flat consensus, especially in diversity scenarios where large cluster ensembles are employed. Second, parallel DHCA variants are generally slower than their RHCA counterparts, for the reason given above: the costly consensus stage conducted on the dimensional diversity factor.
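The difference between serial and parallel running times can be illustrated with a minimal sketch. It assumes a simple cost model that is not taken from this section: a hierarchical architecture is a sequence of stages, each made of independent consensus processes; the serial time is the sum of all process times, while the parallel time (assuming one processor per process) is the sum of the slowest process in each stage. All timings and function names below are hypothetical.

```python
def serial_time(stage_times):
    """Total time when every consensus process runs one after another."""
    return sum(sum(stage) for stage in stage_times)


def parallel_time(stage_times):
    """Total time when the processes of a stage run simultaneously.

    Assumption: enough processors are available, so each stage costs
    as much as its slowest consensus process.
    """
    return sum(max(stage) for stage in stage_times)


# Hypothetical per-process running times in seconds, one list per stage.
stages = [
    [2.0, 3.0, 2.5, 1.5],  # stage 1: four small consensus processes
    [4.0, 3.5],            # stage 2: two intermediate consensus processes
    [5.0],                 # stage 3: final consensus
]

print(serial_time(stages))    # 21.5
print(parallel_time(stages))  # 12.0
```

Under this model, a single expensive stage bounds the parallel time from below no matter how many processors are available, which is consistent with the DHCA behaviour described above.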

Consensus quality comparison

Figure C.43 presents the φ^(NMI) of the consensus clustering solutions yielded by the RHCA, DHCA and flat consensus architectures across the four diversity scenarios on the WDBC data collection. First, notice that the EAC and SLSAD consensus functions give rise to very low quality consensus clusterings regardless of the consensus architecture employed. In contrast, flat consensus yields reasonably good consensus clusterings when it is derived by means of HGPA, although hierarchical consensus architectures based on this consensus function output poor consensus clustering solutions. Meanwhile, the remaining clustering combiners yield fairly good consensus clusterings, with slightly better results when consensus is derived by means of RHCA and flat consensus architectures.

C.4.6 Balance data set

This section presents the execution times of the estimated computationally optimal serial and parallel implementations of RHCA and DHCA, and of flat consensus, in the four diversity scenarios for the Balance data set, which give rise to cluster ensembles of sizes l = 7, 70, 133 and 196, respectively. Moreover, the quality of the consensus clustering solutions output by each consensus architecture is evaluated in terms of the normalized mutual information (φ^(NMI)) with respect to the ground truth.
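For readers who want to reproduce this kind of evaluation, the following is a minimal sketch of the normalized mutual information between a consensus labeling and the ground truth. It assumes the geometric-mean normalization, I(a; b) / sqrt(H(a) H(b)); the exact normalization used in this work is not stated in this section, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (natural log) between two labelings of equal length."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI with geometric-mean normalization (an assumption, see lead-in)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        return 0.0  # a single-cluster labeling carries no information
    return mutual_information(a, b) / math.sqrt(h_a * h_b)


ground_truth = [0, 0, 1, 1]
relabeled = [1, 1, 0, 0]    # same partition, different cluster names
independent = [0, 1, 0, 1]  # partition unrelated to the ground truth
```

With these toy labelings, `nmi(ground_truth, relabeled)` is 1 up to floating-point error, because the score ignores cluster names, while `nmi(ground_truth, independent)` is 0.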
