29.04.2013 Views

TESI DOCTORAL - La Salle



3.4. Flat vs. hierarchical consensus

and allegedly correct cluster structure (or ground truth), measuring their degree of resemblance in terms of normalized mutual information, φ (NMI) (recall that φ (NMI) ∈ [0, 1] and the higher its value, the better the quality of the consensus clustering solution). We measure the percentage of experiments and the relative φ (NMI) differences between the consensus clusterings and the cluster ensemble components of maximum and median φ (NMI) score.

ii) We compare the φ (NMI) scores of the consensus clusterings obtained by the three consensus architectures subject to evaluation.
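The two evaluation criteria above can be illustrated with a minimal sketch. It assumes that φ (NMI) is the mutual information normalized by the geometric mean of the two entropies, and that the relative difference is expressed as a percentage of the reference score; both conventions, the function names `nmi` and `relative_difference`, and the toy labelings are assumptions made for illustration, not taken from the experiments.

```python
import math
from collections import Counter
from statistics import median


def nmi(labels_a, labels_b):
    """Mutual information normalized by sqrt(H(A) * H(B)) (assumed convention)."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label vectors must be non-empty and of equal length")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    # Mutual information I(A; B) computed from the contingency counts.
    mi = sum(
        (n_ab / n) * math.log(n * n_ab / (count_a[a] * count_b[b]))
        for (a, b), n_ab in count_ab.items()
    )
    # Entropies H(A) and H(B).
    h_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * math.log(c / n) for c in count_b.values())
    if h_a == 0.0 or h_b == 0.0:
        # Convention chosen here: a single-cluster labeling scores 0.
        return 0.0
    # Clamp to [0, 1] to absorb floating-point rounding.
    return min(1.0, max(0.0, mi / math.sqrt(h_a * h_b)))


def relative_difference(consensus_score, reference_score):
    """Relative score difference in percent with respect to a reference score."""
    if reference_score == 0.0:
        raise ValueError("reference score must be non-zero")
    return 100.0 * (consensus_score - reference_score) / reference_score


# Toy data (hypothetical): a ground truth, three ensemble components, one consensus.
ground_truth = [0, 0, 0, 1, 1, 1]
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
]
consensus = [1, 1, 1, 0, 0, 0]  # same partition as the ground truth, labels swapped

component_scores = [nmi(ground_truth, labels) for labels in ensemble]
consensus_score = nmi(ground_truth, consensus)
print(relative_difference(consensus_score, max(component_scores)))
print(relative_difference(consensus_score, median(component_scores)))
```

Note that the score is invariant to label permutations, which is why the toy consensus labeling matches the ground truth despite using swapped cluster identifiers.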

– How are the experiments designed? The experimental methodology is the same as the one followed to analyze the computational efficiency of the consensus architectures in the previous sections. That is, the consensus quality comparison has been conducted on the four diversity scenarios described in appendix A.4. In each diversity scenario, ten independent experiments have been conducted using the seven consensus functions for hard cluster ensembles employed in this work (CSPA, EAC, HGPA, MCLA, ALSAD, KMSAD and SLSAD). For each consensus function and diversity scenario, the φ (NMI) values of the consensus clustering solutions obtained in the ten experiments are presented.
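The resulting experimental grid can be sketched as follows. This is only an illustration of the bookkeeping: the scenario indices 1 to 4, the function `collect_scores` and the callable `run_experiment` are hypothetical names, and the actual consensus functions are not implemented here.

```python
from itertools import product

# The seven consensus functions for hard cluster ensembles named in the text.
CONSENSUS_FUNCTIONS = ("CSPA", "EAC", "HGPA", "MCLA", "ALSAD", "KMSAD", "SLSAD")
NUM_SCENARIOS = 4     # the four diversity scenarios (indexed 1..4 for illustration)
NUM_EXPERIMENTS = 10  # independent experiments per scenario and consensus function


def collect_scores(run_experiment):
    """Gather one list of ten scores per (scenario, consensus function) pair.

    `run_experiment(scenario, function, trial)` is a user-supplied callable
    (hypothetical) that returns the score of one consensus clustering solution.
    """
    scores = {}
    scenarios = range(1, NUM_SCENARIOS + 1)
    for scenario, function in product(scenarios, CONSENSUS_FUNCTIONS):
        scores[(scenario, function)] = [
            run_experiment(scenario, function, trial)
            for trial in range(NUM_EXPERIMENTS)
        ]
    return scores
```

Each list of ten values is what a single box of a boxplot chart would summarize.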

– How are results presented? The measured φ (NMI) values are presented by means of boxplot charts, which show the quality scatter of each consensus function and architecture. Again, non-overlapping box notches indicate that the medians of the compared φ (NMI) values differ at the 5% significance level.
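The notch comparison can be made concrete with a short sketch. It assumes the common notched-boxplot convention in which the notch spans the median ± 1.57 · IQR / √n; whether the charts in this work use exactly this formula and quartile definition is an assumption, and the helper names and score values below are hypothetical.

```python
import math
from statistics import median, quantiles


def notch_interval(values):
    """Return (low, high) notch limits: median +/- 1.57 * IQR / sqrt(n)."""
    n = len(values)
    # Quartiles by linear interpolation; other quartile definitions exist.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(n)
    med = median(values)
    return med - half_width, med + half_width


def notches_overlap(values_a, values_b):
    """True if the notch intervals of the two samples intersect."""
    low_a, high_a = notch_interval(values_a)
    low_b, high_b = notch_interval(values_b)
    return low_a <= high_b and low_b <= high_a


# Hypothetical scores of two architectures over ten experiments each.
scores_a = [0.80, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89]
scores_b = [0.60, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69]

if notches_overlap(scores_a, scores_b):
    print("notches overlap: no evidence that the medians differ")
else:
    print("notches do not overlap: medians differ at roughly the 5% level")
```

The notch test is a visual approximation rather than an exact hypothesis test, so borderline cases would call for a formal comparison.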

– Which data sets are employed? In this section, we present in detail the results obtained on the Zoo data collection; for the sake of brevity, the results obtained on the remaining eleven unimodal data sets are deferred to appendix C.4. However, at the end of this section, the φ (NMI) scores of the three compared consensus architectures measured across the experiments conducted on the twelve unimodal data collections employed in this work are compiled and compared. The goal of this comparison is to analyze whether any of the consensus architectures tends to yield better consensus clustering solutions than the rest.

One last remark before proceeding: notice that the only differences between serial and parallel hierarchical consensus architectures concern their time complexity, not the quality of the consensus clustering solutions they yield. For this reason, no distinction between serial and parallel architectures is made in this section.

Diversity scenario |df A| = 1

Firstly, the φ (NMI) values of the estimated optimal serial and parallel DHCA and RHCA implementations and flat consensus architectures in the lowest diversity scenario are presented in figure 3.21. Each chart presents four boxes that, from left to right, represent the φ (NMI) values of the components of the cluster ensemble E, and of the consensus clustering solutions output by the RHCA, DHCA and flat consensus architectures, respectively. It can be observed that the three consensus architectures yield, in general, consensus solutions of very similar quality (in fact, the differences between them are not statistically significant at the 5% level for the CSPA, MCLA, ALSAD and KMSAD consensus functions). The
