
4.2. Flat vs. hierarchical self-refining

Consensus                     Consensus function
architecture   CSPA   EAC   HGPA   MCLA   ALSAD   KMSAD   SLSAD
flat            8.3     0      0      0     25      23.9     8.3
RHCA            8.3     0      0   16.7     28.3    23.8     4
DHCA           16.7     0    0.1   16.6     16.7    18.2     8.3

Table 4.4: Percentages of experiments in which the best (non-refined or self-refined) consensus clustering solution is better than the best cluster ensemble component, averaged across the twelve data collections.

Consensus                     Consensus function
architecture   CSPA   EAC   HGPA   MCLA   ALSAD   KMSAD   SLSAD
flat            2.7     –      –      –      3.5     1.1     0.1
RHCA            2.5     –      –    1.7      4.2     1       0.1
DHCA            3.3     –    2.2    1.4      1.4     1.2     0.8

Table 4.5: Relative percentage φ (NMI) gains between the best (non-refined or self-refined) consensus clustering solution and the best cluster ensemble component, averaged across the twelve data collections.

all the results presented correspond to an average across all the experiments conducted on the twelve unimodal data collections.

As regards the first issue, table 4.4 presents the percentage of experiments where the highest quality consensus clustering solution (either refined or non-refined) is better than the BEC (i.e. it attains a φ (NMI) higher than that of the cluster ensemble component that best describes the group structure of the data in terms of normalized mutual information with respect to the given ground truth). On average, this happens in 10.6% of the conducted experiments, a frequency of occurrence 100 times higher than the one obtained when non-refined clustering solutions were considered (see table 3.14 in chapter 3). Again, this result reveals the notable consensus improvement introduced by the proposed self-refining procedure. Moreover, notice the poor results obtained with the EAC and HGPA consensus functions, which were already reported to be the worst performing ones in chapter 3.
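As a concrete illustration of the comparison underlying table 4.4, the following sketch computes φ (NMI) between two labelings and then checks whether a consensus solution beats the BEC of a toy ensemble. The NMI implementation (geometric-mean normalization, as in Strehl and Ghosh), the toy labelings and the variable names are assumptions for illustration only, not the actual experimental code of this thesis.

```python
from collections import Counter
from math import log, sqrt

def nmi(labels_a, labels_b):
    """phi(NMI): mutual information between two clusterings, normalized
    by the geometric mean of their entropies."""
    n = len(labels_a)
    ca = Counter(labels_a)
    cb = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(nij / n * log(n * nij / (ca[i] * cb[j]))
             for (i, j), nij in joint.items())
    ha = -sum(c / n * log(c / n) for c in ca.values())
    hb = -sum(c / n * log(c / n) for c in cb.values())
    if ha == 0 or hb == 0:
        return 1.0 if ha == hb else 0.0
    return mi / sqrt(ha * hb)

# Hypothetical ground truth and a small cluster ensemble (toy data).
ground_truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
ensemble = [
    [0, 0, 1, 1, 1, 1, 2, 2, 2],   # one mislabeled object
    [0, 1, 0, 1, 0, 1, 2, 2, 2],   # a noisier component
]
consensus = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # a perfect consensus, for illustration

# BEC: the ensemble component with the highest phi(NMI) w.r.t. the ground truth.
bec_score = max(nmi(c, ground_truth) for c in ensemble)
print(nmi(consensus, ground_truth) > bec_score)  # prints True for this toy example
```

In the actual experiments this comparison is repeated per experiment, and table 4.4 reports how often the consensus side wins.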

Moreover, the relative percentage φ (NMI) gains between the top quality consensus clustering solution and the BEC are presented in table 4.5, attaining a modest average increase of 1.8%. However, recall that this figure was as low as 0.6% when the non-refined consensus clustering solution was considered (see table 3.15 in section 3.4), which indicates that the consensus self-refining procedure again introduces notable quality improvements.
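The relative gain reported in table 4.5 reduces to a one-line formula; the sketch below spells it out, with purely hypothetical φ (NMI) values chosen only to show the computation.

```python
def relative_nmi_gain(phi_consensus, phi_bec):
    """Relative percentage phi(NMI) gain of the consensus solution over the BEC."""
    return 100.0 * (phi_consensus - phi_bec) / phi_bec

# e.g. a consensus solution at phi(NMI) = 0.61 against a BEC at 0.60
print(round(relative_nmi_gain(0.61, 0.60), 1))  # prints 1.7
```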

If this comparison is instead referred to the median ensemble component, it can be observed that, on average, the best (non-refined or self-refined) consensus clustering solution attains a φ (NMI) higher than that of the cluster ensemble component with the median normalized mutual information with respect to the given ground truth in 67.7% of the experiments conducted (see table 4.6). Recall that this percentage was 53.1% when the consensus clustering solution prior to self-refining was compared to the MEC (see table 3.12 in section 3.4).

If the degree of improvement between the best (non-refined or self-refined) consensus clustering solutions that attain a higher φ (NMI) than the MEC is measured in terms of
