TESI DOCTORAL - La Salle

3.4. Flat vs. hierarchical consensus

Comparisons of the results obtained when the MCLA consensus function is employed as the basis of the consensus architectures must be made with care, as flat consensus is not executable on large cluster ensembles. For this reason, the boxplots presented in the MCLA column only reflect the φ(NMI) values corresponding to those experiments where the three consensus architectures are executable. In these cases, the best consensus clustering solutions are obtained by the flat and DHCA consensus architectures. A greater degree of evenness between consensus architectures is observed for the consensus functions that treat object similarity as object features, i.e. ALSAD, KMSAD and SLSAD (notice the large overlap between boxes). However, whereas DHCA yields slightly lower quality consensus clustering solutions than the RHCA and flat consensus architectures when the ALSAD and KMSAD consensus functions are employed, it is the flat consensus approach that attains the lowest φ(NMI) values among the SLSAD-based consensus architectures.

Secondly, if an inter-consensus function comparison is conducted, we can conclude that the excellent performance of the EAC consensus function on the Zoo data collection apparently constitutes an exception to the rule, as (together with HGPA and SLSAD) it yields the lowest φ(NMI) values (i.e. the poorest consensus clustering solutions) when the results obtained across all the data sets and diversity scenarios are compiled. In contrast, CSPA, MCLA, ALSAD and KMSAD tend to yield comparatively better consensus clustering solutions from a global perspective.

Following a more quantitative perspective, we have compared the quality of the consensus clustering solutions yielded by the three consensus architectures with that of the components of the cluster ensemble upon which consensus is conducted. In particular, this comparison has taken into account the cluster ensemble components of median and maximum φ(NMI) with respect to the ground truth (referred to as the median ensemble component, or MEC, and the best ensemble component, or BEC, respectively). This comparison makes sense inasmuch as we regard the application of consensus clustering as a means of becoming robust to the inherent indeterminacies that affect the clustering problem. More specifically, the higher the φ(NMI) of the consensus clustering solution with respect to that of the cluster ensemble components, the higher the robustness achieved. The median and maximum φ(NMI) components are used as a summarized reference of the quality of the cluster ensemble contents.

For this reason, we have evaluated i) the percentage of experiments in which the consensus clustering solution attains a higher φ(NMI) than the MEC and the BEC, and ii) the relative percentage φ(NMI) variation between the median and the best cluster ensemble components and the consensus clustering solution.

As regards the first issue, table 3.12 presents the percentage of experiments (considering all data sets in the highest diversity scenario) where the consensus clustering solution is better than the median normalized mutual information cluster ensemble component (MEC). It can be observed that the average percentage of such experiments is 53.1%, which indicates that in more than half of the experiments, consensus yields a clustering solution better than the one located halfway among the cluster ensemble components. When the relative percentage φ(NMI) gains between the consensus clustering solutions and the MEC are computed, a reasonable average gain of 59% is obtained (see table 3.13 for a detailed presentation of the results per consensus function and consensus architecture).

If such a comparison is referred to the cluster ensemble component that best describes the group structure of the data in terms of normalized mutual information with respect to the
