29.04.2013

TESI DOCTORAL (Doctoral Thesis) - La Salle


C.4. Computationally optimal RHCA, DHCA and flat consensus comparison

Consensus quality comparison

The φ (NMI) values of the consensus clustering solutions yielded by the flat, random hierarchical and deterministic hierarchical consensus architectures follow a pattern quite similar to that observed in the previous data collections, at least as far as the performance of the distinct consensus functions is concerned. That is, the lowest-quality consensus solutions are obtained with the EAC and HGPA consensus functions, whereas ALSAD tends to yield the best results.

C.4.10 BBC data set

In this section, the running time and consensus quality comparison experiments are conducted on the BBC data collection. The four diversity scenarios correspond to cluster ensembles of sizes l = 57, 570, 1083 and 1596.

Running time comparison

As far as the running times of the entirely serial implementations of RHCA and DHCA and of flat consensus are concerned, the boxplots depicted in figure C.56 show that flat consensus is the most computationally competitive consensus architecture in most cases. In fact, the only exceptions occur when the HGPA and MCLA consensus functions are employed.

When the parallel implementations of the hierarchical consensus architectures are considered, these become more competitive in computational terms, reversing the situation observed in the serial case for the CSPA and KMSAD consensus functions (see figure C.57).
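The gap between the serial and parallel figures can be made concrete with a small sketch. It assumes, as a simplification not stated in this section, that the serial running time of a hierarchical architecture is the sum of the running times of all its consensus processes, whereas the parallel running time is the sum, over stages, of the slowest process in each stage (that is, enough processors are available to run every process of a stage simultaneously). The function name `hierarchical_running_times` and its input format are hypothetical.

```python
from typing import Sequence, Tuple


def hierarchical_running_times(stages: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (serial, parallel) running time of a hierarchical consensus architecture.

    Hypothetical input format: stages[s] holds the measured running times
    (e.g. in seconds) of the consensus processes executed at stage s.
    """
    # Serial execution: every consensus process runs one after another.
    serial = sum(sum(stage) for stage in stages)
    # Parallel execution (assumption: one processor per consensus process):
    # each stage lasts as long as its slowest process, and stages run in sequence.
    parallel = sum(max(stage) for stage in stages if stage)
    return serial, parallel
```

For instance, `hierarchical_running_times([[1.0, 2.0, 3.0], [4.0]])` returns `(10.0, 7.0)`. Under this model, a hierarchical architecture is computationally competitive with flat consensus whenever the relevant figure (serial or parallel) falls below the running time of the single flat consensus process.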

Consensus quality comparison

As regards the quality of the consensus clustering solutions yielded by the three consensus architectures (measured as the φ (NMI) with respect to the ground truth that defines the true group structure of the BBC data collection), we observe large differences between the performance of the distinct consensus functions (see figure C.58): while the MCLA, ALSAD and KMSAD clustering combiners tend to yield consensus clusterings of quality comparable to that of the best components of the cluster ensemble, the clustering solutions output by consensus architectures based on EAC, HGPA and SLSAD are notably poorer.
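As a reference for how a φ (NMI) score between a consensus clustering and the ground truth can be computed, the following sketch uses the geometric-mean normalization of mutual information, φ(NMI)(a, b) = I(a; b) / sqrt(H(a) H(b)), which is common in the consensus clustering literature; that it matches the exact definition used elsewhere in this thesis is an assumption. The function name `nmi` and the treatment of single-cluster partitions are illustrative choices.

```python
from collections import Counter
from math import log, sqrt
from typing import Hashable, Sequence


def nmi(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Normalized mutual information, I(a; b) / sqrt(H(a) * H(b)).

    labels_a[i] and labels_b[i] are the cluster labels assigned to object i
    by two clusterings (e.g. a consensus solution and the ground truth).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both clusterings must label the same, non-empty set of objects")
    n = len(labels_a)
    count_a = Counter(labels_a)                # cluster sizes n_h in clustering a
    count_b = Counter(labels_b)                # cluster sizes n_l in clustering b
    joint = Counter(zip(labels_a, labels_b))   # contingency table entries n_{h,l}
    # Mutual information computed from the contingency table.
    mi = sum(n_hl / n * log(n * n_hl / (count_a[h] * count_b[l]))
             for (h, l), n_hl in joint.items())
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    if h_a == 0.0 or h_b == 0.0:
        # Illustrative convention: two single-cluster partitions are identical (1.0);
        # a single-cluster partition against any other shares no information (0.0).
        return 1.0 if h_a == h_b else 0.0
    return mi / sqrt(h_a * h_b)
```

Label permutations do not affect the score: `nmi([0, 0, 1, 1], [1, 1, 0, 0])` gives 1.0 (up to floating-point rounding), while two independent partitions such as `[0, 0, 1, 1]` and `[0, 1, 0, 1]` give 0.0.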

C.4.11 PenDigits data set

This section presents the execution times of the computationally optimal RHCA, DHCA and flat consensus architectures, as well as the φ (NMI) values of the consensus clustering solutions they yield on the PenDigits data collection. The results correspond to experiments conducted across four diversity scenarios, whose cluster ensemble sizes are l = 57, 570, 1083 and 1596, respectively. Due to the number of objects contained in this data set, only the HGPA and MCLA consensus functions are executable on it, as they are the only ones whose space complexity scales linearly with this

