
7.1. Hierarchical consensus architectures

make the execution of traditional one-step (also known as flat) consensus processes, in which the whole cluster ensemble is input to the consensus function at once, very costly or even infeasible when conducted on highly populated cluster ensembles. For this reason, applying a divide-and-conquer strategy to the cluster ensemble, which gives rise to the hierarchical consensus architectures (HCA) proposed in chapter 3, constitutes an alternative to classic flat consensus that, besides leaving out none of the l cluster ensemble components, is also naturally parallelizable, making it even more computationally appealing.
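The contrast between the two schemes can be sketched as follows. This is an illustrative Python sketch, not the thesis's implementation: `consensus` is a hypothetical placeholder (a per-object majority vote) standing in for any real consensus function such as EAC or MCLA, and clusterings are represented as plain label lists.

```python
def consensus(clusterings):
    """Hypothetical placeholder consensus function: per-object majority label.

    Stands in for any real consensus function (EAC, MCLA, ...); its only
    role here is to map a list of clusterings to a single clustering.
    """
    n = len(clusterings[0])
    result = []
    for i in range(n):
        labels = [c[i] for c in clusterings]
        result.append(max(set(labels), key=labels.count))
    return result

def flat_consensus(ensemble):
    # One-step (flat) consensus: the whole ensemble is combined at once.
    return consensus(ensemble)

def hierarchical_consensus(ensemble, b):
    # Divide-and-conquer: at each stage, combine mini-ensembles of size
    # (up to) b until a single consensus clustering remains. Note that no
    # component of the ensemble is ever left out.
    current = list(ensemble)
    while len(current) > 1:
        current = [consensus(current[i:i + b])
                   for i in range(0, len(current), b)]
    return current[0]
```

Because each stage only ever feeds b clusterings to the consensus function, the hierarchy keeps the per-call problem size bounded, which is the source of its computational appeal on large ensembles.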

In particular, two types of hierarchical consensus architectures have been proposed: random and deterministic HCA. The two differ in how the user intervenes in their design. In random HCA, the user selects the size (b) of the mini-ensembles on which intermediate consensus processes are conducted; together with the cluster ensemble size l, this determines the number of stages of the consensus architecture. In contrast, deterministic HCA provide a more modular approach to consensus clustering, as clusterings of the same nature are combined at each stage of the hierarchy. In fact, our first approach to hierarchical consensus architectures dealt with deterministic HCA (Sevillano et al., 2007a), although it focused solely on the quality of the consensus clusterings obtained, not on their computational aspects.
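The dependence of the architecture's depth on b and l can be made concrete: if each stage turns every group of up to b clusterings into one clustering, the ensemble shrinks by roughly a factor of b per stage, so the number of stages grows as the base-b logarithm of l. The following sketch assumes this grouping model (my reading of the description above, not a formula quoted from the thesis).

```python
import math

def hca_stages(l, b):
    """Number of consensus stages when l clusterings are repeatedly
    combined in mini-ensembles of size b (assumed model: each stage
    replaces every group of up to b clusterings with one clustering)."""
    stages = 0
    while l > 1:
        l = math.ceil(l / b)  # clusterings surviving to the next stage
        stages += 1
    return stages
```

For instance, under this model an ensemble of l = 100 clusterings with mini-ensembles of size b = 10 needs 2 stages, while l = 1000 with the same b needs 3.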

Extensive experiments have shown that the computational efficiency of HCA is highly dependent on the characteristics of the consensus function employed for combining the clusterings (in particular, on how its time complexity scales with the number of clusterings combined). For instance, flat consensus based on the EAC consensus function is more efficient than any hierarchical architecture, whereas the opposite behaviour is observed when MCLA is used.
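A toy cost model illustrates why the scaling of the consensus function is decisive. Suppose one consensus run over m clusterings costs f(m); then flat consensus costs f(l), while a fully serial HCA pays f on each mini-ensemble at every stage. The cost functions below are purely illustrative stand-ins, not the measured complexities of EAC or MCLA.

```python
import math

def serial_hca_cost(l, b, f):
    """Total cost of a fully serial HCA under the toy model: at each
    stage, every group of (up to) b clusterings is combined by one
    consensus run of cost f(group size)."""
    cost = 0.0
    while l > 1:
        groups = math.ceil(l / b)
        cost += groups * f(min(b, l))  # approximate all groups as size b
        l = groups
    return cost

linear = lambda m: m          # cost linear in the number of clusterings
quadratic = lambda m: m ** 2  # cost superlinear in the number of clusterings

# With l = 100 and b = 10:
#   linear f:    flat costs 100, the serial hierarchy costs 110
#   quadratic f: flat costs 10000, the serial hierarchy costs 1100
```

Under a linear f the hierarchy only adds overhead (the flat run was already cheap), whereas under a superlinear f the bounded per-call size makes the hierarchy far cheaper, mirroring the EAC-versus-MCLA contrast reported above.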

Moreover, we have observed that HCA become faster than flat consensus when operating on highly populated cluster ensembles, regardless of whether their fully serial or fully parallel implementation is considered (except when the EAC consensus function is employed). As expected, the fully parallel version of HCA outperforms flat consensus, often by a large margin, even when small cluster ensembles are employed. An additional interesting feature of hierarchical consensus architectures is that they provide a means of obtaining a consensus clustering solution in scenarios where the complexity of flat consensus makes its execution impossible for a given set of computational resources.

Given that multiple specific implementation variants of an HCA exist, and that their time complexities can differ largely, it seems necessary to provide the user with tools for predicting, for a given consensus clustering problem, which variant is the most computationally efficient. To this end, a simple methodology for estimating their running times, and thus selecting the least time-consuming variant, has also been proposed. Despite its simplicity, the proposed methodology achieves an accuracy close to 80% when predicting the fastest serially implemented HCA variant, while this percentage drops to nearly 50% in the parallel implementation case. This difference is caused by the fact that, in the parallel case, the running time estimation is more sensitive to random deviations of the measured running times it is based upon, as it often ends up depending on a single execution time sample. However, the impact of incorrect predictions, measured as the running time overhead with respect to the truly fastest HCA variant, is well below 10 seconds in the vast majority of the experiments conducted; of course, the relative importance of such deviations ultimately depends on the time requirements of the specific application the HCA is embedded in.
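The selection step itself is simple: estimate a running time for each candidate variant and keep the argmin. The sketch below uses hypothetical names and a toy variant description (stages, consensus runs per stage, time per run); it is a reading of the methodology's structure, not the thesis's actual estimator. It also shows why the parallel estimate is the fragile one: it rests on a single (critical-path) run per stage, so one noisy time sample can flip the prediction.

```python
def predict_fastest(variants, estimate_runtime):
    """Return the variant with the smallest estimated running time.

    `variants` is any iterable of candidate HCA configurations and
    `estimate_runtime` maps a variant to an estimated time, e.g. by
    extrapolating a few measured consensus runs (both hypothetical).
    """
    return min(variants, key=estimate_runtime)

# Toy variant description: (stages, consensus runs per stage, time per run).
def serial_estimate(v):
    stages, runs_per_stage, time_per_run = v
    # Serial execution pays for every consensus run in every stage.
    return stages * runs_per_stage * time_per_run

def parallel_estimate(v):
    stages, runs_per_stage, time_per_run = v
    # Parallel execution pays only for one run per stage (the rest run
    # concurrently), so the estimate hinges on a single time sample.
    return stages * time_per_run
```

Note how the same pair of variants can rank differently under the two estimates: a deep hierarchy with few runs per stage may win serially yet lose in parallel, where depth alone drives the cost.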

