
3.5. Discussion

separate clustering processes on each type of feature (employing well-established clustering algorithms designed to that end), and subsequently combining the resulting clustering solutions by means of consensus functions. Thus, this divide-and-conquer consensus clustering proposal is aimed at dealing with objects composed of multi-type features, rather than at reducing the overall time complexity of consensus processes.

In this chapter, two versions of hierarchical consensus architectures have been proposed. In each of them, one of the two factors that define the topology of the architecture is fixed beforehand: the number of stages (in deterministic HCA) or the size of the mini-ensembles (in random hierarchical consensus architectures). Structuring the whole consensus clustering task as a set of partial consensus processes that take place in successive stages gives the user the chance to apply different consensus functions across the hierarchy, a possibility that, to our knowledge, remains unexplored. Moreover, the decomposition of a classic one-step problem into a set of smaller instances of the same problem naturally allows its parallelization, provided that sufficient computational resources are available. At this point, we would like to highlight that, although posed in the context of the robust clustering problem, hierarchical consensus architectures are applicable to any consensus clustering task involving large cluster ensembles.
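The staged decomposition described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the consensus function below is a hypothetical stand-in (per-object majority voting over integer cluster labels), whereas the chapter evaluates functions such as CSPA, MCLA or EAC; the mini-ensemble size b corresponds to the random HCA variant.

```python
def toy_consensus(mini_ensemble):
    """Hypothetical stand-in for a consensus function F: per-object majority vote.

    Each clustering is a list of integer cluster labels, one per object.
    """
    n_objects = len(mini_ensemble[0])
    return [
        max(set(labels), key=labels.count)
        for labels in (tuple(p[i] for p in mini_ensemble) for i in range(n_objects))
    ]


def hierarchical_consensus(ensemble, b):
    """Combine a cluster ensemble stage by stage using mini-ensembles of size b.

    At each stage the current ensemble is split into mini-ensembles of (at most)
    b clusterings, each combined by the consensus function; the resulting partial
    consensus clusterings form the next stage's ensemble, until one remains.
    """
    while len(ensemble) > 1:
        # The mini-ensembles within a stage are independent, so in a parallel
        # HCA implementation each call below could run on a separate processor.
        ensemble = [
            toy_consensus(ensemble[i:i + b])
            for i in range(0, len(ensemble), b)
        ]
    return ensemble[0]
```

With b equal to the full ensemble size this reduces to a flat (one-stage) consensus, which is how the flat architecture appears as a special case.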

From a practical perspective, we have presented a simple running time estimation methodology that, for a given consensus clustering problem, allows a fast and reasonably accurate prediction of the computationally optimal consensus architecture. However, the good performance of the proposed methodology could be further improved by means of a more complex (probably statistical) modeling of the consensus running times that constitute the basis of the estimation.
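The kind of estimation involved can be sketched as follows, under assumptions of our own: time the consensus function on a few small mini-ensemble sizes, fit a simple polynomial model of running time versus ensemble size, and extrapolate to predict the total cost of each candidate serial architecture. The function names and the polynomial model are illustrative, not the thesis' exact procedure.

```python
import numpy as np


def fit_time_model(sizes, measured_times, degree=2):
    """Fit a polynomial model t(b) of F's running time vs. ensemble size b."""
    return np.polynomial.Polynomial.fit(sizes, measured_times, degree)


def predict_serial_hca_time(model, ensemble_size, b):
    """Predicted total cost of a fully serial HCA with mini-ensembles of size b.

    Setting b = ensemble_size yields the flat (one-stage) architecture.
    """
    total, remaining = 0.0, ensemble_size
    while remaining > 1:
        n_mini = -(-remaining // b)  # ceil division: mini-ensembles this stage
        total += n_mini * model(b)   # each one costs a consensus run on b items
        remaining = n_mini
    return total
```

Selecting the architecture then amounts to taking the argmin of the predicted cost over the candidate values of b, including b equal to the ensemble size for flat consensus.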

Based on these predictions, we have presented an experimental study in which the flat and the fastest hierarchical consensus architectures are, firstly, compared in terms of their execution time. This comparison has taken into account the most and least computationally costly HCA implementations (i.e. fully serial and fully parallel), so as to provide a notion of the upper and lower bounds of the time complexity of hierarchical consensus architectures. One of the most expected conclusions drawn from the conducted experiments is that the computational optimality of a given consensus architecture is local to the consensus function F employed for combining the clusterings. In particular, as far as the execution time of hierarchical consensus architectures is concerned, the main issue to take into account is the dependence between the time complexity of F and the size of the mini-ensembles upon which consensus is conducted. For instance, the use of consensus functions whose complexity scales quadratically with the number of clusterings consensus is created upon (e.g. MCLA) clearly favours hierarchical consensus architectures. In contrast, flat consensus is more efficient than the fastest serial hierarchical consensus architectures, even in high diversity scenarios, when consensus functions such as EAC are employed.
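A back-of-the-envelope comparison makes this dependence concrete. The two cost models below are assumptions for illustration only: one quadratic in the number of clusterings combined (in the spirit of MCLA-like functions), and one linear with a large per-run overhead independent of the mini-ensemble size (a rough stand-in for the behaviour that makes flat consensus preferable with functions like EAC).

```python
def serial_hca_cost(l, b, stage_cost):
    """Total cost of a serial HCA over l clusterings with mini-ensembles of size b."""
    total = 0
    while l > 1:
        n_mini = -(-l // b)          # ceil division: consensus runs this stage
        total += n_mini * stage_cost(b)
        l = n_mini
    return total


quadratic = lambda m: m ** 2   # assumed cost of F on m clusterings
overheaded = lambda m: m + 100  # assumed cost dominated by an m-independent term

# Quadratic F: many cheap small runs beat one expensive large run.
assert serial_hca_cost(100, 10, quadratic) < quadratic(100)

# F with a large per-run overhead: paying the overhead once (flat) wins.
assert serial_hca_cost(100, 10, overheaded) > overheaded(100)
```

Under the quadratic model, combining 100 clusterings flat costs 10000 units, while a serial HCA with mini-ensembles of 10 costs 1100; under the overhead-dominated model the ordering reverses, mirroring the MCLA versus EAC behaviour observed in the experiments.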

Besides analyzing their computational aspects, we have also compared hierarchical and flat consensus architectures in terms of the quality of the consensus clustering solutions they yield. In this sense, inter-architecture variability is highly dependent on the characteristics of the cluster ensemble and the consensus function employed. For instance, hierarchical and flat consensus architectures based on the CSPA, EAC, ALSAD and SLSAD consensus functions yield consensus clusterings of fairly similar quality, whereas greater variances are observed when the remaining consensus functions are used. Moreover, in general terms, we have observed that consensus architectures based on EAC and HGPA typically
