Chapter 4

Self-refining consensus architectures
As described in chapter 3, our proposal for building clustering systems robust to the inherent indeterminacies that affect the clustering problem consists of i) creating a cluster ensemble E composed of a large number of individual partitions generated using as many diversity factors (e.g. clustering algorithms, object representations, etc.) as possible, and ii) deriving a unique clustering solution λc from that cluster ensemble through the application of a consensus clustering process.
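As an illustration, these two steps can be sketched with one common flat consensus function, evidence accumulation over a co-association matrix, in which objects grouped together by a majority of the ensemble's partitions end up in the same consensus cluster. This is a minimal sketch for intuition only; the function name, the majority threshold, and the connected-components grouping are illustrative choices, not the consensus functions studied in this thesis.

```python
import numpy as np

def coassociation_consensus(ensemble, threshold=0.5):
    """Toy consensus function: build the co-association matrix of the
    ensemble (fraction of partitions in which each pair of objects
    co-clusters) and return the connected components of the pairs whose
    co-association exceeds `threshold`."""
    ensemble = [np.asarray(p) for p in ensemble]
    n = ensemble[0].size
    coassoc = np.zeros((n, n))
    for labels in ensemble:
        # add 1 to every pair of objects sharing a cluster in this partition
        coassoc += labels[:, None] == labels[None, :]
    coassoc /= len(ensemble)
    # connected components over majority co-association links
    consensus = -np.ones(n, dtype=int)
    cluster = 0
    for seed in range(n):
        if consensus[seed] >= 0:
            continue
        stack = [seed]
        consensus[seed] = cluster
        while stack:
            i = stack.pop()
            for j in np.nonzero(coassoc[i] > threshold)[0]:
                if consensus[j] < 0:
                    consensus[j] = cluster
                    stack.append(j)
        cluster += 1
    return consensus

# a tiny ensemble E of three partitions of six objects, one of them noisy
E = [[0, 0, 0, 1, 1, 1],
     [0, 0, 0, 1, 1, 1],
     [0, 0, 1, 1, 0, 1]]
lambda_c = coassociation_consensus(E)  # → array([0, 0, 0, 1, 1, 1])
```

The majority vote across the ensemble recovers the grouping shared by the two agreeing partitions despite the noisy third one, which is precisely the robustness the consensus step is meant to provide.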

As mentioned earlier, the use of such a large cluster ensemble entails two negative consequences. The first is that the construction of the consensus clustering solution can become costly or even unfeasible, as the space and time complexity of consensus functions scales linearly or even quadratically with the size of the cluster ensemble. To overcome this difficulty, in chapter 3 we put forward the concept of hierarchical consensus architectures, which apply a divide-and-conquer approach to consensus clustering. Moreover, by means of a simple running time estimation methodology, the user can decide a priori, with a notable degree of accuracy, which consensus architecture is the most computationally efficient for solving a given consensus clustering problem.
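The divide-and-conquer structure of a hierarchical consensus architecture can be sketched as follows: the ensemble is split into small mini-ensembles, a flat consensus is derived on each, and the process recurses on the intermediate consensus partitions until a single solution remains. The branch factor and the per-object majority-vote mini-consensus below are illustrative placeholders (the vote assumes the partitions already share label semantics and an odd group size to avoid ties, which real consensus functions do not require); they are not the architectures or consensus functions defined in this thesis.

```python
from collections import Counter

def majority_vote(partitions):
    """Toy flat consensus: per-object plurality label. Assumes the
    partitions use consistent cluster labels, which real consensus
    functions must not assume."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*partitions)]

def hierarchical_consensus(ensemble, consensus_fn, branch=3):
    """Divide-and-conquer consensus: repeatedly split the ensemble into
    mini-ensembles of size `branch`, run `consensus_fn` on each, and
    recurse on the intermediate solutions until one partition remains."""
    while len(ensemble) > 1:
        ensemble = [consensus_fn(ensemble[i:i + branch])
                    for i in range(0, len(ensemble), branch)]
    return ensemble[0]

# nine partitions of four objects, two of them noisy
E = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1],
     [0, 0, 0, 1], [0, 0, 1, 1], [0, 0, 1, 1],
     [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]]
lambda_c = hierarchical_consensus(E, majority_vote)  # → [0, 0, 1, 1]
```

Each stage only ever runs consensus on `branch` partitions at a time, which is what keeps the per-stage cost bounded when the consensus function scales superlinearly with ensemble size.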

The other main downside to the use of large cluster ensembles is the negative bias induced on the quality of the consensus clustering solution λc by the expected presence of poor1 individual clusterings in E, caused by the somewhat indiscriminate generation of cluster ensemble components that our proposal indirectly encourages. To overcome this drawback, we propose a simple consensus self-refining process that, in a fully unsupervised manner, makes it possible to improve the quality of the derived consensus clustering solution λc. Moreover, an additional benefit of this automatic consensus refining procedure is that it uniformizes the quality of the consensus clustering solutions yielded by distinct consensus architectures, which allows selecting the most appropriate one based on
1 By good quality clustering solutions we refer to those partitions that reflect the true group structure of the data. Provided that we evaluate our clustering results by means of an external cluster validity index (normalized mutual information, φ(NMI), with respect to the ground truth, i.e. an allegedly correct group structure of the data), the highest quality clustering results will be those attaining a φ(NMI) close to 1, whereas the φ(NMI) values associated with poor quality partitions will tend to zero, as φ(NMI) ∈ [0, 1] by definition (Strehl and Ghosh, 2002).
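The index used in this footnote can be computed directly from its definition, I(a; b) / sqrt(H(a) H(b)) with the square-root normalization of Strehl and Ghosh (2002). The sketch below is a straightforward transcription of that formula; the function name and the handling of degenerate single-cluster partitions are our own illustrative choices.

```python
import numpy as np

def phi_nmi(a, b):
    """Normalized mutual information between two labelings,
    I(a;b) / sqrt(H(a) * H(b)); lies in [0, 1], equals 1 for identical
    groupings (up to label renaming) and tends to 0 for unrelated ones."""
    a, b = np.asarray(a), np.asarray(b)
    n = a.size
    # joint distribution of the two labelings (contingency table / n)
    ka, kb = np.unique(a), np.unique(b)
    pij = np.array([[np.sum((a == i) & (b == j)) for j in kb] for i in ka],
                   dtype=float) / n
    pi, pj = pij.sum(axis=1), pij.sum(axis=0)
    nz = pij > 0
    mi = np.sum(pij[nz] * np.log(pij[nz] / (pi[:, None] * pj[None, :])[nz]))
    ha = -np.sum(pi * np.log(pi))
    hb = -np.sum(pj * np.log(pj))
    if ha == 0 or hb == 0:
        # degenerate single-cluster partition(s): our convention here
        return 1.0 if ha == hb else 0.0
    return mi / np.sqrt(ha * hb)

phi_nmi([0, 0, 1, 1], [1, 1, 0, 0])  # → 1.0 (same grouping, labels swapped)
phi_nmi([0, 0, 1, 1], [0, 1, 0, 1])  # → 0.0 (statistically unrelated)
```

Because the index is invariant to label permutations, it compares groupings rather than label names, which is what makes it suitable as the external validity measure referred to above.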
