29.04.2013

Doctoral Thesis - La Salle


Chapter 1. Framework of the thesis

1.5 Motivation and contributions of the thesis

The main motivation of this thesis is the construction of an efficient multimodal clustering system that performs as autonomously as possible, avoiding the re-execution of the different stages of the knowledge discovery process. Since such feedback loops stem from suboptimal decision-making, our aim is to free cluster practitioners from the obligation of making these critical decisions blindly while, at the same time, obtaining clustering solutions that are robust to the clustering indeterminacies presented in the previous section.

Instead of being forced to blindly select a single clustering configuration, the user is encouraged to use and combine all the data modalities, representations and clustering algorithms at hand, generating as many individual clustering solutions as possible and compiling them into a cluster ensemble. The proposed system then outputs, in a fully unsupervised manner, a consensus clustering solution that will hopefully be comparable to (or even better than) the one obtained with the best clustering configuration among those available.

As the informed reader may have guessed, the approach followed to pursue this goal falls within the consensus clustering framework, which is defined as “the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings” (Strehl and Ghosh, 2002). That is, the data representations, modalities and clustering algorithms employed to generate the individual partitions are not the system’s concern, since it operates on the individual clustering solutions regardless of how they were created.
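To make the label-only nature of this definition concrete, the following is a minimal sketch of one classic consensus function, written in the spirit of the co-association (evidence accumulation) approach associated with Fred and Jain: it records how often each pair of objects shares a cluster across the ensemble and links the pairs co-clustered in a majority of partitions, which amounts to cutting a single-link grouping at that threshold. The function names, the 0.5 threshold and the toy ensemble are illustrative assumptions, not the consensus functions used in this thesis.

```python
from itertools import combinations


def co_association(ensemble):
    """Return an n x n matrix whose (i, j) entry is the fraction of partitions
    in which objects i and j fall in the same cluster.

    `ensemble` is a list of label vectors, one per individual clustering
    solution. Only the labels are read: features and algorithms are never
    accessed, as the consensus clustering definition requires.
    """
    n, m = len(ensemble[0]), len(ensemble)
    counts = [[0] * n for _ in range(n)]
    for labels in ensemble:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    co = [[c / m for c in row] for row in counts]
    for i in range(n):
        co[i][i] = 1.0
    return co


def consensus_by_majority(ensemble, threshold=0.5):
    """Link every pair co-clustered in more than `threshold` of the partitions
    and return the connected components as consensus labels 0, 1, 2, ..."""
    co = co_association(ensemble)
    n = len(co)
    labels = [-1] * n
    next_label = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        # Depth-first search over the thresholded co-association graph.
        labels[seed] = next_label
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Hypothetical toy ensemble: three partitions of six objects.
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping as above, with permuted labels
    [0, 0, 1, 1, 2, 2],  # a noisier solution
]
print(consensus_by_majority(ensemble))  # [0, 0, 0, 1, 1, 1]
```

Because only co-membership is compared, the arbitrary cluster labels of each partition never need to be matched, and the noisy third solution is outvoted by the other two.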

However, applying consensus clustering to cluster ensembles as a means of obtaining robust clustering solutions is not new; in fact, it has been a central or collateral topic in several works (Strehl and Ghosh, 2002; Fred and Jain, 2003; Sevillano et al., 2006a; Fern and Lin, 2008). Nevertheless, this thesis addresses several crucial issues in this context that, to our knowledge, have received little attention, namely:

– the computational burden imposed by the use of large cluster ensembles generated by crossing multiple data modalities, representations and clustering algorithms.
– the decrease in quality of the consensus clustering solution caused by the wide diversity of the cluster ensemble.
– the application of cluster ensembles to the multimodal clustering problem.
– the definition of methods for building consensus clustering solutions (either crisp or fuzzy) from the outputs of soft clustering algorithms.

As a systematic response to these challenges, this thesis puts forward the following proposals:

– parallelizable hierarchical consensus architectures for creating consensus clustering solutions in a computationally efficient way (see chapter 3).
– fully unsupervised consensus self-refining procedures that drive the quality of the consensus clustering solution close to, or even above, that of the best available individual clustering configuration (see chapter 4).
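A hierarchical consensus architecture can be pictured roughly as follows: the ensemble is split into small mini-ensembles, a consensus is computed for each one independently (and hence in parallel), and the resulting consensus solutions become the input of the next stage until a single partition remains. The sketch below is hypothetical: the mini-ensemble size, the thread pool and the stand-in consensus function (choosing the member with the highest summed Rand index agreement) are assumptions made for brevity, not the architectures or consensus functions proposed in chapter 3.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def rand_index(a, b):
    """Fraction of object pairs on which two label vectors agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def medoid_consensus(mini_ensemble):
    """Stand-in consensus function (an assumption of this sketch): return the
    member with the highest summed Rand index agreement with all members."""
    return max(mini_ensemble,
               key=lambda p: sum(rand_index(p, q) for q in mini_ensemble))


def hierarchical_consensus(ensemble, consensus_fn, mini_size=3, workers=4):
    """Run consensus stage by stage: split the current partitions into
    mini-ensembles of at most `mini_size`, solve them concurrently, and feed
    the resulting consensus partitions to the next stage until one remains."""
    if mini_size < 2:
        raise ValueError("mini_size must be at least 2")
    stage = list(ensemble)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(stage) > 1:
            minis = [stage[k:k + mini_size]
                     for k in range(0, len(stage), mini_size)]
            # Mini-ensembles within one stage do not depend on each other.
            stage = list(pool.map(consensus_fn, minis))
    return stage[0]


# Hypothetical ensemble: six partitions of six objects.
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1],
    [2, 2, 2, 5, 5, 5],
    [0, 0, 0, 0, 1, 1],
]
result = hierarchical_consensus(ensemble, medoid_consensus)
print(result)  # [0, 0, 0, 1, 1, 1]
```

Each consensus call only ever sees `mini_size` partitions, which is where the computational savings over a single flat consensus on a large ensemble would come from. A thread pool keeps the sketch simple; in CPython, a CPU-bound consensus function would need a process pool to obtain real speed-ups.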
