29.04.2013 Views

TESI DOCTORAL - La Salle


Chapter 7

Conclusions

The contributions put forward in this thesis constitute a unitary proposal for robust clustering based on cluster ensembles, with a specific focus on the increasingly interesting application of multimedia data clustering and a view to its generalization to fuzzy clustering scenarios. In this chapter, we summarize the main features of our proposals, highlighting their strengths and weaknesses and outlining some interesting directions for future research.

As for the robustness of clustering, recall that the unsupervised nature of this problem makes it difficult (if not impossible) to select a priori the clustering system configuration that gives rise to the best¹ data partition. Furthermore, given the myriad of options available to the clustering practitioner (e.g. clustering algorithms, data representations, etc.), this important decision is marked by a high degree of uncertainty. As suboptimal configuration decisions may give rise to barely meaningful partitions of the data, these clustering indeterminacies end up being very relevant in practice, which, in our opinion, justifies research efforts aimed at overcoming them (such as the present one). This was the main motivation of our first approaches to robust clustering via cluster ensembles (Sevillano et al., 2006a; Sevillano et al., 2006b; Sevillano et al., 2007c), which have attracted the attention of several researchers (Tjhi and Chen, 2007; Pinto, 2008; Gonzàlez and Turmo, 2008b; Tjhi and Chen, 2009).

For these reasons, our approach to robust clustering intentionally reduces user decision making as much as possible, following a strategy that is nearly opposite to the procedure usually employed in clustering: instead of using a single clustering configuration (which is often selected blindly unless domain knowledge is available), the clustering practitioner is encouraged to use and combine all the clustering configurations at hand, compiling the resulting clusterings into a cluster ensemble, from which a consensus clustering is derived. The more similar this consensus clustering is to the highest quality clustering contained in the cluster ensemble, the greater the robustness to clustering indeterminacies.
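The pipeline described above (run every configuration, pool the resulting clusterings into an ensemble, derive a consensus) can be illustrated with a toy sketch. The consensus function used here, a thresholded co-association matrix followed by connected components, is a simple stand-in chosen for brevity and is not claimed to be the consensus function proposed in this thesis. The function names, the 0.5 threshold, and the four hand-written labelings are all assumptions made for illustration.

```python
def co_association(ensemble):
    """Fraction of clusterings in which each pair of objects shares a cluster."""
    n = len(ensemble[0])
    counts = [[0] * n for _ in range(n)]
    for labels in ensemble:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(ensemble) for c in row] for row in counts]


def consensus_from_co_association(co, threshold=0.5):
    """Link objects co-clustered in more than `threshold` of the ensemble
    and return the connected components as consensus labels."""
    n = len(co)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Hypothetical ensemble: each row is the output of one clustering
# configuration (algorithm + data representation) on six objects.
ensemble = [
    [0, 0, 0, 1, 1, 1],  # a good configuration
    [1, 1, 1, 0, 0, 0],  # same partition, different label names
    [0, 0, 1, 1, 2, 2],  # a poorer configuration
    [0, 0, 0, 1, 1, 2],  # another imperfect configuration
]

consensus = consensus_from_co_association(co_association(ensemble))
print(consensus)
```

In this toy case the consensus groups the first three and the last three objects, which is the same partition as the best ensemble components even though two of the four configurations disagree with it. The quadratic cost of the co-association matrix is one reason why large ensembles call for more efficient consensus strategies.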

In this context, it must be noted that our particular approach to robust clustering encourages the creation of large cluster ensembles. For this reason, one of our main concerns is the computationally efficient derivation of a high quality consolidated cluster-

¹ The best data partition is an elusive concept in itself, as it basically depends on how the clustered data is interpreted. However, for any given interpretation criterion, some clustering algorithms may obtain better clusters than others (Jain, Murty, and Flynn, 1999). In this work, the quality of clusterings has been evaluated by comparison with an allegedly correct cluster structure of the data, referred to as ground truth, measuring their degree of resemblance by means of normalized mutual information, or φ^(NMI).
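As an illustration of the evaluation described in this footnote, the following sketch computes a normalized mutual information score between a clustering and a ground truth labeling. It assumes the mutual information is normalized by the geometric mean of the two label entropies, which is one common convention; the footnote itself does not spell out the normalization, so this choice, like the function names and the example labelings, is an assumption.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same objects."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """Mutual information normalized by the geometric mean of the entropies
    (assumed normalization). Returns 0.0 if either labeling has one cluster."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        return 0.0  # convention chosen for this sketch
    return mutual_information(a, b) / sqrt(h_a * h_b)


# Hypothetical ground truth and two candidate clusterings.
ground_truth = [0, 0, 0, 1, 1, 1]
good = [1, 1, 1, 0, 0, 0]   # same partition, labels renamed
poor = [0, 1, 0, 1, 0, 1]   # largely unrelated to the ground truth

print(nmi(ground_truth, good), nmi(ground_truth, poor))
```

The score depends only on how objects are grouped, not on the label names, so the renamed partition scores as a perfect match while the unrelated one scores close to zero.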
