29.04.2013 Views

TESI DOCTORAL - La Salle


Chapter 7. Conclusions

ing upon the aforementioned cluster ensemble, which gives rise to the first two proposals put forward in this thesis: hierarchical consensus architectures and consensus self-refining procedures, which are reviewed in sections 7.1 and 7.2, respectively.

Our proposals for robust clustering based on cluster ensembles find a natural field of application in multimedia data clustering, as the existence of multiple data modalities poses additional indeterminacies that make robust clustering results harder to obtain. Moreover, our strategy naturally allows the simultaneous use of early and late multimodal fusion techniques, which constitutes a highly generic approach to the problem of multimedia clustering. In section 7.3, our proposals in this area are reviewed.

The last proposal of this thesis can be regarded as a first step towards generalizing our cluster-ensemble-based robust clustering approach, as it consists of several voting-based consensus functions for soft cluster ensembles (recall that crisp clustering is in fact a simplification of its fuzzy counterpart). These consensus functions, which are reviewed in section 7.4, can also be considered a response to the relatively few efforts devoted to the derivation of consensus clustering strategies in the context of fuzzy clustering.
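The remark that crisp clustering is a simplification of fuzzy clustering, and the general idea of voting over soft clusterings, can be illustrated with a minimal sketch. It is not one of the consensus functions proposed in this thesis: it uses a plain averaging vote, and it assumes that the cluster columns of all ensemble members are already aligned, which sidesteps the cluster correspondence problem. The function names (`harden`, `to_one_hot`, `average_vote`) and the toy memberships are hypothetical.

```python
def harden(memberships):
    """Turn a soft clustering (objects x clusters) into crisp labels
    by keeping, for each object, the cluster with the largest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


def to_one_hot(labels, n_clusters):
    """Express a crisp clustering as a degenerate soft one (0/1 memberships)."""
    return [[1.0 if k == label else 0.0 for k in range(n_clusters)]
            for label in labels]


def average_vote(soft_ensemble):
    """Average the membership matrices of all ensemble members.
    ASSUMPTION: column k denotes the same cluster in every member."""
    n_members = len(soft_ensemble)
    n_objects = len(soft_ensemble[0])
    n_clusters = len(soft_ensemble[0][0])
    return [[sum(member[i][k] for member in soft_ensemble) / n_members
             for k in range(n_clusters)]
            for i in range(n_objects)]


# Hypothetical toy ensemble: 3 soft clusterings of 3 objects into 2 clusters.
ensemble = [
    [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]],
    [[0.8, 0.2], [0.3, 0.7], [0.1, 0.9]],
    [[0.7, 0.3], [0.4, 0.6], [0.3, 0.7]],
]
consensus = average_vote(ensemble)   # soft consensus clustering
crisp = harden(consensus)            # its crisp simplification
```

Hardening the one-hot form of a crisp clustering returns the same labels, which is the sense in which a crisp clustering is a special case of a soft one.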

We have given great importance to the experimental evaluation of all our proposals. To that effect, we have employed several state-of-the-art consensus functions for hard cluster ensembles to implement our self-refining hierarchical consensus architectures: hypergraph-based (CSPA, HGPA, MCLA) (Strehl and Ghosh, 2002), evidence accumulation (EAC) (Fred and Jain, 2005) and similarity-as-data-based (ALSAD, KMSAD, SLSAD) (Kuncheva, Hadjitodorov, and Todorova, 2006). Moreover, the fuzzy versions of CSPA, EAC, HGPA and MCLA, plus the VMA soft consensus function (Dimitriadou, Weingessel, and Hornik, 2002), have been used as an evaluation benchmark for our voting-based consensus functions for soft cluster ensembles. Our proposals have been tested over a total of sixteen unimodal and multimodal data collections, which contain a number of objects ranging from hundreds to several thousands. In particular, the performance of self-refining hierarchical consensus architectures has been evaluated on both unimodal (chapters 3 and 4) and multimodal data collections (chapter 5), whereas the experiments concerning soft consensus functions have been conducted on the 12 unimodal collections (see chapter 6). However, in the near future, we plan to extend these latter experiments to multimodal data sets. We expect this extension to require little effort, since any consensus function can easily accommodate our early plus late fusion multimedia clustering proposal. In this sense, we also intend to apply our multimedia clustering system to well-known multimodal data sets such as VideoClef (composed of video data along with speech recognition transcripts, metadata and shot-level keyframes) (VideoCLEF, accessed on May 2009) and ImageClef (still images annotated with text) (ImageCLEF, accessed on May 2009).
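As a reminder of how one of the hard consensus functions named above operates, the following minimal sketch builds the co-association matrix that underlies evidence accumulation (EAC): each entry is the fraction of ensemble members that place two objects in the same cluster. The sketch covers only this first step, since EAC then derives the consensus clustering by applying a hierarchical clustering algorithm to the matrix. The function name `co_association` and the toy labelings are hypothetical.

```python
def co_association(labelings):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    clusterings in the ensemble that assign objects i and j to the
    same cluster."""
    n_members = len(labelings)
    n_objects = len(labelings[0])
    counts = [[0] * n_objects for _ in range(n_objects)]
    for labels in labelings:
        for i in range(n_objects):
            for j in range(n_objects):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[count / n_members for count in row] for row in counts]


# Hypothetical toy ensemble: 3 crisp clusterings of 4 objects.
# The first two are the same partition with swapped label names.
labelings = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
coassoc = co_association(labelings)
```

Because only co-membership is counted, the matrix does not depend on how each clustering names its clusters, so no label alignment across ensemble members is needed.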

Furthermore, we have conducted our experiments on cluster ensembles of very different sizes (from 6 to 5124 clusterings), in order to evaluate the influence of this factor on the computational facet of our proposals. In all the experiments, the statistical significance of the results at the 5% significance level has been evaluated, either explicitly or through boxplot charts.

As mentioned in appendix A.6, the experiments conducted in this thesis have been run under Matlab 7.0.4 on Dual Pentium 4 3 GHz/1 GB RAM computers. A total of 20 computers were employed during approximately 9 months at an almost constant pace, combining for an estimated total running time of more than 10 years!
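The running-time figure can be checked with a back-of-the-envelope calculation. The sketch below uses only the numbers quoted above (20 computers, roughly 9 months, more than 10 years in total); the variable names are illustrative, and the utilisation figure is an inference from those numbers rather than a value reported in the thesis.

```python
computers = 20          # machines employed, as stated above
months = 9              # approximate duration of the experiments
claimed_years = 10      # lower bound on total running time, as stated above

# Upper bound: every machine busy for the whole period.
upper_bound_years = computers * months / 12

# Minimum average utilisation consistent with "more than 10 years".
min_utilisation = claimed_years / upper_bound_years

print(f"Upper bound: {upper_bound_years:.1f} computer-years")
print(f"Implied average utilisation: above {min_utilisation:.0%}")
```

The stated total is therefore consistent with the machines being busy for at least about two thirds of the period.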

This experimental variability has provided us with a comparative view of the state-
