29.04.2013 Views

TESI DOCTORAL - La Salle


Appendix B

Experiments on clustering indeterminacies

The goal of this appendix is to present experimental evidence of the indeterminacies, introduced in chapter 1, that affect the practical selection of a specific clustering configuration. In particular, we focus on the indeterminacies regarding the selection of the data representation and clustering algorithm that yield the best clustering results for both unimodal and multimodal data collections.

As already noted in chapter 1, the evaluation of the clustering results is based on computing the normalized mutual information φ^(NMI) between a given label vector and the ground truth, which is not available to the clustering process and is used only for evaluation purposes. Recall that φ^(NMI) ranges from 0 to 1: the higher its value, the more similar the clustering result is to the ground truth.
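As an illustration of this evaluation measure, the following minimal Python sketch computes a normalized mutual information between a label vector and the ground truth. It assumes the geometric-mean normalization, φ^(NMI) = I(λ, γ) / sqrt(H(λ) H(γ)), which is one common definition; the exact normalization is the one given in chapter 1 and may differ. The function names (`entropy`, `mutual_information`, `nmi`) and the convention of returning 0 when either labeling has a single cluster are assumptions made for this example only.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (in nats) of a label vector."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two label vectors of equal length."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))  # co-occurrence counts of (cluster, class) pairs
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(labels, truth):
    """NMI with geometric-mean normalization (assumed here, see lead-in)."""
    if len(labels) != len(truth):
        raise ValueError("label vectors must have the same length")
    h_labels, h_truth = entropy(labels), entropy(truth)
    if h_labels == 0.0 or h_truth == 0.0:
        # Assumed convention: a single-cluster labeling carries no information.
        return 0.0
    value = mutual_information(labels, truth) / sqrt(h_labels * h_truth)
    return min(1.0, max(0.0, value))  # guard against floating-point overshoot


if __name__ == "__main__":
    ground_truth = [0, 0, 1, 1]
    print(nmi([1, 1, 0, 0], ground_truth))  # same partition, permuted label names
    print(nmi([0, 1, 0, 1], ground_truth))  # partition independent of the truth
```

Because the measure depends only on how objects are grouped, a partition that matches the ground truth up to a renaming of cluster labels attains the maximum score, whereas a partition statistically independent of the ground truth scores 0.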

B.1 Clustering indeterminacies in unimodal data sets

In this section, we analyze which clustering configurations (data representation plus clustering algorithm) give rise to the best partitioning of the unimodal data sets described in section A.2.1. We aim to demonstrate that the quality of the clustering results depends on how objects are represented and clustered.

As described in section A.3.1, starting with the original data representation (denoted as baseline), four additional representations have been created by applying several feature extraction techniques with multiple dimensionalities, namely Principal Component Analysis (PCA), Independent Component Analysis (ICA), Non-negative Matrix Factorization (NMF) and Random Projection (RP)¹.

On each distinct object representation, the 28 clustering algorithms from the CLUTO toolbox presented in section A.1 have been applied, which gives rise to the number of partitions per data representation presented in table B.1. Notice that the NMF representation was not derived for those data sets that do not satisfy non-negativity constraints. Moreover,

¹ The only exception to this rule is the MFeat data set, where no attribute transformation was applied, as its original form already presents data representation diversity through the use of several features (see section A.2.1).
