29.04.2013 Views

TESI DOCTORAL - La Salle



B.1.13 Summary

Appendix B. Experiments on clustering indeterminacies

To summarize the data representation and clustering algorithm selection indeterminacies across all the analyzed data sets, table B.2 presents the φ(NMI) of the best clustering solution achieved by each of the six families of clustering algorithms employed in this work, namely agglomerative (agglo), biased agglomerative (bagglo), direct, graph, repeated-bisecting (rb) and refined repeated-bisecting (rbr), indicating the type of object representation employed in each case (baseline, PCA, ICA, NMF or RP).
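As a rough illustration of the score reported in table B.2, the following sketch computes a normalized mutual information between a clustering and a reference labeling. It assumes the geometric-mean normalization, I(A;B) / sqrt(H(A) H(B)); this section does not restate the definition of φ(NMI), so that choice, along with the function name `nmi`, is an assumption rather than the author's exact formulation.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same objects.

    Assumption: geometric-mean normalization, I(A;B) / sqrt(H(A) * H(B)).
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label sequences must be non-empty and of equal length")

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        # Shannon entropy (natural log) of the empirical label distribution.
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    if h_a == 0.0 or h_b == 0.0:
        # A single-cluster labeling carries no information; return 0 by convention.
        return 0.0

    # Mutual information from the joint and marginal counts.
    mi = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )
    return mi / math.sqrt(h_a * h_b)
```

Under this normalization the score is invariant to a relabeling of the clusters, so two partitions that agree up to cluster names score 1, while statistically independent partitions score 0.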

Several facts are worth noting regarding the data representation indeterminacy. In some data sets (e.g. Zoo or miniNG), the type of representation that yields the top clustering result varies notably from one family of clustering algorithms to another. In contrast, in other data collections, a particular object representation appears to reveal the data set structure regardless of the type of clustering algorithm applied. This behaviour is observed in the Iris and Balance collections and, to a lesser extent, in the WDBC and Segmentation data sets. Moreover, these optimal object representations vary across the analyzed data sets, which is a clear indicator of the clustering indeterminacy regarding data representations.

As far as the selection of the optimal clustering algorithm is concerned, it is important to note that at least one representative of each of the six families of clustering algorithms employed in this work reaches the best absolute performance in at least one of the analyzed data sets, which gives an idea of the algorithm selection indeterminacy. Moreover, choosing the wrong type of clustering algorithm may degrade the quality of the clustering solution dramatically (see the Ionosphere and Balance collections) or have little effect (as in the Segmentation data set).
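The two observations above, namely how many distinct representations win across algorithm families and how much is lost by picking the wrong family, can be read off a results table mechanically. The sketch below shows one possible way to do so for a single data set. The function names and the toy scores are hypothetical and are not taken from table B.2.

```python
def best_per_family(results):
    """Keep, for each algorithm family, the (representation, score) with the top score.

    `results` is an iterable of (family, representation, nmi_score) tuples
    for one data set.
    """
    best = {}
    for family, representation, score in results:
        if family not in best or score > best[family][1]:
            best[family] = (representation, score)
    return best


def winning_representations(best):
    """Set of representations that win for at least one family."""
    return {representation for representation, _ in best.values()}


def family_spread(best):
    """Gap between the best and the worst family-level top score."""
    scores = [score for _, score in best.values()]
    return max(scores) - min(scores)


# Made-up scores for illustration only; NOT values from table B.2.
toy_results = [
    ("agglo", "baseline", 0.40),
    ("agglo", "PCA", 0.55),
    ("direct", "PCA", 0.70),
    ("direct", "ICA", 0.65),
    ("rb", "RP", 0.50),
]
best = best_per_family(toy_results)
```

A single winning representation across all families would correspond to the behaviour described for Iris and Balance, whereas a large `family_spread` would correspond to data sets where the choice of algorithm family matters most.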

B.2 Clustering indeterminacies in multimodal data sets<br />

The goal of this section is to evaluate the effect of clustering indeterminacies in the context of multimodal data collections. Along with the data representation and clustering algorithm selection indeterminacies, multimodality introduces a further source of uncertainty, as it is not evident whether combining the m modalities will improve the quality of the obtained clustering solution. To make things worse, all these indeterminacies are local to each data collection, so, in general, it is not possible to draw universally valid conclusions.

As in appendix B.1, we start by presenting the total number of individual clustering solutions obtained by applying the 28 clustering algorithms extracted from the CLUTO toolbox to all the data representations of the objects contained in the employed multimodal data sets² (see table B.3). Notice that the CAL500 and InternetAds collections lack the NMF representations, as they do not satisfy the necessary non-negativity constraints.
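The non-negativity requirement mentioned above can be checked before deciding which representations to derive for a collection. The following is a minimal sketch, assuming the data are held as a list of numeric rows. The function names are hypothetical, and any requirements that PCA, ICA or RP may impose are not modeled.

```python
def is_nonnegative(matrix):
    """True if every entry of the row-major matrix is >= 0."""
    return all(value >= 0 for row in matrix for value in row)


def applicable_representations(matrix):
    """List the representation types that can be derived from `matrix`.

    NMF is included only when the data satisfy the non-negativity constraint.
    """
    representations = ["baseline", "PCA", "ICA", "NMF", "RP"]
    if not is_nonnegative(matrix):
        representations.remove("NMF")
    return representations
```

For a collection with negative feature values, the returned list omits NMF, which mirrors the situation described for CAL500 and InternetAds.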

In the next paragraphs, we describe the clustering results obtained on the four multimodal data sets, placing special emphasis on which data representations and modalities lead to the best clustering results in each case.

² See appendices A.1, A.2.2, and A.3.2 for a description of the clustering algorithms, the multimodal collections and the multimodal object representations employed in this work.
