29.04.2013 Views

TESI DOCTORAL - La Salle


Appendix B. Experiments on clustering indeterminacies

see that φ(NMI) values scattered over a range extending approximately from φ(NMI) = 0.45 to φ(NMI) = 0.85 are obtained. It is important to note that such diverse results are due solely to the clustering algorithm selection indeterminacy, as this histogram presents the results of running multiple distinct clustering algorithms on a single data representation.
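The kind of scatter described above can be reproduced in miniature. The following is a minimal sketch, not the code used in this work: it assumes that φ(NMI) is the mutual information normalized by the geometric mean of the two label entropies (this excerpt does not state the formula), and the reference partition and the three candidate labelings are made-up toy data standing in for the outputs of different clustering algorithms on a single representation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def nmi(truth, pred):
    """Mutual information divided by sqrt(H(truth) * H(pred)) (assumed normalization)."""
    n = len(truth)
    joint = Counter(zip(truth, pred))
    pt, pp = Counter(truth), Counter(pred)
    mi = sum((c / n) * math.log(c * n / (pt[t] * pp[p]))
             for (t, p), c in joint.items())
    denom = math.sqrt(entropy(truth) * entropy(pred))
    return mi / denom if denom > 0 else 0.0

def histogram(scores, bins=10):
    """Count scores falling into `bins` equal-width bins over [0, 1]."""
    counts = [0] * bins
    for s in scores:
        idx = min(int(s * bins), bins - 1)  # a score of 1.0 goes in the last bin
        counts[idx] += 1
    return counts

# Toy data (hypothetical): one reference partition and three candidate labelings.
truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
labelings = {
    "algorithm_A": [1, 1, 1, 2, 2, 2, 0, 0, 0],  # same partition, permuted labels
    "algorithm_B": [0, 0, 1, 1, 1, 1, 2, 2, 2],  # one object misplaced
    "algorithm_C": [0, 0, 0, 0, 0, 0, 0, 0, 0],  # everything in a single cluster
}
scores = {name: nmi(truth, pred) for name, pred in labelings.items()}
counts = histogram(scores.values())
```

Because the data representation is held fixed, any spread in `scores` comes only from the choice of labeling, which mirrors the reading of the baseline histogram.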

If this analysis is extended to the remaining histograms (figures B.1(b) to B.1(e)), it can be observed that the φ(NMI) scatter extends across an even wider range for each distinct type of representation. This gives an idea of how strongly the quality of the clustering results depends on the selection of the clustering algorithm. However, this conclusion cannot be drawn as directly as in the baseline representation, given that histograms B.1(b) to B.1(e) present the results of running the 28 algorithms on multiple representations with distinct dimensionalities derived by each feature extraction technique. In other words, the diversity observed in these histograms is produced by the joint effect of the clustering algorithm selection and representation dimensionality selection indeterminacies.
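One way to tell the two effects apart is sketched below, under stated assumptions. If each φ(NMI) value is stored together with the algorithm and the dimensionality that produced it, grouping by dimensionality leaves only the algorithm-induced spread within each group, and grouping by algorithm leaves only the dimensionality-induced spread. The record layout, the names `alg_1` and `alg_2`, and all numeric values are hypothetical placeholders, not results from this work.

```python
from collections import defaultdict

def spread_by(records, key):
    """Group records by `key` and return {group: (min score, max score)}."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec["score"])
    return {g: (min(v), max(v)) for g, v in groups.items()}

# Hypothetical records: two algorithms run on two dimensionalities.
records = [
    {"algorithm": "alg_1", "dim": 2, "score": 0.40},
    {"algorithm": "alg_2", "dim": 2, "score": 0.60},
    {"algorithm": "alg_1", "dim": 3, "score": 0.55},
    {"algorithm": "alg_2", "dim": 3, "score": 0.80},
]

# Spread of the pooled histogram: joint effect of both indeterminacies.
all_scores = [rec["score"] for rec in records]
overall = (min(all_scores), max(all_scores))

# Spread with the dimensionality fixed: algorithm selection only.
per_dim = spread_by(records, "dim")

# Spread with the algorithm fixed: dimensionality selection only.
per_algorithm = spread_by(records, "algorithm")
```

In this toy case the pooled range is wider than the range within any single group, which is the pattern expected when both indeterminacies contribute.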

However, if figures B.1(a) to B.1(e) are compared with one another, the different histogram distributions reveal the effect of the clustering indeterminacy regarding the type of data representation. For example, clustering results on the NMF representations of this data set span a comparatively narrower and higher range of φ(NMI) values than their PCA, ICA and RP counterparts, indicating that better results are more likely if clustering is run on NMF representations than on the remaining ones.
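The comparison "narrower and higher" can be made concrete with two summary numbers per representation type: the width of the φ(NMI) range and its median. The sketch below is only an illustration; the representation names `rep_X` and `rep_Y` and their values are hypothetical and do not correspond to any histogram in this appendix.

```python
import statistics

def summarize(scores):
    """Return the minimum, maximum, range width and median of a list of scores."""
    lo, hi = min(scores), max(scores)
    return {"min": lo, "max": hi, "width": hi - lo,
            "median": statistics.median(scores)}

# Hypothetical phi(NMI) values for two unnamed representation types.
by_representation = {
    "rep_X": [0.70, 0.75, 0.80, 0.85],
    "rep_Y": [0.30, 0.50, 0.65, 0.90],
}
summary = {name: summarize(vals) for name, vals in by_representation.items()}

# "Narrower and higher" in the sense used above: smaller width, larger median.
narrower_and_higher = (
    summary["rep_X"]["width"] < summary["rep_Y"]["width"]
    and summary["rep_X"]["median"] > summary["rep_Y"]["median"]
)
```

Note that a single high maximum (as in `rep_Y`) does not make a representation preferable under this reading; what matters is where most of the clusterings fall.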

B.1.2 Iris data set

Compared to other data sets, a fairly small number of clustering solutions have been generated on the Iris collection. Nevertheless, the effect of the clustering indeterminacies can also be observed in figure B.2.

[Figure B.2 panels: (a) Baseline, (b) PCA, (c) ICA, (d) NMF, (e) RP. Axes: φ(NMI) from 0 to 1 (horizontal), clustering count from 0 to 10 (vertical).]

Figure B.2: Histograms of the φ(NMI) values obtained on each data representation in the Iris data set.

In this case, the wide span of the φ(NMI) histograms of the PCA and ICA representations (figures B.2(b) and B.2(c)) is the clearest indicator of the representation dimensionality and algorithm selection indeterminacies.

If the qualities of the clustering solutions obtained for the distinct types of object representation are compared, we can observe that the highest φ(NMI) values are obtained using the RP and the baseline representations.
