29.04.2013 Views

TESI DOCTORAL - La Salle

B.1. Clustering indeterminacies in unimodal data sets<br />

B.1.3 Wine data set<br />

The histograms of the φ (NMI) values obtained by each clustering algorithm across all the<br />

data representations employed in the Wine data set are presented in figure B.3.<br />

[Figure B.3 panels: (a) Baseline, (b) PCA, (c) ICA, (d) NMF, (e) RP. Each panel is a histogram for the Wine data set, with φ (NMI) from 0 to 1 on the horizontal axis and the clustering count from 0 to 40 on the vertical axis.]<br />

Figure B.3: Histograms of the φ (NMI) values obtained on each data representation in the<br />

Wine data set.<br />

The clustering indeterminacy regarding the selection of both the clustering algorithm<br />

and the dimensionality of the data representation is clearly observed in figures B.3(b) and<br />

B.3(c). For both the PCA and ICA data representations, a rather even histogram is obtained,<br />

spanning from φ (NMI) = 0.04 to φ (NMI) = 0.84.<br />

Moreover, notice that only these two data representations (PCA and ICA) yield<br />

φ (NMI) values above 0.5 on this data set, which reinforces the importance of<br />

using the optimal type of features to obtain good clustering results.<br />
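To make the quantity behind these histograms concrete, the following is a minimal sketch of how a φ (NMI) score could be computed for one clustering against the reference labels, and how a set of such scores could be binned into a histogram over [0, 1]. It assumes that mutual information is normalized by the geometric mean of the two entropies; other normalizations exist, and this excerpt does not state which one is used. The function names (`entropy`, `mutual_information`, `nmi`, `histogram`) and the choice of ten bins are illustrative assumptions, not taken from the text.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same objects."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_ij / n) * math.log(n_ij * n / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )


def nmi(a, b):
    """NMI normalized by sqrt(H(a) * H(b)); this normalization is an assumption."""
    if len(a) != len(b):
        raise ValueError("both labelings must cover the same objects")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        # A single-cluster labeling shares no information with anything.
        return 0.0
    return mutual_information(a, b) / math.sqrt(h_a * h_b)


def histogram(scores, bins=10):
    """Count scores falling into equal-width bins over [0, 1]."""
    counts = [0] * bins
    for s in scores:
        # Clamp the index so that a score of exactly 1.0 lands in the last bin.
        idx = min(max(int(s * bins), 0), bins - 1)
        counts[idx] += 1
    return counts
```

In this sketch, one call to `nmi` per (clustering algorithm, dimensionality) pair would give the list of scores for a data representation, and `histogram` would turn that list into the counts plotted in one panel.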

B.1.4 Glass data set<br />

The φ (NMI) histograms corresponding to the Glass data set are presented in figure B.4.<br />

[Figure B.4 panels: (a) Baseline, (b) PCA, (c) ICA, (d) NMF, (e) RP. Each panel is a histogram for the Glass data set, with φ (NMI) from 0 to 1 on the horizontal axis and the clustering count from 0 to 25 on the vertical axis.]<br />

Figure B.4: Histograms of the φ (NMI) values obtained on each data representation in the<br />

Glass data set.<br />

Notice the distinct histogram distributions obtained for each data representation, which<br />

give an idea of how the selection of a particular data representation influences the quality of<br />

the clustering results. Additionally, a fairly wide range of φ (NMI) values is observed in<br />

the histograms corresponding to the feature extraction based data representations (figures<br />

B.4(b) to B.4(e)), thus evidencing the effect of the indeterminacy in selecting both the<br />

dimensionality reduction and the clustering algorithm.<br />
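The width of the range of φ (NMI) values can also be expressed numerically. The sketch below summarizes a list of scores by its minimum, maximum, range and population standard deviation. The function name `spread` and the example scores are hypothetical and serve only to illustrate the computation; they are not results from this work.

```python
import statistics


def spread(scores):
    """Summarize how widely a list of phi (NMI) scores is dispersed."""
    lo, hi = min(scores), max(scores)
    return {
        "min": lo,
        "max": hi,
        "range": hi - lo,
        "std": statistics.pstdev(scores),
    }


# Made-up scores for two hypothetical data representations (illustration only).
example_scores = {
    "representation_A": [0.10, 0.35, 0.80],
    "representation_B": [0.30, 0.32, 0.38],
}

for name, scores in example_scores.items():
    print(name, spread(scores))
```

A larger range or standard deviation for a given representation would indicate that the outcome depends more strongly on which clustering algorithm and dimensionality are chosen.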
