29.04.2013 Views

TESI DOCTORAL - La Salle


B.1. Clustering indeterminacies in unimodal data sets

Data set        Data representation
name            Baseline   PCA   ICA   NMF   RP
Zoo             28         392   392   392   392
Iris            28         56    56    56    56
Wine            28         308   308   308   308
Glass           28         196   196   196   196
Ionosphere      28         896   896   -     896
WDBC            28         784   784   784   784
Balance         28         56    -     56    56
Mfeat           28 on each of its 6 representations (FAC, FOU, KAR, MOR, PIX and ZER)
miniNG          28         504   504   504   504
Segmentation    28         476   476   -     476
BBC             28         392   392   392   392
PenDigits       28         392   392   392   392

Table B.1: Number of individual clusterings per data representation on each unimodal data set.
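As a quick consistency check on Table B.1, the sketch below transcribes the counts and verifies that every available PCA, ICA, NMF and RP entry is an integer multiple of the baseline count of 28. Reading that multiplier as the number of variants of each representation that were clustered is an assumption made here for illustration; the table itself does not state it. The names `counts` and `multipliers`, and the use of `None` for entries shown as "-", are likewise illustrative choices.

```python
# Counts transcribed from Table B.1. Mfeat is omitted because the table
# reports it differently (28 clusterings on each of its 6 representations).
# None marks an entry shown as "-" in the table.
BASELINE = 28

counts = {
    "Zoo":          {"PCA": 392, "ICA": 392,  "NMF": 392,  "RP": 392},
    "Iris":         {"PCA": 56,  "ICA": 56,   "NMF": 56,   "RP": 56},
    "Wine":         {"PCA": 308, "ICA": 308,  "NMF": 308,  "RP": 308},
    "Glass":        {"PCA": 196, "ICA": 196,  "NMF": 196,  "RP": 196},
    "Ionosphere":   {"PCA": 896, "ICA": 896,  "NMF": None, "RP": 896},
    "WDBC":         {"PCA": 784, "ICA": 784,  "NMF": 784,  "RP": 784},
    "Balance":      {"PCA": 56,  "ICA": None, "NMF": 56,   "RP": 56},
    "miniNG":       {"PCA": 504, "ICA": 504,  "NMF": 504,  "RP": 504},
    "Segmentation": {"PCA": 476, "ICA": 476,  "NMF": None, "RP": 476},
    "BBC":          {"PCA": 392, "ICA": 392,  "NMF": 392,  "RP": 392},
    "PenDigits":    {"PCA": 392, "ICA": 392,  "NMF": 392,  "RP": 392},
}


def multipliers(table, baseline=BASELINE):
    """Divide every available count by the baseline, skipping missing entries."""
    result = {}
    for name, row in table.items():
        result[name] = {}
        for representation, n in row.items():
            if n is None:
                continue  # representation not available for this data set
            if n % baseline != 0:
                raise ValueError(f"{name}/{representation}: {n} is not a multiple of {baseline}")
            result[name][representation] = n // baseline
    return result


for name, row in multipliers(counts).items():
    print(name, row)
```

If the check passes for every row, the counts are at least compatible with the same 28 algorithms being run on each variant of a representation.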

the ICA algorithm employed for deriving the homonymous object representation presented convergence problems when executed on the Balance data collection, so no ICA representation was created on this data set.

In the next paragraphs, we describe the clustering results obtained on each data set, emphasizing which clustering configurations lead to the best clustering results in each case.

B.1.1 Zoo data set

Figure B.1 presents the histograms of the φ (NMI) values (which lie in the [0,1] interval) obtained by all the clustering algorithms on each data representation for the Zoo data collection. Recall that φ (NMI) = 1 corresponds to a perfect match between the ground truth and a clustering solution. The analysis of these histograms helps us interpret the influence of the clustering indeterminacies on the quality of the clustering results.
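To make the evaluation measure concrete, the following is a minimal sketch of how a normalized mutual information score between a ground-truth labeling and a clustering can be computed. It assumes the variant normalized by the geometric mean of the two entropies; this section does not say which normalization the thesis uses, so treat that choice, the function names, and the handling of single-cluster partitions as assumptions.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (natural log) between two labelings of the same objects."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(truth, clustering):
    """NMI in [0, 1], normalized by sqrt(H(truth) * H(clustering)) (assumed variant)."""
    if len(truth) != len(clustering) or not truth:
        raise ValueError("labelings must be non-empty and of equal length")
    h_truth, h_clustering = entropy(truth), entropy(clustering)
    if h_truth == 0.0 or h_clustering == 0.0:
        # Assumed convention: two single-cluster partitions match perfectly,
        # otherwise a single-cluster partition shares no information.
        return 1.0 if h_truth == h_clustering else 0.0
    value = mutual_information(truth, clustering) / sqrt(h_truth * h_clustering)
    return min(1.0, max(0.0, value))  # guard against floating-point drift


truth = [0, 0, 1, 1, 2, 2]
relabeled = [2, 2, 0, 0, 1, 1]   # same partition, different cluster names
print(nmi(truth, relabeled))
```

Because the score depends only on how objects are grouped, the relabeled partition in the usage example scores 1 up to rounding, which is the perfect-match case mentioned above.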

[Figure B.1, panels (a) Baseline, (b) PCA, (c) ICA, (d) NMF and (e) RP, each titled "Zoo" followed by the representation name. Horizontal axis: φ (NMI), from 0 to 1. Vertical axis: clustering count, from 0 to 30.]

Figure B.1: Histograms of the φ (NMI) values obtained on each data representation in the Zoo data set.
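A histogram such as those in Figure B.1 can be built by counting how many clusterings fall into each sub-interval of [0,1]. The sketch below shows one way to do this; the bin count of 10, the function name `histogram`, and the sample scores are hypothetical and are not taken from the thesis results.

```python
def histogram(values, bins=10, low=0.0, high=1.0):
    """Count how many values fall in each of `bins` equal-width bins over [low, high]."""
    if bins <= 0 or high <= low:
        raise ValueError("need bins > 0 and high > low")
    counts = [0] * bins
    width = (high - low) / bins
    for v in values:
        if not low <= v <= high:
            raise ValueError(f"value {v} lies outside [{low}, {high}]")
        # The last bin is closed on the right so that v == high is counted.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts


# Made-up scores for illustration only, not results from any data set.
sample_scores = [0.12, 0.48, 0.51, 0.55, 0.97, 1.0]
for i, count in enumerate(histogram(sample_scores)):
    print(f"[{i / 10:.1f}, {(i + 1) / 10:.1f}): {count}")
```

With real scores, one such list per data representation would yield the five panels of the figure.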

Firstly, by inspecting the histogram corresponding to the clustering results obtained by applying the 28 algorithms on the baseline object representation (figure B.1(a)), we can
