29.04.2013 Views

TESI DOCTORAL - La Salle


Appendix B. Experiments on clustering indeterminacies

B.1.5 Ionosphere data set

For the Ionosphere data collection, the PCA, ICA and RP representations yield very similar φ^(NMI) distributions (see figures B.5(b) to B.5(d)). In this case, therefore, the quality of the clustering appears to depend less on the feature extraction technique used to represent the objects. Although most clustering results concentrate in the leftmost part of the histograms (i.e. poor clusterings with low φ^(NMI) values), some clustering solutions reach φ^(NMI) values above 0.5 with PCA and ICA feature extraction (see figures B.5(b) and B.5(c)). Moreover, the clusterings obtained on the baseline object representation are of rather poor quality (figure B.5(a)).
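As a reminder of what each histogram entry measures, the following is a minimal sketch of a normalized mutual information score between a reference labeling and a clustering. It assumes the geometric-mean normalization, I(A;B) / sqrt(H(A) H(B)); the exact definition of φ^(NMI) used in this work is the one given in the main text, and the function name `nmi` and the zero-entropy convention are illustrative choices only.

```python
from collections import Counter
from math import log, sqrt


def nmi(labels_true, labels_pred):
    """NMI with geometric-mean normalization (an assumption of this sketch)."""
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label sequences must be non-empty and of equal length")

    # Cluster sizes and joint (contingency) counts.
    count_a = Counter(labels_true)
    count_b = Counter(labels_pred)
    count_ab = Counter(zip(labels_true, labels_pred))

    # Mutual information I(A;B) computed from the contingency counts.
    mi = sum(
        (c / n) * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )

    # Entropies H(A) and H(B) of the two labelings.
    h_a = -sum((c / n) * log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * log(c / n) for c in count_b.values())

    # Convention chosen for this sketch: a single-cluster labeling scores 0.
    if h_a == 0.0 or h_b == 0.0:
        return 0.0
    return mi / sqrt(h_a * h_b)
```

Under this definition, a clustering that matches the reference labels up to a renaming of the clusters scores 1, while a clustering that is statistically independent of the reference scores 0, which is why results piled up at the left of a histogram correspond to poor clusterings.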

[Figure B.5 panels: (a) Baseline, (b) PCA, (c) ICA, (d) RP. Each panel plots the clustering count (vertical axis) against φ^(NMI) from 0 to 1 (horizontal axis).]

Figure B.5: Histograms of the φ^(NMI) values obtained on each data representation in the Ionosphere data set.

B.1.6 WDBC data set

For the WDBC data collection, the histograms of the PCA, ICA and NMF representations differ notably in profile from the baseline and RP histograms. The former present a sharp peak located in the lowest region of the φ^(NMI) range, whereas the latter do not, which reflects the data representation indeterminacy in clustering. The large differences between the highest and lowest φ^(NMI) values in all the histograms reveal the influence of the clustering algorithm and of the choice of data dimensionality on the quality of the resulting partitions.
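The two quantities discussed here, the histogram profile and the gap between the best and worst φ^(NMI) values, can be sketched as follows. This is only an illustration: the helper names `histogram` and `spread`, the choice of 10 bins, and the score values are hypothetical and are not taken from the experiments.

```python
def histogram(values, bins=10, lo=0.0, hi=1.0):
    """Count how many values fall into each of `bins` equal-width bins on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if not lo <= v <= hi:
            raise ValueError(f"value {v} lies outside [{lo}, {hi}]")
        # The right edge (v == hi) is assigned to the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def spread(values):
    """Difference between the highest and the lowest score."""
    return max(values) - min(values)


# Hypothetical NMI scores of several clusterings on one representation.
scores = [0.02, 0.03, 0.07, 0.12, 0.55, 0.74]
counts = histogram(scores)
gap = spread(scores)
```

A sharp peak in the first entries of `counts` corresponds to the concentration of poor clusterings described above, and a large `gap` indicates that the algorithm and dimensionality choices strongly affect the quality of the partition.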

[Figure B.6 panels: (a) Baseline, (b) PCA, (c) ICA, (d) NMF, (e) RP. Each panel plots the clustering count (vertical axis) against φ^(NMI) from 0 to 1 (horizontal axis).]

Figure B.6: Histograms of the φ^(NMI) values obtained on each data representation in the WDBC data set.
