29.04.2013 Views

TESI DOCTORAL - La Salle


A.4. Cluster ensembles

Data set name   |dfA| = 1   |dfA| = 10   |dfA| = 19   |dfA| = 28
Zoo                    57          570         1083         1596
Iris                    9           90          171          252
Wine                   45          450          855         1260
Glass                  29          290          551          812
Ionosphere             97          970         1843         2716
WDBC                  113         1130         2147         3164
Balance                 7           70          133          196
Mfeat                   6           60          114          168
miniNG                 73          730         1387         2044
Segmentation           52          520          988         1456
BBC                    57          570         1083         1596
PenDigits              57          570         1083         1596

Table A.4: Cluster ensemble sizes l corresponding to distinct algorithmic diversity configurations for the unimodal data sets.

employ several mutually crossed diversity factors (the twenty-eight clustering algorithms of the CLUTO clustering package presented in section A.1 are run on the data representations with varying dimensionalities described in section A.3) so as to generate the individual components of our cluster ensembles.

However, several cluster ensemble instances have been generated by limiting the cardinality of the algorithmic diversity factor |dfA| (i.e. the number of clustering algorithms considered in creating the cluster ensemble components) to a discrete set of values: |dfA| ∈ {1, 10, 19, 28}. This strategy is adopted with the objective of experimentally evaluating our proposals in terms of both i) their sensitivity to the cluster ensemble diversity (the larger |dfA|, the more diverse the cluster ensemble), and ii) their computational scalability with respect to the cluster ensemble size l (since l is proportional to |dfA|). Notice that the cluster ensembles with |dfA| ∈ {1, 10, 19} are randomly sampled subsets of the maximally diverse cluster ensemble (the one corresponding to |dfA| = 28).
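The sub-sampling step described above can be illustrated with a minimal sketch. It assumes that sampling is done per algorithm (all components produced by a selected algorithm are kept), which is consistent with the ensemble sizes being exact multiples of |dfA| but is not stated explicitly in the text. The function name `sample_sub_ensemble` and the tuple layout of a component are hypothetical, chosen only for illustration.

```python
import random


def sample_sub_ensemble(components, n_algorithms, seed=None):
    """Keep only the components produced by a random subset of algorithms.

    components   : list of (algorithm_id, representation_id, labels) tuples
                   (hypothetical layout, not taken from the thesis).
    n_algorithms : desired cardinality |dfA| of the algorithmic diversity factor.
    """
    rng = random.Random(seed)
    # Distinct algorithms present in the maximally diverse ensemble.
    algorithms = sorted({alg for alg, _, _ in components})
    # Randomly choose which algorithms survive in the reduced ensemble.
    chosen = set(rng.sample(algorithms, n_algorithms))
    return [c for c in components if c[0] in chosen]


# Toy full ensemble: 28 algorithms, each run on 9 data representations
# (9 is the |dfA| = 1 ensemble size reported for Iris). Labels are omitted.
full = [(a, r, None) for a in range(28) for r in range(9)]

sizes = {k: len(sample_sub_ensemble(full, k, seed=0)) for k in (1, 10, 19, 28)}
print(sizes)  # {1: 9, 10: 90, 19: 171, 28: 252}
```

Under this assumption the reduced ensembles reproduce the Iris sizes for every value of |dfA|, whichever algorithms happen to be drawn.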

Tables A.4 and A.5 present the sizes of the cluster ensembles corresponding to the distinct diversity scenarios (i.e. cardinalities of the algorithmic diversity factor dfA) on the unimodal and multimodal data collections employed in this work.

Firstly, table A.4 presents the cluster ensemble sizes corresponding to the unimodal data sets. As expected, the cluster ensemble size l grows linearly with the value of |dfA|. Depending on the cardinalities of the representational and dimensional diversity factors of each data collection, fairly distinct cluster ensemble sizes are obtained (from the modest values of the Iris data set to the highly populated cluster ensembles of the WDBC collection).
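The linear growth of l with |dfA| can be checked directly against the values of table A.4. The sketch below assumes the relation l = |dfA| × l1, where l1 is the ensemble size for |dfA| = 1 (presumably the number of data representations each algorithm is run on). The names `TABLE_A4`, `DFA_VALUES` and `is_linear` are introduced here for illustration only.

```python
# Ensemble sizes l from table A.4, ordered by |dfA| = 1, 10, 19, 28.
DFA_VALUES = (1, 10, 19, 28)
TABLE_A4 = {
    "Zoo": (57, 570, 1083, 1596),
    "Iris": (9, 90, 171, 252),
    "Wine": (45, 450, 855, 1260),
    "Glass": (29, 290, 551, 812),
    "Ionosphere": (97, 970, 1843, 2716),
    "WDBC": (113, 1130, 2147, 3164),
    "Balance": (7, 70, 133, 196),
    "Mfeat": (6, 60, 114, 168),
    "miniNG": (73, 730, 1387, 2044),
    "Segmentation": (52, 520, 988, 1456),
    "BBC": (57, 570, 1083, 1596),
    "PenDigits": (57, 570, 1083, 1596),
}


def is_linear(sizes):
    """Return True if every size equals |dfA| times the |dfA| = 1 size."""
    base = sizes[0]  # ensemble size when a single algorithm is used
    return all(l == base * k for k, l in zip(DFA_VALUES, sizes))


for name, sizes in TABLE_A4.items():
    print(f"{name:<13} l1 = {sizes[0]:>3}  linear: {is_linear(sizes)}")
```

Every row of the table satisfies this relation, so the differences between data sets come entirely from l1, that is, from the representational and dimensional diversity factors.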

Last, table A.5 presents the cluster ensemble sizes corresponding to the four multimodal data collections employed in this work for each diversity scenario. It is important to highlight that the values of l presented in this table encompass the two unimodal and the multimodal data representations of the objects contained in these data sets.

The reader is referred to appendix B for an analysis of the quality and diversity of the components of these cluster ensembles.
