29.04.2013

TESI DOCTORAL - La Salle


Appendix A

Experimental setup

A.1 The CLUTO clustering package

All the clustering algorithms employed in the experimental sections of this work have been taken from the CLUTO clustering toolkit. In its authors’ words, “CLUTO is a software package for clustering low- and high-dimensional data sets and for analyzing the characteristics of the obtained clusters. CLUTO is well-suited for clustering data sets arising in many diverse application areas including information retrieval, customer purchasing transactions, web, geographic information systems, science, and biology”. It is available online for download at http://glaros.dtc.umn.edu/gkhome/cluto/cluto/download. We chose CLUTO as our clustering algorithm provider due to its ease of use, robustness, speed and scalability, as CLUTO’s algorithms have been optimized for operating on very large data sets, both in terms of the number of objects (up to ∼10⁵) and the number of features (up to ∼10⁴).
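
As an illustration of how data might be prepared for CLUTO, the following sketch serializes a sparse matrix (for example, a document-term matrix) into the text layout that, as we understand the CLUTO manual, its `vcluster` program reads: a header line with the number of rows, columns and non-zero entries, followed by one line per row listing 1-based column/value pairs. The function name `to_cluto_sparse` and the toy data are hypothetical, and the exact format should be checked against the manual before use.

```python
def to_cluto_sparse(rows, num_cols):
    """Serialize a sparse matrix to the layout assumed for CLUTO input:
    header "rows cols nnz", then one line per row of 1-based
    "column value" pairs. Hypothetical helper; verify against the manual.

    rows: list of dicts mapping a 0-based column index to a non-zero value.
    """
    nnz = sum(len(row) for row in rows)
    lines = [f"{len(rows)} {num_cols} {nnz}"]
    for row in rows:
        # CLUTO numbers columns from 1, so shift the 0-based indices.
        pairs = (f"{col + 1} {val:g}" for col, val in sorted(row.items()))
        lines.append(" ".join(pairs))
    return "\n".join(lines) + "\n"


# Three objects over a five-feature space (toy counts).
docs = [{0: 2, 3: 1}, {1: 1}, {0: 1, 4: 3}]
print(to_cluto_sparse(docs, num_cols=5), end="")
```

Storing only the non-zero entries is what makes this kind of layout practical for high-dimensional data, where most features are absent from any given object; the resulting file would then be handed to `vcluster` together with the desired number of clusters.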

CLUTO provides clustering algorithms based on the partitional, agglomerative, and graph-partitioning paradigms. Most of the algorithms implemented in CLUTO treat clustering as an optimization problem, seeking to maximize or minimize a particular clustering criterion function, which can be defined either globally or locally over the entire clustering solution space. As in any clustering process, computing the value of these criterion functions requires measuring the similarity between the objects in the data set. Consequently, applying a specific CLUTO clustering algorithm requires selecting the desired:

– clustering strategy (clustering paradigm and specific implementation): CLUTO includes six implementations of partitional, hierarchical agglomerative and graph-based clustering strategies.

– criterion function: CLUTO provides a total of eleven criterion functions for driving its clustering algorithms.

– similarity measure: CLUTO offers four distinct alternatives for measuring the similarity between objects.
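
The interplay between a similarity measure and a criterion function can be illustrated with a small, self-contained sketch. It implements four similarity measures assumed to correspond to CLUTO's alternatives (cosine, correlation, Euclidean distance and extended Jaccard) and an objective modelled on CLUTO's I2 criterion function, which sums, over all clusters, the square root of the total pairwise similarity among each cluster's members. This illustrates the concepts only and is not CLUTO's implementation; in particular, the mapping from Euclidean distance to similarity is a hypothetical choice.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cosine(x, y):
    # Cosine of the angle between x and y (undefined for zero vectors).
    return dot(x, y) / math.sqrt(dot(x, x) * dot(y, y))


def correlation(x, y):
    # Pearson correlation: the cosine of the mean-centred vectors.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])


def euclidean_similarity(x, y):
    # Hypothetical mapping of Euclidean distance to a similarity in (0, 1];
    # CLUTO's own transformation may differ.
    return 1.0 / (1.0 + math.dist(x, y))


def extended_jaccard(x, y):
    xy = dot(x, y)
    return xy / (dot(x, x) + dot(y, y) - xy)


def i2_criterion(vectors, labels, sim=cosine):
    """I2-style objective (to be maximized): for each cluster, the square
    root of the summed pairwise similarity of its members, self-pairs
    included, summed over all clusters."""
    clusters = {}
    for vec, label in zip(vectors, labels):
        clusters.setdefault(label, []).append(vec)
    return sum(
        math.sqrt(sum(sim(u, v) for u in members for v in members))
        for members in clusters.values()
    )


# Four 2-D objects forming two obvious groups (toy data).
objects = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]
grouped = i2_criterion(objects, [0, 0, 1, 1])
mixed = i2_criterion(objects, [0, 1, 0, 1])
print(grouped > mixed)  # True: the natural grouping scores higher
```

An optimization-driven clustering algorithm would search for the labelling that maximizes such a value; passing a different `sim` function changes which objects the criterion regards as belonging together, which is why the similarity measure must be chosen alongside the criterion function.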

The six implementations of partitional, hierarchical agglomerative and graph-based clustering strategies available in the CLUTO clustering toolkit are briefly described in the
