25.12.2013 Views

CRANFIELD UNIVERSITY Eleni Anthippi Chatzimichali ...


1.4.2 Cluster Analysis

Cluster analysis consists of a set of unsupervised methods that are used in numerous data mining tasks. The clustering algorithms attempt to partition a dataset into several subsets – the clusters – so that data belonging to the same cluster are mutually similar, providing a sense of homogeneity.

1.4.2.1 Hierarchical Cluster Analysis (HCA)

Hierarchical clustering (HCA) is based on calculating the distances between the elements (objects) of a given matrix of size $m \times n$, that is, $m$ objects each described by $n$ variables. The distances represent the degree of similarity or dissimilarity between these objects: the shorter the distance, the more similar the objects are to each other. HCA relies on two important categories of algorithms – distance and linkage algorithms.

Distance algorithms determine how the similarity or “distance measure” between two<br />

given objects is calculated. The most widely used distance algorithms include<br />

Euclidean and Mahalanobis distance, among others. For instance, for two objects

$\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$, the Euclidean distance in $n$-dimensional space satisfies the equation

$$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Equation 4 Euclidean distance algorithm<br />
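As an illustration of the Euclidean distance, the following minimal Python sketch computes it for two objects given as equal-length sequences of numbers. The function name `euclidean_distance` and the sample vectors are hypothetical and are not taken from the source.

```python
import math


def euclidean_distance(x, y):
    """Return the Euclidean distance between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of dimensions")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Two hypothetical objects, each described by three variables.
a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]
print(euclidean_distance(a, b))  # 5.0
```

A distance of zero means the two objects coincide in every variable; larger values indicate less similar objects.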

Hierarchical clustering is graphically represented in tree structures, also known as dendrograms. Linkage algorithms determine how the clustering is performed, that is, how the distance between two clusters is derived from the distances between their members.
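Linkage criteria differ in how they turn member-to-member distances into a single cluster-to-cluster distance. The sketch below shows three common choices (single, complete and average linkage). The source does not state which linkage is used, so these are illustrative only, and the function names and sample points are hypothetical.

```python
import math
from itertools import product


def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distances between every member of one cluster and every member of the other."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    # Distance between the two closest members.
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    # Distance between the two farthest members.
    return max(pairwise_distances(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    # Mean of all member-to-member distances.
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)


# Two hypothetical clusters of 2-D points.
cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]

print(single_linkage(cluster_a, cluster_b))  # 3.0
print(complete_linkage(cluster_a, cluster_b))
print(average_linkage(cluster_a, cluster_b))
```

For any pair of clusters the single-linkage distance is never larger than the average-linkage distance, which in turn is never larger than the complete-linkage distance.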

A bottom-up linkage algorithm includes the following steps:

1. Each object forms and belongs to its own cluster.

2. The two closest clusters are linked together.

3. The two linked clusters are aggregated into a single new cluster.

4. The algorithm keeps iterating from Step 2 until the number of clusters is one.
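The four steps above can be sketched in a few lines of Python. This is a minimal, unoptimised illustration that assumes Euclidean distance and single linkage (the source does not name a specific linkage); the function name `agglomerative_clustering`, its `n_clusters` parameter and the sample points are hypothetical.

```python
import math


def agglomerative_clustering(points, n_clusters=1):
    """Bottom-up clustering with single linkage (an assumption of this sketch).

    Returns the final clusters (lists of point indices) and the merge history
    as (cluster_a, cluster_b, distance) tuples.
    """
    # Step 1: each object forms its own cluster.
    clusters = [[i] for i in range(len(points))]
    merges = []

    def cluster_distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > max(n_clusters, 1):
        # Step 2: find the two closest clusters.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = cluster_distance(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best

        # Step 3: aggregate the linked pair into a single new cluster.
        merges.append((clusters[p], clusters[q], d))
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
        # Step 4: the loop repeats from Step 2 until the target count is reached.

    return clusters, merges


# Hypothetical 2-D objects: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]

clusters, merges = agglomerative_clustering(points, n_clusters=2)
print(clusters)  # [[4], [0, 1, 2, 3]]
```

Running the function with `n_clusters=1` reproduces the full procedure down to a single cluster; the recorded merge history, with its merge distances, is the information a dendrogram displays. The nested search over all cluster pairs is kept for readability and would be too slow for large datasets.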
