3.3. Unsupervised Learning

which cluster represents which digit. For example, all handwritten digits representing a

four could be enclosed in cluster C_7.

Examples of unsupervised learning algorithms are association rule learners and clustering algorithms.

As clustering is used in chapter 7, it is described in more detail in the following section.


3.3.1. Cluster Analysis

Cluster analysis refers to the machine learning task of grouping "a collection of objects

into subsets or 'clusters', such that those within each cluster are more closely related to

one another than objects assigned to different clusters." [Hastie et al., 2001] Two objects

are considered closely related (also called similar) when the distance between them

is small. Hence, the similarity of two examples is inversely related to their distance. For simplicity,

we assume that

\[ \mathrm{sim}(x^{(1)}, x^{(2)}) = 1 - d(x^{(1)}, x^{(2)}) \]
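
The relation between similarity and distance can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Euclidean distance is used as d, and the helper names (euclidean_distance, similarity) and the scaling parameter d_max are hypothetical and do not appear in the text. The scaling is added because 1 − d only stays within [0, 1] when the distance itself lies in [0, 1].

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.dist(a, b)


def similarity(a, b, d_max=1.0):
    """Similarity as 1 - d, with d divided by d_max.

    d_max is an assumed normalisation constant (the largest distance that
    can occur in the data); with d_max=1.0 this is exactly sim = 1 - d.
    """
    return 1.0 - euclidean_distance(a, b) / d_max


x1 = (0.0, 0.0)
x2 = (0.3, 0.4)  # distance to x1 is about 0.5
x3 = (0.6, 0.8)  # distance to x1 is about 1.0

# The closer pair (x1, x2) receives the higher similarity.
print(similarity(x1, x2), similarity(x1, x3))
```

Identical examples have distance 0 and therefore similarity 1, and the similarity decreases as the distance grows.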

Figure 3.5 shows an example of a possible clustering of two-dimensional numerical data.

The Euclidean distance is chosen as the distance measure and the resulting clusters are

indicated by the color of the examples.

Figure 3.5.: Example of the result of a cluster analysis. Randomly generated two-dimensional

data produced using the 'Generate Data' operator of RapidMiner.

The cluster analysis is performed by the 'Clustering' operator using k-Means with k=3.
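
A setup comparable to the one in Figure 3.5 can be sketched in plain Python. The code below is not RapidMiner's implementation; it is a minimal version of the standard k-Means procedure (Lloyd's algorithm) with the Euclidean distance and k=3. The blob centres, the standard deviation, the random seeds and all function and variable names are assumptions chosen for illustration.

```python
import math
import random


def k_means(points, k, max_iterations=100, seed=0):
    """Minimal k-Means (Lloyd's algorithm). Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise the centroids with k data points picked at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, new_labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
        # Stop as soon as the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


# Randomly generated two-dimensional data: three Gaussian blobs (assumed setup).
data_rng = random.Random(42)
blob_centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [
    (data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
    for cx, cy in blob_centers
    for _ in range(50)
]

centroids, labels = k_means(points, k=3)
for j, centroid in enumerate(centroids):
    print(j, centroid, labels.count(j))
```

The returned labels play the role of the colours in the figure: examples with the same label belong to the same cluster. Because the initial centroids are chosen at random, a different seed can lead to a different clustering.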


