3. Machine Learning

                 Prediction: true   Prediction: false   Total

Label: true      TP                 FN                  ...

Label: false     FP                 TN                  ...

Total            ...                ...                 ...

Table 3.2.: Confusion matrix for a binary classification task

The number of correctly labeled examples is the sum of the true positives and true negatives, whilst all false negatives and false positives are misclassified examples.

\[ \text{Correct} = TP + TN \]

\[ \text{Missed} = FN + FP \]

Based on that, the accuracy for binary classification tasks is simply defined as

\[ \text{Accuracy} = \frac{\text{Correct}}{\text{Correct} + \text{Missed}} = \frac{TP + TN}{TP + FP + TN + FN} \]
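The counts and the accuracy formula above can be sketched in a few lines of Python. This is only an illustration: the function names `confusion_counts` and `accuracy` and the toy label and prediction lists are assumptions made for this example, not part of the original text.

```python
def confusion_counts(labels, predictions):
    """Count TP, FN, FP, TN for boolean labels and boolean predictions."""
    tp = fn = fp = tn = 0
    for label, prediction in zip(labels, predictions):
        if label and prediction:
            tp += 1          # label true, predicted true
        elif label and not prediction:
            fn += 1          # label true, predicted false
        elif not label and prediction:
            fp += 1          # label false, predicted true
        else:
            tn += 1          # label false, predicted false
    return tp, fn, fp, tn


def accuracy(tp, fn, fp, tn):
    """Accuracy = Correct / (Correct + Missed)."""
    correct = tp + tn
    missed = fn + fp
    return correct / (correct + missed)


# Hypothetical toy data, chosen only for illustration.
labels      = [True, True,  True, False, False, False, False, True]
predictions = [True, False, True, False, True,  False, False, True]

tp, fn, fp, tn = confusion_counts(labels, predictions)
print(tp, fn, fp, tn)             # 3 1 1 3
print(accuracy(tp, fn, fp, tn))   # 0.75
```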


An accuracy of 100% is perfect, since it means that every predicted value equals the true value. For binary classification tasks we additionally define precision and recall, two commonly used evaluation criteria.

Definition 12 (Precision) Precision is the percentage of retrieved items that are desired.

\[ \text{Precision} = \frac{TP}{TP + FP} \]


Definition 13 (Recall) Recall is the percentage of desired items that are retrieved.

\[ \text{Recall} = \frac{TP}{TP + FN} \]
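As a minimal sketch, Definitions 12 and 13 translate directly into code that works on the confusion-matrix counts. The function names and the example counts are hypothetical. Returning 0.0 when a denominator is zero is a convention assumed here; the definitions themselves leave that case undefined.

```python
def precision(tp, fp):
    """Share of retrieved items (TP + FP) that are desired (TP)."""
    retrieved = tp + fp
    # Assumption: report 0.0 when nothing was retrieved.
    return tp / retrieved if retrieved else 0.0


def recall(tp, fn):
    """Share of desired items (TP + FN) that were retrieved (TP)."""
    desired = tp + fn
    # Assumption: report 0.0 when there are no desired items.
    return tp / desired if desired else 0.0


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
print(precision(tp=8, fp=2))   # 0.8
print(recall(tp=8, fn=8))      # 0.5
```

The example shows why both values are reported together: a classifier can be right most of the time when it predicts "true" (high precision) while still missing half of the examples it should have found (low recall).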


3.3. Unsupervised Learning

In contrast to supervised learning, unsupervised learning methods handle unlabeled data.

These algorithms try to find underlying patterns in the input data $X_{\text{train}}$.
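To make "finding patterns in unlabeled data" concrete, the following sketch groups unlabeled points into k clusters. It uses a plain k-means loop purely as an illustration; the text does not name a specific algorithm at this point, and the function name `kmeans`, the choice of the first k points as initial centers, and the toy points are all assumptions made for this example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Assign each unlabeled point to one of k clusters (indices 0..k-1)."""
    # Assumption: use the first k points as initial centers (simple, deterministic).
    centers = list(points[:k])
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        assignments = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, c in zip(points, assignments) if c == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments


# Hypothetical unlabeled input: two visibly separate groups of 2-D points.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
print(kmeans(points, k=2))
```

The returned cluster indices are arbitrary identifiers: the algorithm only says which points belong together, not what each group means.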

The task of recognizing handwritten digits on mailing envelopes, for example, can be turned into an unsupervised learning task by modifying the input data. To do so, we remove the labels $y^{(i)}$ from all examples $(x^{(i)}, y^{(i)}) \in X \times Y$. The learning task is then to find similar patterns in the input data. We specify that we are searching for $k = 10$ clusters and try to assign one cluster $C_j$, $j \in \{0, 1, 2, \ldots, k - 1\}$, to each $x^{(i)}$. Provided the clustering algorithm works well, the output consists of labeled examples $(x^{(i)}, \hat{y}^{(i)})$, where each $x^{(i)}$ is assigned to one cluster. Unfortunately we do not have any information about


