Real-time feature extraction from video stream data for stream ...

3.2. Supervised Learning

Gini Index The Gini Index was introduced in CART and considers binary splits only.

Again, the impurity of a set of examples S before and after a potential split is

measured. The feature A that reduces the impurity the most is then selected as the

splitting feature. The Gini Index Gini(S) is defined as

Gini(S) = 1 − ∑_{j=1}^{k} P(C_j)^2

where P(C_j), j ∈ {1, ..., k}, again denotes the probability that an example in S

belongs to class C_j. Assume feature A is chosen as the splitting feature and S is

split into the subsets S_1 and S_2; the impurity after the split is then given by

Gini_A(S) = (|S_1| / |S|) Gini(S_1) + (|S_2| / |S|) Gini(S_2)

Based on these values, the split is carried out for the feature A that reduces the

impurity of S the most, i.e. the feature with the largest impurity reduction

∆Gini(A) = Gini(S) − Gini_A(S)
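The three quantities above can be sketched in a few lines of Python. This is an illustrative sketch only; the function names and the toy labels are ours, not from the thesis.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum_j P(C_j)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_after_split(left_labels, right_labels):
    """Weighted impurity Gini_A(S) after a binary split into S_1 and S_2."""
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n) * gini(left_labels) \
         + (len(right_labels) / n) * gini(right_labels)

labels = ["a", "a", "b", "b"]                    # S: two classes, evenly mixed
print(gini(labels))                              # 0.5
print(gini_after_split(["a", "a"], ["b", "b"]))  # 0.0: a perfectly pure split
# Impurity reduction ∆Gini(A) = Gini(S) - Gini_A(S):
print(gini(labels) - gini_after_split(["a", "a"], ["b", "b"]))  # 0.5
```

The splitting feature is then the one for which this difference is largest.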

For nominal features with v possible outcomes, the optimal partitioning of S is

determined by testing all 2^(v−1) − 1 potential partitions into two non-empty binary subsets.
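The enumeration of these binary subsets can be sketched as follows; the function name is ours, and fixing one "anchor" value avoids counting each partition twice.

```python
from itertools import combinations

def binary_partitions(values):
    """Enumerate the 2^(v-1) - 1 non-trivial binary partitions of a
    nominal feature's value set."""
    values = list(values)
    anchor, rest = values[0], values[1:]
    partitions = []
    # Fixing the anchor on the left side avoids mirrored duplicates.
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {anchor, *combo}
            right = set(values) - left
            if right:  # skip the trivial split with an empty side
                partitions.append((left, right))
    return partitions

print(len(binary_partitions(["red", "green", "blue"])))  # 2^(3-1) - 1 = 3
```

For each such partition, Gini_A(S) would be evaluated and the best one kept, which is why this exhaustive search becomes expensive for features with many outcomes.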

The recursive splitting of the example set S is performed until a certain stop condition

is fulfilled. Such a condition could be that the tree has reached a certain size, that the purity

of the leaves exceeds a given threshold, or that no feature is left that reduces

the impurity any further.
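The three stop conditions just named can be condensed into a single check; this is a sketch under our own assumptions, and the parameter names and default thresholds are ours.

```python
from collections import Counter

def should_stop(labels, depth, max_depth=5, purity_threshold=0.95,
                best_delta=0.0):
    """Stop conditions for recursive splitting: tree size, leaf purity,
    or no remaining feature that reduces the impurity."""
    majority_share = max(Counter(labels).values()) / len(labels)
    return (depth >= max_depth                    # tree reached a certain size
            or majority_share >= purity_threshold # node is pure enough
            or best_delta <= 0.0)                 # no useful feature left

# 19 of 20 examples share one class, so the node counts as pure enough:
print(should_stop(["a"] * 19 + ["b"], depth=2, best_delta=0.1))  # True
```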

3.2.3. Evaluation Methods

To evaluate the performance of machine learning algorithms, it is necessary to define a

set of performance measurements. This section gives a short overview of the performance

measurements that are used in chapter 7.

First of all, we have to define the expression performance. Performance can be viewed

from different perspectives: it can relate to the time it takes to build a model,

to how long the classification of a training example takes, or to the scalability and

storage usage of an algorithm. All these performance criteria are reasonable, but will

not be detailed in this thesis. When talking about performance, we mainly focus on the

accuracy of an algorithm. The model f̂, produced by a learning algorithm, is said to

have perfect accuracy if the predicted output ŷ^(i) for each example x⃗^(i) ∈ X_train equals

the true label y^(i). Obviously, this could easily be achieved by a learning algorithm

simply memorizing all data given in X_train and not generalizing at all. The model

complexity would then be very high, but the prediction error on the training data would be

zero. Nevertheless, such an algorithm would surely perform poorly on unseen test data.
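Accuracy in this sense is simply the fraction of examples whose predicted label matches the true label. A minimal sketch (the function name and toy labels are ours):

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# A memorizing "model" reproduces its training labels exactly ...
train_y = [0, 1, 1, 0]
print(accuracy(train_y, train_y))  # 1.0: perfect accuracy on training data
# ... which says nothing about its accuracy on unseen data:
print(accuracy([1, 0], [1, 1]))    # 0.5
```

This is exactly why training accuracy alone is a misleading performance measure and the model must be evaluated on data it has not seen.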

This phenomenon is known as the bias-variance tradeoff and is shown in figure 3.4. In order

to handle this problem, we have to decrease the complexity of the model and test it on

different data than the data it was trained on. Hence it is necessary to split the given