Real-time feature extraction from video stream data for stream ...

3.2. Supervised Learning

Gini Index The Gini Index was introduced in CART and considers binary splits only.

Again, the impurity of a set of examples S before and after a potential split is
measured. The feature A that reduces the impurity most is then selected as the
splitting feature. The Gini Index Gini(S) is defined as

Gini(S) = 1 − Σ_{j=1}^{k} P(C_j)^2

where P(C_j), j ∈ {1, ..., k}, again denotes the probability that an example in S
belongs to class C_j. Assume feature A is chosen as the splitting feature and S is
split into the subsets S_1 and S_2; the impurity after the split is then given by


Gini_A(S) = (|S_1| / |S|) · Gini(S_1) + (|S_2| / |S|) · Gini(S_2)

Based on these values, the split is carried out for the feature A that reduces the
impurity of S the most:

∆Gini(A) = Gini(S) − Gini_A(S)

For a nominal feature with v possible outcomes, the optimal partitioning of S is
determined by testing all 2^{v−1} − 1 possible groupings of the outcomes into two
non-empty binary subsets.
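As an illustration, the three quantities above can be computed in a few lines of Python; the function names are illustrative only, not part of CART terminology:

```python
from collections import Counter

def gini(labels):
    """Gini index of a set of class labels: 1 - sum_j P(C_j)^2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_after_split(left, right):
    """Weighted impurity Gini_A(S) after splitting S into S_1 and S_2."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def gini_gain(labels, left, right):
    """Impurity reduction ΔGini(A) achieved by the split."""
    return gini(labels) - gini_after_split(left, right)

# A perfect split separates the two classes completely:
S = ["a", "a", "b", "b"]
print(gini(S))                                # 0.5
print(gini_gain(S, ["a", "a"], ["b", "b"]))   # 0.5
```

Note that a pure node (all labels equal) yields Gini(S) = 0, which is why the gain of the perfect split above equals the impurity of the parent node.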

The recursive splitting of the example set S is performed until a certain stop condition
is fulfilled. Such a condition could be that the tree has reached a certain size, that the purity
of the leaves exceeds a given threshold, or that no remaining feature reduces the
impurity any further.
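A compact, self-contained sketch of this recursive procedure for binary (0/1) features; `max_depth` and `min_gain` are illustrative stop-condition parameters, not values from the text:

```python
from collections import Counter

def gini(labels):
    """Gini index 1 - sum_j P(C_j)^2, as defined in the text."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, depth=0, max_depth=3, min_gain=1e-3):
    """Recursively split (X, y) until one of the stop conditions holds."""
    # Stop: size limit reached, or the node is already pure.
    if depth >= max_depth or gini(y) == 0.0:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    best = None
    for f in range(len(X[0])):  # try every (binary) feature
        left = [i for i, x in enumerate(X) if x[f] == 0]
        right = [i for i, x in enumerate(X) if x[f] == 1]
        if not left or not right:
            continue
        after = (len(left) * gini([y[i] for i in left]) +
                 len(right) * gini([y[i] for i in right])) / len(y)
        gain = gini(y) - after
        if best is None or gain > best[0]:
            best = (gain, f, left, right)
    # Stop: no remaining feature reduces the impurity enough.
    if best is None or best[0] < min_gain:
        return Counter(y).most_common(1)[0][0]
    _, f, left, right = best
    return (f,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1))

def predict(tree, x):
    """Follow the splits down to a leaf label."""
    while isinstance(tree, tuple):
        f, left, right = tree
        tree = left if x[f] == 0 else right
    return tree
```

For example, on X = [[0,0],[0,1],[1,0],[1,1]] with labels ["a","a","b","b"], the procedure selects feature 0 (gain 0.5) and stops after one split, since both children are pure.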

3.2.3. Evaluation Methods

To evaluate the performance of machine learning algorithms, it is necessary to define a
set of performance measures. This section gives a short overview of the performance
measures that are used in chapter 7.

First of all, we have to define the term performance. Performance can be viewed
from different perspectives: it can relate to the time it takes to build a model,
to how long the classification of a training example takes, or to the scalability and
storage usage of an algorithm. All of these performance criteria are reasonable, but they
will not be detailed in this thesis. When talking about performance, we mainly focus on

the accuracy of an algorithm. The model f̂ produced by a learning algorithm is said to
have perfect accuracy if the predicted output ŷ^(i) for each example x⃗^(i) ∈ X_train equals
the true label y^(i). Obviously, this could easily be achieved by a learning algorithm by

simply memorizing all the data given in X_train and not generalizing at all. The model
complexity would then be very high, but the prediction error on the training data would be
zero. Nevertheless, such an algorithm would surely perform poorly on unseen test data.
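This memorization effect can be made concrete with a small sketch: a hypothetical classifier that simply stores the training set in a lookup table (the class and function names are illustrative assumptions, not from the text):

```python
class MemorizingClassifier:
    """Stores every training example verbatim: perfect training accuracy,
    but no generalization to unseen inputs."""

    def fit(self, X_train, y_train):
        self.table = {tuple(x): y for x, y in zip(X_train, y_train)}

    def predict(self, x, default="?"):
        # Unseen examples fall back to an arbitrary default answer.
        return self.table.get(tuple(x), default)

def accuracy(model, X, y):
    """Fraction of examples whose predicted label equals the true label."""
    return sum(model.predict(x) == yi for x, yi in zip(X, y)) / len(y)

clf = MemorizingClassifier()
X_train, y_train = [[0, 0], [0, 1], [1, 0]], ["a", "a", "b"]
clf.fit(X_train, y_train)
print(accuracy(clf, X_train, y_train))   # 1.0 on the training data
print(clf.predict([1, 1]))               # unseen example: "?"
```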

This phenomenon is known as the bias-variance tradeoff and is shown in figure 3.4. In order
to handle this problem, we have to decrease the complexity of the model and test it on
different data than the data it was trained on. Hence it is necessary to split the given

