Real-time feature extraction from video stream data for stream ... Real-time feature extraction from video stream data for stream ...

3. Machine Learning

tree learning algorithms, developed independently between the late 1970s and early

1980s: Iterative Dichotomiser (ID3) by J. Ross Quinlan [Quinlan, 1986] and Classification

and Regression Trees (CART) by L. Breiman, J. Fiedman, R. Olshen, and C.

Stone [Breiman et al., 1984]. The idea is quite simple and straightforward: At each step,

the set of examples is divided into smaller subsets by splitting it by the values (or value

ranges) of one feature. The feature is hereby selected in a manner that it optimally

discriminates the set according to the classes. Hence, an ideal feature would split the example

set into subsets, that are pure (i.e. after the partition all examples in each subset

belong to the same class). As most likely no feature will split the example set into pure

subsets, different measures for the quality of a split have been proposed.

Information Gain The information gain criterion uses the entropy of a set of examples

to find the optimal feature for splitting the set into subsets. The entropy of a set

S measures the impurity of S and is defined as

E(S) = −

k∑

P (C j )log 2 (P (C j ))

j=1

where P (C j ), j ∈ {1, ..., k} denotes the probability that an example in S belongs

to class C j . We now assume that we split S into v subsets {S 1 , S 2 , ..., S v } based on

the value of a feature A in S. Afterwards we calculate, how helpful this splitting

would be by calculating the accumulated weighted entropy over all resulting subsets

S m , m ∈ {1, ..., v}.

v∑ |S m |

E A (S) = × E(S m )

|S|

m=1

The information gain Gain(A) of feature A is then defined as

Gain(A) = E(S) − E A (S)

and the feature with the highest information gain Gain(A) is chosen to split the

example set S. For nominal features the number v of subsets is hereby usually

the number of possible outcomes of the feature. For numerical features, any v

can be chosen and v value ranges for the numerical features can be defined. The

information gain criterion was introduced in ID3.

Gain Ratio By choosing the information gain criterion, features with a large number

of different values are more likely to be chosen than other features. Especially in a

setting, where one feature holds an unique value for each example (i.g. an ID), this

feature would be selected, as it splits the example set in a way, that each subset is

pure. Hence it makes sense to penalize features by their number of possible values

v. This is done by the gain ratio criterion. The gain ratio GainRation(A) of a

feature A is defined as

GainRatio(A) = Gain(A)

SE A (S)

with

SE A (S) = −

v∑

m=1

|S m |

|S|

× log 2 ( |S m|

|S| )

32

More magazines by this user
Similar magazines