Real-time feature extraction from video stream data for stream ...

3. Machine Learning

To sort letters by their ZIP codes, machines have to be able to recognize handwritten
digits on the envelopes. The problem is that handwriting varies considerably, so the
same digit can look totally different on different envelopes. Figure 3.1 shows examples
of this. Humans have learned to generalize and are able to read a handwriting even if
they have never seen it before. The challenge is to teach this ability to a machine.
Hence, the task T is to classify handwritten digits. The program is based on a set of
examples of handwritten digits, which forms the experience E. An obvious performance
measure P is the recognition rate, i.e. the percentage of correctly recognized digits.
As it is impossible to memorize all possible appearances of each digit, the recognition
has to be based on general characteristics that have been observed in the experience E.
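The performance measure P from the digit example can be made concrete with a few lines of code. The following is a minimal sketch, not part of the original work: the function name `recognition_rate` and the label lists are assumptions chosen purely for illustration.

```python
def recognition_rate(predicted, actual):
    """Return the percentage of examples whose predicted digit equals the true digit."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one example is required")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * correct / len(actual)


# Hypothetical labels for five envelopes (illustrative values only).
true_digits = [3, 8, 1, 7, 7]
predicted_digits = [3, 3, 1, 7, 1]

rate = recognition_rate(predicted_digits, true_digits)
print(f"Recognition rate: {rate:.1f}%")
```

In this made-up run, three of the five digits are recognized correctly, so the measure P evaluates to 60 percent.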

This entire chapter is based on the standard references on machine learning. In
particular, I want to mention two books: "The Elements of Statistical Learning" by
Hastie, Tibshirani and Friedman [Hastie et al., 2001] and "Machine Learning" by
Mitchell [Mitchell, 1997].

3.1. Notation

The input of a machine learning algorithm is some experience E. The experience is
represented as a set of examples, also referred to as the training set $X_{\text{train}}$.
This set contains N examples $\vec{x}^{(i)}$, $i \in \{1, \dots, N\}$:

$$X_{\text{train}} = \{\vec{x}^{(1)}, \dots, \vec{x}^{(N)}\}$$

Each example $\vec{x}^{(i)} \in X_{\text{train}}$ is a p-dimensional vector. Here, p is
the number of features $A_j$, $j \in \{1, \dots, p\}$, that form one example. The value
of the j-th feature in the i-th example is referred to as $x_j^{(i)}$. Without loss of
generality we can assume that the number of features p and their order are constant
throughout the training set $X_{\text{train}}$. Features can be numerical, nominal or
binominal.
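To illustrate the notation, a training set can be sketched as a list of N examples, each holding p feature values. The numbers below and the helper `feature_value` are assumptions made for illustration. They are not taken from the original text.

```python
# Hypothetical training set with N = 3 examples and p = 4 numerical features each.
X_train = [
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
]

N = len(X_train)      # number of examples
p = len(X_train[0])   # number of features per example

# The number of features and their order are constant throughout the training set.
assert all(len(x) == p for x in X_train)


def feature_value(X, i, j):
    """Return the value of the j-th feature in the i-th example (1-based, as in the text)."""
    return X[i - 1][j - 1]


print(N, p, feature_value(X_train, 2, 3))
```

The helper converts the 1-based indices used in the notation to Python's 0-based list indices, so `feature_value(X_train, 2, 3)` reads the third feature of the second example.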

Numerical features have values ranging over $\mathbb{N}$ or $\mathbb{R}$. They are the
most common feature type, since a numerical representation supports statistical
analysis best. Values can be either continuous or discrete. Most physical measurements,
e.g. temperature, length or speed, are examples of continuous data. In contrast, the
age of a person in years is a discrete numerical feature, as it can only take a finite
number of values. Continuous data can be converted to discrete data by grouping values
into a fixed number of value ranges. This process is known as discretization.

Nominal features have values ranging over an unordered, discrete set of possible
values. A good example, taken from the popular iris flower discrimination dataset¹,
is the species of an iris flower, which is one of three possible types:
{iris setosa, iris virginica, iris versicolor}.

Binominal features are nominal features that only have two possible outcomes. Most

often these two outcomes are {true, false}.
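The discretization mentioned for numerical features can be sketched as follows. This is only one possible approach, assuming manually chosen range boundaries. The function name `discretize`, the boundaries, and the temperature values are hypothetical.

```python
from bisect import bisect_right


def discretize(value, boundaries):
    """Map a continuous value to the index of the value range it falls into.

    `boundaries` must be sorted in ascending order; k boundaries define k + 1 ranges.
    A value equal to a boundary is assigned to the range above that boundary.
    """
    return bisect_right(boundaries, value)


# Hypothetical temperature ranges in degrees Celsius:
# range 0: below 10, range 1: from 10 up to (not including) 25, range 2: 25 and above.
boundaries = [10.0, 25.0]
temperatures = [3.2, 10.0, 18.7, 25.0, 31.4]

bins = [discretize(t, boundaries) for t in temperatures]
print(bins)
```

Each continuous temperature is replaced by the index of its range, which turns the continuous feature into a discrete one with three possible values.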

1 http://archive.ics.uci.edu/ml/datasets/Iris
