
3. Machine Learning

In order to enable machines to sort letters by their ZIP codes, they have to be able to recognize handwritten digits on the envelopes. The problem is that handwriting styles differ a lot, and therefore the same digit can look completely different on different envelopes. Figure 3.1 shows examples of this. Humans have learned to generalize and are able to identify various handwritings, even if they have never seen a particular handwriting before. The challenge is to teach this ability to a machine. Hence, the task T is to classify handwritten digits. The program is based on a set of examples of handwritten digits; this set forms the experience E. An obvious performance measure P is the recognition rate, which is the percentage of correctly recognized digits. As it is impossible to memorize all possible appearances of each digit, the recognition has to be based on general characteristics that have been observed in the experience E.
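The recognition rate P described above can be sketched in a few lines of Python; the digit labels below are purely illustrative and not taken from any dataset mentioned in this chapter.

```python
# Hypothetical true and predicted digit labels (illustrative values only).
true_digits      = [3, 1, 4, 1, 5, 9, 2, 6]
predicted_digits = [3, 1, 4, 7, 5, 9, 2, 0]

# Recognition rate P: fraction of digits that were classified correctly.
correct = sum(t == p for t, p in zip(true_digits, predicted_digits))
recognition_rate = correct / len(true_digits)
```

Here 6 of the 8 digits match, so the recognition rate is 0.75.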

This entire chapter is based on the standard references on machine learning. In particular, I want to mention two books: "The Elements of Statistical Learning" by Hastie, Tibshirani and Friedman [Hastie et al., 2001] and "Machine Learning" by Mitchell [Mitchell, 1997].

3.1. Notation

The input of a machine learning algorithm is some experience E. The experience is represented as a set of examples, also referred to as the training set X_train. This set contains N examples ⃗x^(i), i ∈ {1, ..., N}:

X_train = {⃗x^(1), ..., ⃗x^(N)}

Each example ⃗x^(i) ∈ X_train is a p-dimensional vector. Here, p is the number of features A_j, j ∈ {1, ..., p} that form one example. The value of the j-th feature in the i-th example is referred to as x_j^(i). Without loss of generality we can assume that the number of features p and their order are constant throughout the training set X_train. Features can be either numerical, nominal or binominal.
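The notation above maps directly onto an N × p array; the following minimal sketch uses NumPy and purely illustrative feature values.

```python
import numpy as np

# Toy training set X_train: N = 4 examples, p = 3 features each
# (the values are illustrative only).
X_train = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.3, 3.3, 6.0],
    [5.8, 2.7, 5.1],
])

N, p = X_train.shape     # number of examples, number of features
x_2 = X_train[1]         # the example x^(2)   (0-based row index 1)
x_2_3 = X_train[1, 2]    # the feature value x_3^(2)
```

Each row is one example ⃗x^(i); each column holds the values of one feature A_j across the training set.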

Numerical features have values ranging over N or R. They are the most common feature type, since a numerical representation supports statistical analysis best. Values can be either continuous or discrete. Most physical measurements, e.g. temperature, length or speed, are examples of continuous data. In contrast, the age of a person is a discrete numerical feature, as the number of possible outcomes is finite. Continuous data can be converted into discrete data by summarizing values over a certain number of value ranges. This process is known as discretization.
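Discretization as described here can be sketched with NumPy's binning; the temperature values and the bin edges below are illustrative assumptions, not taken from this chapter.

```python
import numpy as np

# Continuous temperature measurements in degrees Celsius (illustrative).
temperatures = np.array([-3.0, 8.5, 14.2, 21.0, 27.7, 33.1])

# Summarize the continuous values over three value ranges:
# bin 0: below 10, bin 1: 10 to 25, bin 2: above 25.
edges = [10.0, 25.0]
bins = np.digitize(temperatures, edges)
```

After discretization, each continuous measurement is replaced by the index of its value range, yielding a discrete feature.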

Nominal features have values ranging over an unordered and discrete set of possible values. A good example, taken from the popular iris flower discrimination dataset 1 , is the species of an iris flower, which is distinguished between three possible types: {iris setosa, iris virginica, iris versicolor}.

Binominal features are nominal features that only have two possible outcomes. Most often these two outcomes are {true, false}.
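Since nominal values are unordered, they are typically mapped to arbitrary integer codes before statistical processing; the following sketch does this for the three iris species named above (the code assignment itself is an arbitrary choice).

```python
# The three possible values of the nominal feature "species".
species = ["iris setosa", "iris virginica", "iris versicolor"]

# Assign an arbitrary integer code to each value; since the set is
# unordered, no meaning is attached to the ordering of the codes.
codes = {name: idx for idx, name in enumerate(species)}

# Encode a small (illustrative) sample of observations.
observations = ["iris setosa", "iris versicolor", "iris setosa"]
encoded = [codes[s] for s in observations]
```

A binominal feature is the special case with exactly two codes, e.g. {true, false} mapped to {1, 0}.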

1 http://archive.ics.uci.edu/ml/datasets/Iris
