

ples. What about the available knowledge about the solution? A priori knowledge about the solution to the task allows us to define the hypothesis space. We may know, for example, that the solution is a polynomial function. Better still, we may know that the solution is actually a quadratic. The more knowledge we have of the form of the solution, the smaller the search space in which our learner will look for a candidate solution:

H_0 ⊃ H_1 ⊃ H_2 ⊃ . . . ⊃ H^*    (2.1)

H^* being the objective hypothesis (the solution).
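For instance, the polynomial example above instantiates this chain (an illustrative reading, not spelled out in the text) as H_0 = {all continuous functions} ⊃ H_1 = {polynomials} ⊃ H_2 = {quadratics} ⊃ H^* = {the particular quadratic that solves the task}: each additional piece of a priori knowledge removes candidates from the search.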

Virtually all "practical" learners employ some sort of complexity penalization technique [Scheffer, 1999], including the method of sieves and Bayesian, Minimum Description Length, Vapnik-Chervonenkis dimension and validation methods [Nowak, 2004]. The basic idea of complexity penalization is to minimize the sum of the error on the training set and a complexity measure of the hypothesis space. The quantity that we are ultimately interested in is the risk:

R(α) = ∫ (1/2) |y − f(x, α)| dP(x, y)    (2.2)

α being the parameters of the learner (to be set). The risk is a measure of the discrepancy between the values f(x, α) produced by our learner and the objective values y produced by H^*.
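To make the penalization idea concrete, here is a minimal sketch in Python (the quadratic ground truth, the degree-indexed hypothesis spaces and the particular penalty weight are illustrative assumptions, not choices made in this chapter):

    import numpy as np

    rng = np.random.default_rng(0)

    # Training set: l noisy samples of a quadratic, standing in for H*.
    l = 30
    x = rng.uniform(-1.0, 1.0, l)
    y = 1.0 - 2.0 * x + 3.0 * x ** 2 + rng.normal(0.0, 0.1, l)

    # Candidate hypothesis spaces H_d = {polynomials of degree <= d}.
    best = None
    for d in range(9):
        coeffs = np.polyfit(x, y, d)                       # best fit within H_d
        e_emp = np.mean((np.polyval(coeffs, x) - y) ** 2)  # training error
        score = e_emp + 0.05 * (d + 1) / l                 # error + crude complexity penalty
        if best is None or score < best[0]:
            best = (score, d)

    print("selected degree:", best[1])                     # typically 2, the true form

Pure empirical-risk minimization would keep increasing the degree; the penalty term is what makes the learner stop at the simplest space that fits.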

Theoretical work by Vapnik and others has made it possible to obtain upper bounds on this risk [Vapnik, 1995]. With probability 1 − η the following bound holds:

R(α) ≤ e_emp + √[ (h(log(2l/h) + 1) − log(η/4)) / l ]    (2.3)

where e_emp is the empirical risk (the error on the training set), and h is a non-negative integer called the VC dimension. The VC dimension of a set of functions H_i, a measure of its complexity, is the maximum number of training points that can be shattered⁴ by H_i. Note that we can in principle obtain an e_emp as low as we want, though we would have to do it by making our hypothesis space more complex (i.e. by having a large number of degrees of freedom). Alternatively, we can consider only simple hypotheses, but then we would obtain a large e_emp.
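The tradeoff is easy to see numerically. The sketch below (the values of h, l and η are arbitrary illustrative choices) evaluates the second term of (2.3):

    import math

    def vc_confidence(h, l, eta=0.05):
        # Second term of bound (2.3): grows with the VC dimension h,
        # shrinks as the number of training points l grows.
        return math.sqrt((h * (math.log(2 * l / h) + 1) - math.log(eta / 4)) / l)

    l = 1000
    for h in (5, 50, 500):
        print(h, round(vc_confidence(h, l), 3))   # 0.198, 0.489, 1.094

A richer hypothesis space may drive e_emp down, but the guaranteed gap between R(α) and e_emp widens; the bound is only informative while this term stays small.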

⁴ For a 2-class recognition problem and a set of l training points, if the set can be labelled in all possible 2^l ways, and for each labelling a member of the set H_i can be found which correctly assigns those labels, then we say that the set of points is shattered by that set of functions.
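As a concrete instance of this definition (an illustration, not an example from the thesis): linear classifiers in the plane shatter three points in general position, but no set of four points, so their VC dimension is 3. The check below encodes separability of a labelling as a linear-programming feasibility problem:

    import itertools
    from scipy.optimize import linprog

    def separable(points, labels):
        # Is there (w, b) with y_i * (w . x_i + b) >= 1 for all i?
        # Written as the LP feasibility constraints -y_i * (w . x_i + b) <= -1.
        A = [[-y * p[0], -y * p[1], -y] for p, y in zip(points, labels)]
        res = linprog(c=[0.0, 0.0, 0.0], A_ub=A, b_ub=[-1.0] * len(points),
                      bounds=[(None, None)] * 3)
        return res.status == 0                    # feasible <=> separable

    def shattered(points):
        # Every one of the 2^l labellings must be realizable by the class.
        return all(separable(points, labels)
                   for labels in itertools.product((-1.0, 1.0), repeat=len(points)))

    print(shattered([(0, 0), (1, 0), (0, 1)]))          # True
    print(shattered([(0, 0), (1, 0), (0, 1), (1, 1)]))  # False (XOR labelling)

Since some three-point set is shattered while no four-point set is, the maximum in the definition gives h = 3 for this class.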

