11.12.2012 Views

D2.1 Requirements and Specification - CORBYS


Another approach is based on learning progress (Kaplan and Oudeyer, 2004), which considers the speed at which an agent succeeds in learning with respect to a given error function. The idea is that the agent aims to optimise not its performance in reaching a goal, but the speed at which it improves. As the agent becomes better at achieving a goal, saturation sets in: by the law of diminishing returns the agent improves less and less, and it will then switch to learning a different goal. Thus, the agent will keep looking for goals which have an increased level of novelty in them. This philosophy is also pursued in Schmidhuber (2002), where learning is modelled ab initio as a compression scheme. Learning progress then directly and universally expresses itself as code growth rate during compression of the learning process. The actual improvement is measured by the increase in the effectiveness of the compression, not just the length of the compressed output. As an illustrative example, one can consider the discovery of new laws of motion which allow laws learnt earlier to be compressed much more effectively. Such compression gradients form the incentive structure of this ab initio model. The problem with this model is that it considers only universal Kolmogorov-type compression schemes. On the one hand, these offer various optimality guarantees; on the other hand, they are only of theoretical relevance due to their strongly asymptotic character, i.e. the universal guarantees can only be established for sufficiently long learning runs, which are typically orders of magnitude beyond the range available to an artificial or biological agent.
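The goal-switching behaviour described for learning progress can be illustrated with a minimal sketch. It is not the algorithm of Kaplan and Oudeyer (2004); it only assumes that learning progress is estimated as the drop in mean error between two consecutive windows of trials. The class name `LearningProgressSelector`, the `window` parameter and the goal labels are hypothetical and introduced here purely for illustration.

```python
from collections import deque


class LearningProgressSelector:
    """Hypothetical helper: prefer the goal whose error is falling fastest."""

    def __init__(self, goals, window=5):
        self.window = window
        # Keep only the most recent 2 * window errors for each goal.
        self.errors = {g: deque(maxlen=2 * window) for g in goals}

    def record(self, goal, error):
        """Store the error observed on one trial of the given goal."""
        self.errors[goal].append(error)

    def progress(self, goal):
        """Mean error of the older window minus mean error of the recent one."""
        hist = list(self.errors[goal])
        if len(hist) < 2 * self.window:
            return float("inf")  # assumption: untried goals are explored first
        older = sum(hist[:self.window]) / self.window
        recent = sum(hist[self.window:]) / self.window
        return older - recent  # positive while the error is still decreasing

    def choose(self):
        """Select the goal with the largest estimated learning progress."""
        return max(self.errors, key=self.progress)


selector = LearningProgressSelector(["saturated", "improving"], window=3)
for k in range(6):
    selector.record("saturated", 0.01)            # low error, no longer improving
    selector.record("improving", 1.0 - 0.1 * k)   # high error, steadily falling
print(selector.choose())
```

In this sketch the goal with the lower absolute error is not selected, because its error has stopped changing; the selector favours the goal on which the agent is still improving, which is the switch away from saturated goals described above.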

12.3.2 Principle-Based Self-Motivated Models

The approaches discussed in the previous section define generic concepts which can be implemented in manifold ways and depend on the particular instantiation of the learning models or compression schemes. More important now are principle-based models, in which the intrinsic motivation principle is directly embedded into, and “implemented” by, the formalism.

Strictly speaking, Schmidhuber’s universal compression gradient also belongs in the class of principle-based models. However, since the particular compression scheme is not canonically defined, and the formalism becomes insensitive to the scheme only in the asymptotic case (which is typically not realisable), we have grouped it above together with the approaches characterised by generic concepts rather than concrete principles.

One learning concept is ISO-learning, which is based on modelling low-level anticipatory feedback loops (Porr et al., 2003). Another important concept for implementing intrinsic self-motivation is homeokinesis, introduced in (Der et al., 1999; Der, 2000; Der, 2001). Given the embodiment of a concrete agent, homeokinetic control is defined by constructing the behaviour of the agent in such a way that it maximises the predictability of its future sensory stimuli. This is achieved by an internal model of the agent which uses a learning rule to minimise the prediction error for the future stimuli encountered by the agent. Importantly, this approach encapsulates the embodiment as a core component of the model. It is only defined in the context of the complete sensorimotor loop and elevates the body into a central part of the cognitive process, in opposition to many approaches from traditional AI; it thus provides a quantitative grounding of the embodied intelligence perspective (Brooks, 1991; Paul, 2006; Pfeifer and Bongard, 2007).
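The role of the internal model can be sketched as follows. This is not the homeokinetic controller of Der et al.; it shows only the model-learning half of the loop, under strong simplifying assumptions: a single sensor value `x`, a single motor value `y`, a linear world, and random exploratory motor commands in place of an adapted controller. The function name `run_forward_model` and the constants `a_true` and `b_true` are hypothetical.

```python
import random


def run_forward_model(steps=2000, lr=0.1, seed=0):
    """Learn a linear forward model x_next ~ A*x + B*y from prediction errors."""
    rng = random.Random(seed)
    a_true, b_true = 0.8, 0.5   # assumed "body plus environment" dynamics
    A, B = 0.0, 0.0             # parameters of the agent's internal model
    x = 0.0                     # current sensor value
    errors = []
    for _ in range(steps):
        y = rng.uniform(-1.0, 1.0)         # exploratory motor command (assumption)
        x_pred = A * x + B * y             # model's prediction of the next stimulus
        x_next = a_true * x + b_true * y   # stimulus actually encountered
        err = x_next - x_pred
        # Gradient step on 0.5 * err**2 (least-mean-squares learning rule).
        A += lr * err * x
        B += lr * err * y
        errors.append(err * err)
        x = x_next
    return A, B, errors


A, B, errors = run_forward_model()
print(A, B, sum(errors[:100]), sum(errors[-100:]))
```

The learning rule reduces the squared prediction error over time. In the homeokinetic setting the controller would be adapted as well, so that behaviour and model are shaped together within the sensorimotor loop; that part is deliberately omitted here.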

The early “naive” homeokinesis approach had the problem that it tended to favour situations in which prediction is simple for the agent. Since this has a propensity to send the agent into steady states, the model was extended by a mechanism to ensure a rich sensorimotor stimulus spectrum (Der et al., 2006). For this, the estimated sensorimotor dynamics of the system is treated as a dynamical system whose Lyapunov exponents are estimated. The agent then moves towards states which have the most negative time-
