D2.1 Requirements and Specification - CORBYS
Another approach is based on learning progress (Kaplan and Oudeyer, 2004), which considers the speed at<br />
which an agent improves with respect to a given error function. The idea is that the agent aims to optimise<br />
not its performance in reaching a goal, but the speed at which that performance improves. As the agent<br />
becomes better at achieving a goal, saturation sets in: by the law of diminishing returns, the agent<br />
improves less and less, and it will then switch to learning a different goal. Thus, the agent will keep<br />
looking for goals which carry an increased level of novelty.<br />
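The goal-switching behaviour described above can be illustrated with a minimal sketch. It is not taken from Kaplan and Oudeyer (2004); the class name `LearningProgressSelector`, the windowed-mean estimate of progress and the `window` parameter are all assumptions made for illustration.

```python
from collections import deque


class LearningProgressSelector:
    """Pick the goal whose error is currently falling fastest (hypothetical helper)."""

    def __init__(self, goals, window=5):
        self.window = window
        # Keep only the last 2 * window errors recorded for each goal.
        self.errors = {g: deque(maxlen=2 * window) for g in goals}

    def record(self, goal, error):
        self.errors[goal].append(error)

    def progress(self, goal):
        hist = list(self.errors[goal])
        if len(hist) < 2 * self.window:
            return float("inf")  # assumption: goals with too little data are tried first
        older = sum(hist[:self.window]) / self.window
        recent = sum(hist[self.window:]) / self.window
        return older - recent  # positive while the error is still decreasing

    def choose(self):
        return max(self.errors, key=self.progress)


selector = LearningProgressSelector(["reach", "grasp"])
for _ in range(10):
    selector.record("reach", 0.1)                # saturated: error no longer changes
for step in range(10):
    selector.record("grasp", 1.0 - 0.1 * step)   # error is still falling
chosen = selector.choose()
```

In this toy run the saturated goal has zero progress, so the selector prefers the goal that is still improving, regardless of which goal currently has the lower error.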
This philosophy is also pursued in Schmidhuber (2002), where learning is modelled ab initio as a compression<br />
scheme. Learning progress then directly and universally expresses itself as code growth rate during<br />
compression of the learning process. The actual improvement is measured by the increase in the effectiveness<br />
of the compression, not just the length of the compressed output. As an illustrative example, consider<br />
the discovery of new laws of motion which allow laws learnt earlier to be compressed much more effectively. Such<br />
compression gradients form the incentive structure of this ab initio model. The problem with this model is<br />
that it considers only universal Kolmogorov-type compression schemes. On the one hand, they offer various<br />
optimality guarantees; on the other hand, they are only of theoretical relevance due to their strongly<br />
asymptotic character, i.e. the universal guarantees can only be established for sufficiently long learning runs,<br />
which are typically orders of magnitude outside the range that is available to an artificial or biological agent.<br />
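To make the notion of a compression gradient concrete, the following sketch measures how many bits are saved on the same observation history when a model improves. It deliberately replaces the universal Kolmogorov-type scheme with a simple Bernoulli model and ideal code lengths, which is an assumption made so that the example is computable; the function names and the toy history are hypothetical.

```python
import math


def code_length_bits(bits, p_one):
    """Ideal code length (in bits) of a binary sequence under a Bernoulli model."""
    return -sum(math.log2(p_one if b else 1.0 - p_one) for b in bits)


def compression_progress(bits, p_old, p_new):
    """Bits saved on the same history when the model changes from p_old to p_new."""
    return code_length_bits(bits, p_old) - code_length_bits(bits, p_new)


history = [1, 1, 1, 0, 1, 1, 1, 1]        # toy observation history
p_naive = 0.5                             # model before learning: a fair coin
p_learnt = sum(history) / len(history)    # model after learning: empirical frequency

reward = compression_progress(history, p_naive, p_learnt)
```

Here `reward` is positive because the learnt model encodes the history in fewer bits than the naive one, and it drops to zero once the model stops changing. This finite stand-in carries none of the universal optimality guarantees discussed above.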
12.3.2 Principle-Based Self-Motivated Models<br />
The approaches discussed in the previous section define generic concepts which can be implemented in<br />
manifold ways and depend on the particular instantiation of the learning models or compression schemes.<br />
More important here are principle-based models, in which the intrinsic motivation principle is directly embedded<br />
into and “implemented” by the formalism.<br />
Strictly speaking, Schmidhuber’s universal compression gradient also belongs in the class of principle-based<br />
models. However, since the particular compression scheme is not canonically defined, and the formalism<br />
becomes insensitive to the scheme only in the asymptotic case (which is typically not realisable), we have<br />
grouped it above together with the approaches characterised by generic concepts rather than concrete<br />
principles.<br />
One learning concept is ISO-learning, which is based on modelling low-level anticipatory feedback loops (Porr<br />
et al., 2003). Another important concept for implementing intrinsic self-motivation is the homeokinesis<br />
concept introduced in (Der et al., 1999, Der, 2000, Der, 2001). Given the embodiment of a concrete agent,<br />
homeokinetic control constructs the behaviour of the agent in such a way that it maximises the<br />
predictability of its future sensory stimuli. This is achieved by an internal model of the agent, which<br />
uses a learning rule to minimise the prediction error for future stimuli encountered by the agent.<br />
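The model-learning half of this idea can be sketched as follows. This is only an illustration under strong assumptions, not the formulation of Der et al.: the internal model is taken to be linear in one sensor value `x` and one motor value `y`, the learning rule is a plain delta rule, and the toy "world" and all names are hypothetical. The adaptation of the controller itself is not shown.

```python
def train_forward_model(transitions, lr=0.1, epochs=200):
    """Fit x_next ~ a * x + b * y by gradient descent on the squared prediction error."""
    a, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y, x_next in transitions:
            err = x_next - (a * x + b * y)  # prediction error for this step
            a += lr * err * x               # delta-rule update of each weight
            b += lr * err * y
    return a, b


def mean_squared_error(transitions, a, b):
    """Average squared prediction error of the model (a, b) over the data."""
    return sum((xn - (a * x + b * y)) ** 2 for x, y, xn in transitions) / len(transitions)


def world(x, y):
    """Hypothetical sensorimotor dynamics, unknown to the agent."""
    return 0.5 * x + 1.0 * y


# Toy sensorimotor experience: (sensor value, motor command, next sensor value).
transitions = [(x, y, world(x, y)) for x in (-1.0, 0.0, 1.0) for y in (-1.0, 0.0, 1.0)]

error_before = mean_squared_error(transitions, 0.0, 0.0)
a, b = train_forward_model(transitions)
error_after = mean_squared_error(transitions, a, b)
```

After training, the prediction error on the recorded transitions is far smaller than for the untrained model, which is the quantity the learning rule is meant to reduce.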
Importantly, this approach encapsulates the embodiment as a core component of the model. It is only defined<br />
in the context of the complete sensorimotor loop and elevates the body into a central part of the cognitive<br />
process, in contrast to many approaches from traditional AI; it thus provides a quantitative<br />
grounding of the embodied intelligence perspective (Brooks, 1991, Paul, 2006, Pfeifer and Bongard, 2007).<br />
The early “naive” homeokinesis approach had the problem that it tended to favour situations in which<br />
prediction is simple for the agent. Since this has a propensity to send the agent into steady states,<br />
the model was extended by a mechanism to ensure a rich sensorimotor stimulus spectrum (Der et al., 2006).<br />
For this, the estimated sensorimotor dynamics of the system is considered as a dynamical system whose<br />
Lyapunov exponents are estimated. The agent then moves towards states which have the most negative time-<br />