D2.1 Requirements and Specification - CORBYS

Another approach is based on learning progress (Kaplan and Oudeyer, 2004), which considers the speed at which an agent succeeds in learning a given error function. The idea is that the agent aims to optimise not its performance in reaching a goal, but the speed at which it improves. As the agent becomes better at achieving a goal, saturation sets in: by the law of diminishing returns the agent improves less and less, and it then switches to learning a different goal. Thus, the agent keeps looking for goals with an increased level of novelty. This philosophy is also pursued in Schmidhuber (2002), where learning is modelled ab initio as a compression scheme. Learning progress then directly and universally expresses itself as code growth rate during compression of the learning process. The actual improvement is measured by the increase in the effectiveness of the compression, not just the length of the compressed output. As an illustrative example, one can consider the discovery of new laws of motion which allow laws learnt earlier to be compressed much more effectively. Such compression gradients form the incentive structure of this ab initio model. The problem with this model is that it considers only universal Kolmogorov-type compression schemes. On the one hand, these offer various optimality guarantees; on the other hand, they are only of theoretical relevance due to their strongly asymptotic character, i.e. the universal guarantees can only be established for sufficiently long learning runs, which are typically orders of magnitude outside the range available to an artificial or biological agent.
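The goal-switching behaviour described for learning progress can be illustrated with a small sketch. It is not the implementation of Kaplan and Oudeyer; the three goals, their exponential error curves, the window size and the function names are all assumptions made for illustration. The agent always practises the goal whose error has recently dropped the most.

```python
import math


def learning_progress(errors, window=5):
    """Mean error of the older window minus mean error of the recent window."""
    if len(errors) < 2 * window:
        # Too little data: treat the goal as maximally interesting (assumption).
        return float("inf")
    older = errors[-2 * window:-window]
    recent = errors[-window:]
    return sum(older) / window - sum(recent) / window


def simulate(steps=200, window=5):
    # Hypothetical goals: the error decays exponentially with practice
    # towards a floor. "unlearnable" never improves at all.
    goals = {
        "easy": {"floor": 0.0, "rate": 0.5},
        "hard": {"floor": 0.0, "rate": 0.05},
        "unlearnable": {"floor": 1.0, "rate": 0.0},
    }
    practice = {g: 0 for g in goals}
    history = {g: [] for g in goals}
    chosen = []
    for _ in range(steps):
        # Pick the goal with the largest recent improvement, not the lowest error.
        goal = max(goals, key=lambda g: learning_progress(history[g], window))
        params = goals[goal]
        practice[goal] += 1
        error = params["floor"] + (1.0 - params["floor"]) * math.exp(
            -params["rate"] * practice[goal]
        )
        history[goal].append(error)
        chosen.append(goal)
    return chosen, practice


if __name__ == "__main__":
    _, practice = simulate()
    print(practice)
```

In this toy setting the quickly saturating goal is abandoned early, the goal that cannot be learnt is only sampled during the initial exploration, and most practice goes to the goal that keeps yielding improvement.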
12.3.2 Principle-Based Self-Motivated Models
The approaches discussed in the previous section define generic concepts which can be implemented in manifold ways and depend on the particular instantiation of the learning models or compression schemes. Of greater interest here are principle-based models, in which the intrinsic motivation principle is directly embedded into, and “implemented” by, the formalism. Strictly speaking, Schmidhuber’s universal compression gradient also belongs to the class of principle-based models. However, since the particular compression scheme is not canonically defined, and the formalism becomes insensitive to the scheme only in the asymptotic case (which is typically not realisable), we have grouped it above with the approaches characterised by generic concepts rather than concrete principles.
One learning concept is ISO-learning, which is based on modelling low-level anticipatory feedback loops (Porr et al., 2003). Another important concept for implementing intrinsic self-motivation is homeokinesis, introduced in (Der et al., 1999, Der, 2000, Der, 2001). Given the embodiment of a concrete agent, homeokinetic control constructs the behaviour of the agent in such a way that it maximises the predictability of its future sensory stimuli. This is achieved by an internal model of the agent which uses a learning rule to minimise the prediction error for the future stimuli encountered by the agent. Importantly, this approach encapsulates the embodiment as a core component of the model. It is only defined in the context of the complete sensorimotor loop and elevates the body into a central part of the cognitive process, in contrast to many approaches from traditional AI; it thus provides a quantitative grounding of the embodied intelligence perspective (Brooks, 1991, Paul, 2006, Pfeifer and Bongard, 2007).
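The internal model that learns to minimise the prediction error of future stimuli can be sketched in a few lines. This is not the homeokinetic learning rule of Der et al.; it only illustrates the forward-model ingredient with a plain least-mean-squares update. The scalar plant with coefficients 0.8 and 0.5, the sinusoidal motor signal and all names are assumptions chosen for illustration.

```python
import math


def train_forward_model(steps=2000, lr=0.05):
    """Fit x[t+1] ~ a*x[t] + b*motor[t] by online gradient descent
    on the squared one-step prediction error."""
    true_a, true_b = 0.8, 0.5  # hypothetical sensorimotor dynamics
    a, b = 0.0, 0.0            # parameters of the internal model
    x = 0.0                    # current sensor value
    squared_errors = []
    for t in range(steps):
        motor = math.sin(0.3 * t)                # assumed motor command
        x_next = true_a * x + true_b * motor     # what the world does
        prediction = a * x + b * motor           # what the model expects
        error = x_next - prediction
        # Least-mean-squares step: move parameters along the error gradient.
        a += lr * error * x
        b += lr * error * motor
        squared_errors.append(error * error)
        x = x_next
    return a, b, squared_errors


if __name__ == "__main__":
    a, b, errs = train_forward_model()
    print(round(a, 3), round(b, 3), sum(errs[:100]), sum(errs[-100:]))
```

Because the sensor value depends on the motor command, the model can only be learnt inside the closed sensorimotor loop, which is the point the text makes about embodiment.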
The early “naive” homeokinesis approach had the problem that it tended to favour situations in which prediction is simple for the agent. Since this has a propensity to send the agent into steady states, the model was extended by a mechanism to ensure a rich sensorimotor stimulus spectrum (Der et al., 2006). For this, the estimated sensorimotor dynamics of the system is treated as a dynamical system whose Lyapunov exponents are estimated. The agent then moves towards states which have the most negative time-reversed Lyapunov exponents in sensory stimulus space, while maintaining a high level of predictability; these are the maximally unstable (thus structure-rich) points of the sensorimotor dynamics, but the predictability still maintains structure in the agent dynamics.
With the success and transparent interpretation of information theory-based methods (discussed below), the homeokinesis approach was generalised to use information-theoretic language: the “unstable-yet-predictable” concept was translated into the concept of predictive information by Ay et al. (2008). Predictive information (Bialek et al., 2001, Shalizi, 2001) is the mutual information that the past of a time series contains about its future. This makes it possible to convert the original dynamical systems approach into an information-theoretic setting. A high value indicates good predictability, but the measure is also more expressive than the criterion of minimum future error, because only sensorily rich pasts can achieve high mutual information values. An impoverished sensory past/future relation, even if predictable, does not achieve high predictive information since there is not much to predict in this case. The approach can be generalised to incorporate the learning process itself (Still, 2009).
An alternative use of predictive information has been explored in an evolutionary context, where it has been used to evolve locomotor patterns in simulated snakes with limited sensors (Prokopenko et al., 2006a, Prokopenko et al., 2006b). Here, the dynamics is not adapted online; instead, the sensorimotor dynamics is optimised via an evolutionary algorithm to maximise the predictive information statistics (using a computationally cheaper Rényi variant of the predictive information). The information-theoretic approach is highly versatile.
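To make the definition of predictive information concrete, the following sketch estimates the mutual information between a length-k past and a length-k future of a discrete time series with a simple plug-in (frequency count) estimator. The estimator, the window length k and the three test series are assumptions chosen for illustration; they are not the estimators used in the cited works.

```python
import math
import random
from collections import Counter


def predictive_information(series, k=1):
    """Plug-in estimate, in bits, of I(past_k ; future_k) for a discrete series."""
    pairs = [
        (tuple(series[i - k:i]), tuple(series[i:i + k]))
        for i in range(k, len(series) - k + 1)
    ]
    n = len(pairs)
    joint = Counter(pairs)
    past = Counter(p for p, _ in pairs)
    future = Counter(f for _, f in pairs)
    # Sum over observed pairs of p(p, f) * log2( p(p, f) / (p(p) * p(f)) ).
    return sum(
        (c / n) * math.log2(c * n / (past[p] * future[f]))
        for (p, f), c in joint.items()
    )


if __name__ == "__main__":
    rng = random.Random(0)
    constant = [0] * 1000                           # predictable but impoverished
    periodic = [0, 1, 2, 3] * 250                   # predictable and rich
    noise = [rng.randint(0, 1) for _ in range(2000)]  # rich but unpredictable
    for name, s in [("constant", constant), ("periodic", periodic), ("noise", noise)]:
        print(name, predictive_information(s))
```

The constant series is perfectly predictable yet yields zero predictive information, the random series is rich yet yields almost none, and only the series that is both rich and predictable scores high (close to 2 bits for four equally frequent symbols).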
The information-theoretic framework has been extended to cover major classes of information processing in sensorimotor loops, ranging from simple minimal models of control to various methods for formulating behaviour generation and adaptation on the basis of information theory (Lungarella and Sporns, 2005, Lungarella and Sporns, 2006, Polani et al., 2007, Klyubin et al., 2004, Klyubin et al., 2007). The methodology makes it possible to model many aspects of agent-environment interaction, such as autonomy itself (Bertschinger et al., 2008), or the formation of joint concepts in a group of agents (Möller and Polani, 2008).
Central for CORBYS is the construction of intrinsically self-motivated behaviour, and thus a degree of autonomy for the agents which should enable them to propose, initiate or estimate behaviours. As a basis for this, the universal utility known as empowerment will be used. Empowerment is the external channel capacity of an organism (Klyubin et al., 2005a, Klyubin et al., 2008). It is typically defined as a function over the states (or, in more general situations, as a context function, see Capdepuy et al., 2007a). Unlike predictive information, it is not a function of the trajectory but a function of state, and thus it acts as a universal utility. In particular, this means that empowerment is always defined universally, i.e. in the same generic way, depending only on the particular embodiment and the dynamics of the environment; and, since it acts as a utility, the agent aims to maximise it in a greedy fashion by hill-climbing through the states, guided by the local empowerment gradient. Empowerment is an expression of the least-commitment idea in the absence of a concrete goal or reward. It rewards being in states which are least committed with respect to future perturbations or goals (Klyubin et al., 2008). Empowerment provides intrinsic, embodiment-based saliency criteria for desirable states of the world.
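Empowerment as a state-dependent utility can be illustrated in a deliberately simple setting. For deterministic dynamics, the channel capacity from n-step action sequences to the resulting state reduces to the logarithm of the number of distinct reachable states, which is what the sketch below computes. The 5x5 grid world, the action set and the function names are assumptions chosen for illustration; stochastic or continuous systems need a proper channel capacity computation instead.

```python
import math
from itertools import product

# Hypothetical action set: stay, east, west, south, north.
ACTIONS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def step(state, action, size):
    """Deterministic grid dynamics: moving into a wall leaves the state unchanged."""
    x, y = state
    nx, ny = x + action[0], y + action[1]
    if 0 <= nx < size and 0 <= ny < size:
        return (nx, ny)
    return state


def empowerment(state, n, size):
    """n-step empowerment in bits for deterministic dynamics:
    log2 of the number of distinct states reachable by n-step action sequences."""
    finals = set()
    for sequence in product(ACTIONS, repeat=n):
        s = state
        for action in sequence:
            s = step(s, action, size)
        finals.add(s)
    return math.log2(len(finals))


def greedy_step(state, n, size):
    """Hill-climb: move to the successor state with the highest empowerment."""
    successors = (step(state, a, size) for a in ACTIONS)
    return max(successors, key=lambda s: empowerment(s, n, size))


if __name__ == "__main__":
    print(empowerment((2, 2), 2, 5))  # centre of the grid
    print(empowerment((0, 0), 2, 5))  # corner of the grid
    print(greedy_step((0, 0), 2, 5))
```

The value depends only on the state, the embodiment (the action set) and the environment dynamics (the walls), and a greedy agent starting in a corner moves away from the walls towards states that keep more futures open.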
These saliency criteria include the identification of states affording novel manipulative degrees of freedom (Klyubin et al., 2005a) or corresponding to states of maximum centrality in state-action graphs (Anthony et al., 2008), as well as gradients for sensory feature adaptation (Klyubin et al., 2005b) and natural points of stability (Klyubin et al., 2008). Computation of empowerment in continuous scenarios provides an alternative to optimal control models for the control of dynamical systems and can avoid the backup of value data throughout the dynamical