Advances in Intelligent Systems Research - of Marcus Hutter

Towards Automated Code Generation for Autonomous Mobile Robots

D. Kerr
Intelligent Systems Research Centre, University of Ulster, Londonderry, BT48 7JL, UK

U. Nehmzow
Intelligent Systems Research Centre, University of Ulster, Londonderry, BT48 7JL, UK

S.A. Billings
Department of Automatic Control and Systems Engineering, University of Sheffield, Sheffield, S1 3JD, UK

Abstract

With the expected growth in mobile robotics the demand for expertise to develop robot control code will also increase. As end-users cannot be expected to develop this control code themselves, a more elegant solution would be to allow the end-users to teach the robot by demonstrating the task. In this paper we show how route learning tasks may be “translated” directly into robot control code simply by observing the task. We show how automated code generation may be facilitated through system identification, which algorithmically and automatically transfers human behaviour into control code, using transparent mathematical functions. We provide two route learning examples where a mobile robot automatically obtains control code simply by observing human behaviour, identifying it using system identification, and copying the behaviour.

Introduction

Motivation

Mobile robotics will play an ever more important role in the future. We expect one growth area to be service robotics, especially home care robots for the elderly and infirm. Other important growth areas will be entertainment robotics and games, as well as security applications. All of these applications require some high-level, sophisticated programming, but the bulk of the programming work will be “standard” components of robot control that will consume a lot of programmer resources, resources that could be better used.
In this paper we show that it is possible for trajectory learning to obtain robot control code automatically, through system identification (Akanyeti et al., 2007): control code was obtained by observing a human demonstrator following the desired route, and by translating his behaviour directly into code, without programming.

Learning by demonstration is by now a widely used technique in robotics (see, for instance, Demiris, 2009, and Demiris and Dearden, 2005). In terms of application (route learning), the work of Coates et al. (2008) is perhaps the most relevant to mention here: an expert controlled a model helicopter, and the desired optimal behaviour was obtained and modelled from a number of sub-optimal demonstrations performed by the expert. This model was then used to control the helicopter. In contrast to our work, Coates et al. (2008) use a specialist to provide the demonstration, and their control model incorporates a priori knowledge, such as that the helicopter has to remain stationary for certain manoeuvres. In our experiments the trainer is not an expert in the task, and only needs the ability to demonstrate the behaviour to the robot.

The experiments in this paper develop our approach further. Their purpose is to show that even more complex behaviour of an agent (route learning in this case) can be “translated” directly into robot control code, namely by observing it, identifying it using system identification, and using the identified model of the observed behaviour to control the robot (Figure 1).

Figure 1: The “behaviour copier”: a behaviour is observed and subsequently identified using system identification. The obtained model is then used to control the robot directly; no human intervention is required at any point (other than that the human demonstrates the desired behaviour to the robot). (Diagram labels: the demonstrator’s behaviour is observed and logged; the behaviour is modelled through NARMAX system identification; the obtained model, placed between sensors and motors, controls the robot; the obtained model is analysed for stability, sensitivity to noise, etc.)

In essence the experiments reported here form a “behaviour copier”, which produces a canonical “carbon copy” of an observed behaviour that can be used to control an autonomous mobile robot.
Approach

We have used a NARMAX approach (Billings and Chen, 1998) to obtain the models we need for automated programming of a robot, because
• The NARMAX model itself provides the executable code straight away,
• The model is analysable, and gives us valuable information regarding
  – How the robot achieves the task,
  – Whether the model is stable or not,
  – How the model will behave under certain operating conditions, and
  – How sensitive the model is to certain inputs, i.e. how “important” certain inputs are.

NARMAX system identification

The NARMAX modelling approach is a parameter estimation methodology for identifying both the important model terms and the parameters of unknown non-linear dynamic systems. For multiple input, single output noiseless systems this model takes the form given in equation 1:

$$y(n) = f\left[\vec{u}(n - N_u)^{l},\; y(n - N_y)\right], \quad \forall\; N_u = 0 \ldots N_u^{\max},\; l = 1 \ldots l_{\max},\; N_y = 1 \ldots N_y^{\max}, \tag{1}$$

where $y(n)$ and $\vec{u}(n)$ are the sampled output and input signals at time $n$ respectively, and $N_y$ and $N_u$ are the regression orders of the output and input respectively. The input vector $\vec{u}$ is $d$-dimensional, the output $y$ is a scalar. $f()$ is a non-linear function, typically taken to be a polynomial or wavelet multi-resolution expansion of the arguments. The degree $l_{\max}$ of the polynomial is the highest sum of powers in any of its terms.

The NARMAX methodology breaks the modelling problem into the following steps: i) structure detection (i.e. determining the form of the non-linear polynomial), ii) parameter estimation (i.e. obtaining the model coefficients), iii) model validation, iv) prediction, and v) analysis. A detailed description of how these steps are done is presented in (Billings and Chen, 1998; Korenberg et al., 1988; Billings and Voon, 1986).

The calculation of the NARMAX model parameters is an iterative process. Each iteration involves three steps: i) estimation of model parameters, ii) model validation and iii) removal of non-contributing terms.
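The estimate-and-prune loop described above can be illustrated with a minimal sketch for a single-input polynomial model of the form of equation 1. This is not the authors' implementation: the function names, the synthetic test system and the coefficient threshold are assumptions made for illustration, the simple thresholding of small coefficients stands in for the structure-detection procedure of the cited works, and the validation step is omitted.

```python
import itertools
import random


def candidate_terms(max_lag_u, max_lag_y, degree):
    """List candidate monomials; each is a tuple of (signal, lag) factors, () is the constant."""
    regressors = [("u", k) for k in range(0, max_lag_u + 1)]
    regressors += [("y", k) for k in range(1, max_lag_y + 1)]
    terms = [()]
    for d in range(1, degree + 1):
        terms.extend(itertools.combinations_with_replacement(regressors, d))
    return terms


def evaluate_term(term, u, y, n):
    """Value of one monomial at time step n."""
    value = 1.0
    for signal, lag in term:
        value *= (u if signal == "u" else y)[n - lag]
    return value


def solve_least_squares(rows, targets):
    """Solve the normal equations by Gauss-Jordan elimination with partial pivoting."""
    m = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(m)]
           + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def fit_polynomial_model(u, y, max_lag_u=1, max_lag_y=1, degree=2, threshold=1e-6):
    """Iterate: estimate parameters, drop terms with negligible coefficients, re-estimate."""
    terms = candidate_terms(max_lag_u, max_lag_y, degree)
    start = max(max_lag_u, max_lag_y)
    targets = y[start:]
    while True:
        rows = [[evaluate_term(t, u, y, n) for t in terms] for n in range(start, len(y))]
        theta = solve_least_squares(rows, targets)
        kept = [(t, c) for t, c in zip(terms, theta) if abs(c) >= threshold]
        if len(kept) == len(terms):
            return dict(kept)
        terms = [t for t, _ in kept]


def predict(model, u, y, n):
    """The fitted model is directly executable: sum of coefficient times monomial."""
    return sum(c * evaluate_term(t, u, y, n) for t, c in model.items())


# Synthetic, noiseless test system (an assumption, not data from the paper):
# y(n) = 0.5 y(n-1) + 0.8 u(n-1) + 0.2 u(n-1)^2
random.seed(0)
u = [random.uniform(-1.0, 1.0) for _ in range(200)]
y = [0.0]
for n in range(1, 200):
    y.append(0.5 * y[n - 1] + 0.8 * u[n - 1] + 0.2 * u[n - 1] ** 2)

model = fit_polynomial_model(u, y)
for term, coefficient in model.items():
    print(term, round(coefficient, 6))
```

On noiseless data like this, a plain threshold is enough to discard the unused candidate terms. With logged sensor data, a principled term-selection criterion and a separate validation set would be needed, as described in the references cited above.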
Using System Identification to Obtain Robot-Executable NARMAX Models

It is difficult for a programmer to teach a particular task to a robot, as humans and robots perceive and act in the world differently; humans and robots have different sensor and actuator modalities (Alissandrakis et al., 2005). We have adopted an approach similar to that of Nehmzow et al. (2007), where the trajectory of the desired behaviour is used as a suitable communication channel between the human and the robot.

In this paper we present an approach to show how route learning can be translated directly into control code using system identification. The trajectory of the human is used as reference, and this is translated algorithmically and automatically into robot control code. An outline of this method is illustrated in Figure 2.

Figure 2: System identification process used to obtain robot-executable NARMAX models. (Diagram steps: the robot explores the environment and logs laser perception and ground truth from the logging system; a model of the robot’s laser perception as a function of the robot’s position and orientation is obtained, P(t) = f(x(t), y(t), sin ϕ(t), cos ϕ(t)); the human walks the desired trajectory and his poses along the trajectory are logged; a model of rotational velocity as a function of modelled perception is obtained, ω(t) = f(P(t)); the model of rotational velocity drives the robot.)

1. Obtaining the sensorgraph: The robot explores the environment and obtains a detailed log of sensor perceptions throughout the working environment. This detailed log, the sensorgraph, contains information such as laser readings and the robot’s pose ⟨x, y, ϕ⟩, i.e. its position ⟨x, y⟩ and heading ϕ within the environment.