Learning Motor Skills - Intelligent Autonomous Systems

More documents

Recommendations

Info

(a) The dart is placed on the launcher. (b) The arm moves back. (c) The arm moves forward on an arc. (d) The arm stops. (e) The dart is carried on by its momentum. (f) The dart hits the board. Figure 5.7: This figure shows a dart throw in a physically realistic simulation. (a) The dart is placed in the hand. (b) The arm moves back. (c) The arm moves forward on an arc. (d) The arm continues moving. (e) The dart is released and the arm follows through. (f) The arm stops and the dart hits the board. Figure 5.8: This figure shows a dart throw on the real JST-ICORP/SARCOS humanoid robot CBi. Compared to standard Gaussian process regression (GPR) in a supervised setting, CrKR can improve the policy over time according to a cost function and outperforms GPR in settings where different combinations of meta-parameters yield the same result. For details, see Figure 5.6. 5.3.2 Robot Dart-Throwing Games Now, we turn towards the complete framework, i.e., we intend to learn the meta-parameters for motor primitives in discrete movements. We compare the Cost-regularized Kernel Regression (CrKR) algorithm to the reward-weighted regression (RWR). As a sufficiently complex scenario, we chose a robot dart throwing task inspired by [Lawrence et al., 2003]. However, we take a more complicated scenario and choose dart games such as Around the Clock [Masters Games Ltd., 2010] instead of simple throwing at a fixed location. Hence, it will have an additional parameter in the state depending on the location on the dartboard that should come next in the sequence. The acquisition of a basic motor primitive is achieved using previous work on imitation learning [Ijspeert et al., 2002b]. Only the meta-parameter function is learned using CrKR or RWR. For the learning process, the targets (which are part of the state) are uniformly distributed on the dartboard. For the evaluation the targets are placed in the center of the fields. The reward is calculated based on the impact position observed by a vision system in the real robot experiments or the simulated impact position. The dart is placed on a launcher attached to the end-effector and held there by stiction. We use the Barrett WAM robot arm in order to achieve the high accelerations needed to overcome the stiction. See Figure 5.7, for a complete throwing movement. The motor primitive is trained by imitation learning with kinesthetic teach-in. We use the Cartesian coordinates with respect to the center of the dart board as input states. In comparison with the benchmark example, we cannot directly influence the release velocity in this setup. Hence, we employ the parameter for the final position g, the time scale of the motor primitive τ and the angle around the vertical axis (i.e., the orientation towards the dart board to which the robot moves before throwing) as meta-parameters instead. The popular dart game Around the Clock requires 80 5 Reinforcement <strong>Learning</strong> to Adjust Parametrized <strong>Motor</strong> Primitives to New Situations
1.4 1.2 Cost−regularized Kernel Regression Reward−weighted Regression average cost 1 0.8 0.6 0.4 0.2 0 200 400 600 800 1000 number of rollouts Figure 5.9: This figure shows the cost function of the dart-throwing task for a whole game Around the Clock in each rollout. The costs are averaged over 10 runs with the error-bars indicating standard deviation. The number of rollouts includes the rollouts not used to update the policy. (a) The dart is picked up. (b) The arm moves forward on an arc. (c) The arm continues moving. (d) The dart is released. (e) The arm follows through. (f) The arm continues moving. (g) The arm returns to the pick-up position. (h) The dart has hit the board. Figure 5.10: This figure shows a dart throw on the real Kuka KR 6 robot. the player to hit the numbers in ascending order, then the bulls-eye. As energy is lost overcoming the stiction of the launching sled, the darts fly lower and we placed the dartboard lower than official rules require. The cost function is defined as ∑ 2 c = 10 di − s i + τ, i∈{x,z} where d i are the horizontal and vertical positions of the dart on the dartboard after the throw, s i are the horizontal and vertical positions of the target corresponding to the state, and τ corresponds to the velocity of the motion. After approximately 1000 throws the algorithms have converged but CrKR yields a high performance already much earlier (see Figure 5.9). We again used a parametric policy with radial basis functions for RWR. Here, we employed 225 Gaussian basis function on a regular grid per meta-parameter. Designing a good parametric policy proved very difficult in this setting as is reflected by the poor performance of RWR. This experiment has also being carried out on three real, physical robots, i.e., a Barrett WAM, the humanoid robot CBi (JST-ICORP/SARCOS), and a Kuka KR 6. CBi was developed within the framework 5.3 Evaluations and Experiments 81
Page 1 and 2:
Learning Motor Skills: From Algorit
Page 3:
Erklärung zur Dissertation Hiermit
Page 6 and 7:
epresenting sub-tasks, can be combi
Page 8 and 9:
durch das Anpassen einer kleinen An
Page 11 and 12:
Contents Abstract Zusammenfassung A
Page 13:
6 Learning Prioritized Control of M
Page 16 and 17:
Policy search, also known as policy
Page 18 and 19:
Figure 1.1: This figure illustrates
Page 21 and 22:
2 Reinforcement Learning in Robotic
Page 23 and 24:
(a) OBELIX robot (b) Zebra Zero rob
Page 25 and 26:
the task’s performance. This prob
Page 27 and 28:
the policy is considered a conditio
Page 29 and 30:
POLICY SEARCH Approach Employed by.
Page 31 and 32:
SMART STATE-ACTION DISCRETIZATION A
Page 33 and 34:
Figure 2.3: Boston Dynamics LittleD
Page 35 and 36:
DEMONSTRATION Approach Employed by.
Page 37 and 38:
Benefits of Noise: A complex real-w
Page 39 and 40:
(a) Schematic drawings of the ball-
Page 41 and 42:
The policy converges to the maximum
Page 43: obot reinforcement learning tractab
Page 46 and 47: Figure 3.1: This figure illustrates
Page 48 and 49: This canonical system has the time
Page 50 and 51: 1 0.8 0.6 0.4 0.2 0 0 0.5 1 1.5 3 2
Page 52 and 53: with f ref containing the values of
Page 54 and 55: Figure 3.7: Generalization to vario
Page 56 and 57: We evaluate the algorithms derived
Page 58 and 59: 2007], such a derivation results in
Page 60 and 61: Algorithm 4.2 episodic Natural Acto
Page 62 and 63: Algorithm 4.3 episodic Reward Weigh
Page 64 and 65: Open Parameters DoF Rollouts Policy
Page 66 and 67: (a) minimum motor command (b) passi
Page 68 and 69: 1 average return 0.9 0.8 0.7 0.6 FD
Page 70 and 71: Figure 4.7: This figure shows the i
Page 72 and 73: Figure 4.10: This figure illustrate
Page 74 and 75: 1 0.8 average return 0.6 0.4 0.2 0
Page 76 and 77: updates in simulation (such as Dyna
Page 78 and 79: complex. We modeled the system in a
Page 80 and 81: and, thus, ∂ θ log π = σ −2
Page 83 and 84: 5 Reinforcement Learning to Adjust
Page 85 and 86: parameters for scaling the duration
Page 87 and 88: Figure 5.2: This figure illustrates
Page 89 and 90: egression solution w = (Φ T RΦ +
Page 91 and 92: 100 (a) Velocity 2 (b) Precision 3
Page 93: The setup is given as follows: A to
Page 97 and 98: (a) Left. (b) Half left. (c) Center
Page 99 and 100: 0.9 cost/success 0.7 0.5 0.3 Succes
Page 101 and 102: position and velocity of the arm, t
Page 103 and 104: Figure 5.21: This figure illustrate
Page 105: favor behaviors with a higher numbe
Page 108 and 109: are possible. This redundancy can b
Page 110 and 111: 6.2.1 Single Primitive Control Law
Page 112 and 113: (a) Exaggerated schematic drawing.
Page 114 and 115: Dominance Structure Number of Hits
Page 117 and 118: 7 Conclusion In this thesis, we hav
Page 119 and 120: Learning Motor Skills The presented
Page 121 and 122: Learning Layers Jointly For the bal
Page 123: Book Chapters J. Kober and J. Peter
Page 126: J. A. Bagnell and J. C. Schneider.
Page 129 and 130: H. Fässler, H. A. Beyer, and J. T.
Page 131 and 132: F. Kirchner. Q-learning of complex
Page 133 and 134: J. A. Martín H., J. de Lope, and D
Page 135 and 136: L. Peshkin. Reinforcement Learning
Page 137 and 138: F. Sehnke, C. Osendorfer, T. Rücks
Page 139: A. Ude, A. Gams, T. Asfour, and J.
Page 142 and 143: 3.2 In this figure, we convey the i
Page 144 and 145:
4.11 This figure shows schematic dr
Page 146 and 147:
5.17 This figure illustrates the si
Page 149:
List of Tables 2.1 This table illus
Page 152 and 153:
As symbols in this PhD thesis, the
show all

Learning Motor Skills - Intelligent Autonomous Systems

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?