
Ding Wang 2012 Neurocomputing

20 D. Wang et al. / Neurocomputing 78 (2012) 14–22

...network and action network with the structures 4–8–2, 2–8–1, 2–8–2, respectively. The initial weights of the three networks are all set to be random in [−1, 1]. It should be mentioned that the model network should be trained first. We train the model network for 1000 steps using 500 data samples under the learning rate α_m = 0.1. After the training of the model network is completed, the weights are kept unchanged. Then, we train the critic network and the action network for 20 iterations (i.e., for i = 1, 2, ..., 20), with each iteration consisting of 2000 training steps, to make sure the given error bound ε = 10⁻⁵ is reached. In the training process, the learning rates are α_c = α_a = 0.05. The convergence process of the cost function of the iterative HDP algorithm for k = 0 is shown in Fig. 2. We can see that the iterative cost function sequence converges to the optimal cost function quite rapidly, which indicates the effectiveness of the iterative HDP algorithm. Therefore, we have |V_19(e_0) − V_20(e_0)| ≤ ε, which means that the number of steps of the ε-optimal control is K_ε(e_0) = 19. Besides, the ε-optimal control law μ*_ε(e_0) for system (6) can also be obtained during the iteration process.

Next, we compute the near-optimal tracking control law for the original system (1) using (35) and apply it to the controlled system for 40 time steps. The obtained state curves are shown in Figs. 3 and 4.

[Figures omitted; captions retained:]
Fig. 2. The convergence process of the cost function.
Fig. 3. The state trajectory x_1 and the reference trajectory r_1.
Fig. 4. The state trajectory x_2 and the reference trajectory r_2.
Fig. 5. The tracking control trajectories u_p.
Fig. 6. The tracking error e.
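The stopping rule described above — iterate until |V_i(e_0) − V_{i+1}(e_0)| ≤ ε and report that index as K_ε(e_0) — can be sketched in isolation as follows. This is a minimal illustration of the convergence test only, not the authors' implementation; the toy cost sequence is an assumed geometric stand-in, not the data behind Fig. 2.

```python
def epsilon_optimal_steps(V, eps=1e-5):
    """Return K_eps: the first (1-based) iteration i with |V_i - V_{i+1}| <= eps.

    V[j] is taken to hold the iterate V_{j+1}, so the list is 0-based while
    the returned index follows the paper's 1-based iteration counting.
    """
    for j in range(len(V) - 1):
        if abs(V[j] - V[j + 1]) <= eps:
            return j + 1
    return None  # error bound never reached within the available iterates

# Toy, geometrically converging cost sequence (illustrative only):
# V_i = 1.5 + 2^(-i), so |V_i - V_{i+1}| = 2^(-(i+1)).
V = [1.5 + 2.0 ** (-i) for i in range(1, 31)]
K = epsilon_optimal_steps(V, eps=1e-5)
print(K)  # first i with 2^(-(i+1)) <= 1e-5
```

In the paper's experiment the analogous check first succeeds at iteration 19, giving K_ε(e_0) = 19; here the stand-in sequence simply converges at a different rate.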
