Ding Wang 2012 Neurocomputing

the control law sequence $\{v_i(e_k)\}$ converges to the optimal control law $u^*$ as $i \to \infty$, i.e.,

$$\lim_{i \to \infty} v_i(e_k) = u^*(e_k).$$

3.3. The ε-optimal control algorithm

According to Theorems 1–3 and Corollary 1, the iterative ADP algorithm (14)–(17) must be run until $i \to \infty$ to obtain the optimal cost function $J^*(e_k)$ and the control law $v_\infty(e_k)$, from which a control sequence $u_\infty(e_k) = (v_\infty(e_k), v_\infty(e_{k+1}), \ldots, v_\infty(e_{k+i}), \ldots)$ can be constructed to drive the state to the target. Obviously, $u_\infty(e_k)$ has infinite length. Though feasible in theory, this is rarely practical, because most real-world systems must be controlled effectively within a finite horizon. Therefore, in this section we propose a novel ε-optimal control strategy based on the iterative ADP algorithm. The idea is that, for a given error bound $\varepsilon > 0$, the iteration number $i$ is chosen so that the error between $V_i(e_k)$ and $J^*(e_k)$ lies within that bound.

Let $\varepsilon > 0$ be any small number, $e_k$ be any controllable state, and $J^*(e_k)$ be the optimal value of the cost function sequence defined in (17). From Theorem 3, it is clear that there exists a finite $i$ such that

$$|V_i(e_k) - J^*(e_k)| \le \varepsilon. \tag{31}$$

The length of the optimal control sequence starting from $e_k$ with respect to $\varepsilon$ is defined as

$$K_\varepsilon(e_k) = \min\{i : |V_i(e_k) - J^*(e_k)| \le \varepsilon\}. \tag{32}$$

The corresponding control law

$$v_{i-1}(e_k) = \arg\min_{u_k}\{U(e_k, u_k) + V_{i-1}(e_{k+1})\} = -\frac{1}{2} R^{-1} g^{T}(e_k + r_k)\, \frac{\partial V_{i-1}(e_{k+1})}{\partial e_{k+1}} \tag{33}$$

is called the ε-optimal control and is denoted by $\mu^*_\varepsilon(e_k)$.

In this sense, an error $\varepsilon$ between $V_i(e_k)$ and $J^*(e_k)$ is introduced into the iterative ADP algorithm, which makes the cost function sequence $\{V_i(e_k)\}$ converge within a finite number of iteration steps.

However, criterion (31) is difficult to verify because the optimal cost function $J^*(e_k)$ is unknown in general. Consequently, we use the equivalent criterion

$$|V_i(e_k) - V_{i+1}(e_k)| \le \varepsilon \tag{34}$$

in place of (31).

In fact, if $|V_i(e_k) - J^*(e_k)| \le \varepsilon$ holds, we have $V_i(e_k) \le J^*(e_k) + \varepsilon$. Combining this with $J^*(e_k) \le V_{i+1}(e_k) \le V_i(e_k)$, we find that

$$0 \le V_i(e_k) - V_{i+1}(e_k) \le \varepsilon,$$

which means

$$|V_i(e_k) - V_{i+1}(e_k)| \le \varepsilon.$$

On the other hand, according to Theorem 3, $|V_i(e_k) - V_{i+1}(e_k)| \to 0$ implies that $V_i(e_k) \to J^*(e_k)$. As a result, if $|V_i(e_k) - V_{i+1}(e_k)| \le \varepsilon$ holds for a given small $\varepsilon$, we can conclude that $|V_i(e_k) - J^*(e_k)| \le \varepsilon$ holds for sufficiently large $i$.

3.4. Design procedure of the finite-horizon optimal tracking control scheme using the iterative ADP algorithm

In this section, we give the detailed design procedure for the finite-horizon nonlinear optimal tracking control scheme using the iterative ADP algorithm; a compact code sketch of Steps 3–8 follows at the end of this subsection.

Step 1. Specify an error bound $\varepsilon$ for the given initial state $x_0$. Choose $i_{\max}$, the reference trajectory $r_k$, and the matrices $Q$ and $R$.
Step 2. Compute $e_k$ according to (2) and (3).
Step 3. Set $i = 0$ and $V_0(e_k) = 0$. Obtain the initial finite-horizon admissible control vector $v_0(e_k)$ by (14) and update the cost function $V_1(e_k)$ by (15).
Step 4. Set $i = i + 1$.
Step 5. Compute $v_i(e_k)$ by (16) and the corresponding cost function $V_{i+1}(e_k)$ by (17).
Step 6. If $|V_i(e_k) - V_{i+1}(e_k)| \le \varepsilon$, go to Step 8; otherwise, go to Step 7.
Step 7. If $i > i_{\max}$, go to Step 8; otherwise, go to Step 4.
Step 8. Stop.

After the optimal control law $u^*(e_k)$ for system (6) is derived under the given error bound $\varepsilon$, the optimal tracking control input for the original system (1) is computed as

$$u_{pk} = u^*(e_k) + u_{dk} = u^*(e_k) + g^{-1}(r_k)(\phi(r_k) - f(r_k)). \tag{35}$$
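To make Steps 3–8 concrete, the following is a minimal sketch in Python on a discretized error-state and control grid. The names `error_dynamics`, `utility`, `e_grid`, and `u_grid` are illustrative assumptions rather than the paper's notation, and the grid-based lookup merely stands in for the NN approximators of Section 4; the paper itself never discretizes the state space.

```python
# Minimal sketch of Steps 3-8: value iteration with stopping rule (34).
# Assumes a known error dynamics e_{k+1} = error_dynamics(e_k, u_k) and
# quadratic utility U(e, u) = e^T Q e + u^T R u; grids stand in for the
# neural-network approximators used in the paper.
import numpy as np

def utility(e, u, Q, R):
    """U(e_k, u_k) = e_k^T Q e_k + u_k^T R u_k."""
    return e @ Q @ e + u @ R @ u

def nearest(grid, x):
    """Index of the grid point closest to x (crude state aggregation)."""
    return int(np.argmin([np.linalg.norm(g - x) for g in grid]))

def eps_optimal_iteration(e_grid, u_grid, error_dynamics, Q, R,
                          eps=1e-4, i_max=200):
    """Iterate (16)-(17) until |V_i - V_{i+1}| <= eps, cf. criterion (34)."""
    n = len(e_grid)
    V = np.zeros(n)                            # Step 3: V_0(e_k) = 0
    policy = np.zeros(n, dtype=int)
    for i in range(i_max):                     # Steps 4-5
        V_next = np.empty(n)
        for s, e in enumerate(e_grid):
            # V_{i+1}(e_k) = min_u { U(e_k, u) + V_i(e_{k+1}) }, cf. (17)
            costs = [utility(e, u, Q, R)
                     + V[nearest(e_grid, error_dynamics(e, u))]
                     for u in u_grid]
            policy[s] = int(np.argmin(costs))  # greedy control, cf. (16)
            V_next[s] = min(costs)
        if np.max(np.abs(V - V_next)) <= eps:  # Step 6: criterion (34)
            return V_next, policy, i + 1       # iteration count, cf. (32)
        V = V_next                             # Step 7: continue to i_max
    return V, policy, i_max                    # Step 8 reached via the cap
```

The returned iteration count plays the role of $K_\varepsilon(e_k)$ in (32): it is the first index at which the computable criterion (34) certifies ε-optimality.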
In the following section, the NN-based implementation of the iterative ADP algorithm is described in detail.

4. NN implementation of the iterative ADP algorithm via HDP technique

Now we implement the iterative HDP algorithm (14)–(17) using NNs. The iterative HDP algorithm employs three networks: the model network, the critic network, and the action network. All three are chosen as three-layer feedforward NNs. The inputs of the critic network and the action network are $e_k$, while the inputs of the model network are $e_k$ and $\hat{v}_i(e_k)$. The structure diagram of the iterative HDP algorithm is shown in Fig. 1.

4.1. The model network

The purpose of the model network is to approximate the error dynamics, and it is trained before the iterative HDP algorithm is carried out. For given $e_k$ and $\hat{v}_i(e_k)$, the output of the model network is

$$\hat{e}_{k+1} = \omega_m^{T} \sigma(\nu_m^{T} z_k), \tag{36}$$

where

$$z_k = [e_k^T \;\; \hat{v}_i^T(e_k)]^T.$$

We define the error function of the model network as

$$e_{mk} = \hat{e}_{k+1} - e_{k+1}. \tag{37}$$

The weights of the model network are updated to minimize the performance measure

$$E_{mk} = \frac{1}{2} e_{mk}^{T} e_{mk}. \tag{38}$$

Using the gradient-based adaptation rule, the weights are updated as

$$\omega_m(j+1) = \omega_m(j) - \alpha_m \frac{\partial E_{mk}}{\partial \omega_m(j)}, \tag{39}$$

$$\nu_m(j+1) = \nu_m(j) - \alpha_m \frac{\partial E_{mk}}{\partial \nu_m(j)}, \tag{40}$$

where $\alpha_m > 0$ is the learning rate of the model network and $j$ is the iteration step for updating the weight parameters. After the model network is trained, its weights are kept unchanged.
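As a concrete illustration of (36)–(40), here is a minimal training sketch for the model network. It assumes $\sigma(\cdot)$ is a tanh hidden-layer activation and that a batch of input–target pairs $(z_k, e_{k+1})$ has already been collected from the error system; the function name, layer width, learning rate, and epoch count are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of model-network training, eqs. (36)-(40): a three-layer
# feedforward NN trained by gradient descent to predict e_{k+1} from
# z_k = [e_k; v_i(e_k)]. Hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def train_model_network(Z, E_next, n_hidden=8, alpha_m=0.05, epochs=500):
    """Z: (N, n_in) inputs z_k; E_next: (N, n_out) targets e_{k+1}."""
    n_in, n_out = Z.shape[1], E_next.shape[1]
    nu = rng.normal(scale=0.1, size=(n_in, n_hidden))      # nu_m
    omega = rng.normal(scale=0.1, size=(n_hidden, n_out))  # omega_m
    for _ in range(epochs):
        for z, e_next in zip(Z, E_next):
            h = np.tanh(nu.T @ z)                # sigma(nu_m^T z_k)
            e_hat = omega.T @ h                  # (36): predicted e_{k+1}
            e_m = e_hat - e_next                 # (37): model error e_mk
            # Gradients of E_mk = 0.5 * e_m^T e_m, eq. (38)
            grad_omega = np.outer(h, e_m)
            grad_nu = np.outer(z, (omega @ e_m) * (1.0 - h ** 2))
            omega -= alpha_m * grad_omega        # (39)
            nu -= alpha_m * grad_nu              # (40)
    return nu, omega                             # weights then held fixed
```

Once trained, the fixed weights $(\nu_m, \omega_m)$ supply the prediction $\hat{e}_{k+1}$ consumed by the critic and action networks in the iterative HDP loop.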
