Optimal Tracking Control for a Class of Nonlinear Discrete-Time ...
1858 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 22, NO. 12, DECEMBER 2011

Fig. 1. Structure diagram of the algorithm. (Blocks: critic network with output $J(e(k))$; action network with output $v(k)$; control $u(k) = v(k) + u_s(k)$; plant $x(k+1) = f(x(k-\sigma_0),\ldots,x(k-\sigma_m)) + g(x(k-\sigma_0),\ldots,x(k-\sigma_m))u(k)$; reference $\eta(k+1) = s(\eta(k))$; tracking error $e(k+1) = x(k+1) - \eta(k+1)$.)

IV. NEURAL NETWORK IMPLEMENTATION OF THE ITERATION HDP ALGORITHM

The nonlinear optimal control solution relies on solving the HJB equation, whose exact solution generally cannot be obtained for nonlinear time-delay systems. We therefore need parametric structures or neural networks to approximate both the control policy and the performance index function. In this section, we employ neural networks to implement the HDP iterations.

Let $l$ denote the number of hidden-layer neurons, $V$ the weight matrix between the input layer and the hidden layer, and $W$ the weight matrix between the hidden layer and the output layer. The output of the three-layer neural network is then

$$\hat{F}(X, V, W) = W^T \sigma\left(V^T X\right) \tag{64}$$

where $\sigma(V^T X) \in \mathbb{R}^l$ is the sigmoid function. The gradient descent rule is adopted for the weight updates of each neural network [26].

Two networks are used: a critic network and an action network. Both are chosen as three-layer BP networks. The whole structure diagram is shown in Fig. 1.

A. Critic Network

The critic network is used to approximate the performance index function $J^{[i+1]}(e(k))$. The output of the critic network is

$$\hat{J}^{[i+1]}(e(k)) = \left(w_c^{[i+1]}\right)^T \sigma\left(\left(v_c^{[i+1]}\right)^T e(k)\right). \tag{65}$$

The target function can be written as

$$J^{[i+1]}(e(k)) = e^T(k) Q e(k) + \left(\hat{v}^{[i]}(k)\right)^T R\, \hat{v}^{[i]}(k) + \hat{J}^{[i]}(e(k+1)). \tag{66}$$

We then define the error function for the critic network as

$$e_c^{[i+1]}(k) = \hat{J}^{[i+1]}(e(k)) - J^{[i+1]}(e(k)). \tag{67}$$

The objective function to be minimized in the critic network is

$$E_c^{[i+1]}(k) = \frac{1}{2}\left(e_c^{[i+1]}(k)\right)^2. \tag{68}$$

The gradient-based weight update rule for the critic network is therefore

$$w_c^{[i+2]}(k) = w_c^{[i+1]}(k) + \Delta w_c^{[i+1]}(k), \qquad v_c^{[i+2]}(k) = v_c^{[i+1]}(k) + \Delta v_c^{[i+1]}(k) \tag{69}$$

where

$$\Delta w_c^{[i+1]}(k) = -\alpha_c \frac{\partial E_c^{[i+1]}(k)}{\partial w_c^{[i+1]}(k)}, \qquad \Delta v_c^{[i+1]}(k) = -\alpha_c \frac{\partial E_c^{[i+1]}(k)}{\partial v_c^{[i+1]}(k)} \tag{70}$$

and the learning rate $\alpha_c$ of the critic network is a positive number.

B. Action Network

In the action network, the states $e(k-\sigma_0), \ldots, e(k-\sigma_m)$ are used as inputs, and the network output $\hat{v}^{[i]}(k)$ approximates the optimal control. The output can be formulated as

$$\hat{v}^{[i]}(k) = \left(w_a^{[i]}\right)^T \sigma\left(\left(v_a^{[i]}\right)^T Y(k)\right) \tag{71}$$

where $Y(k) = [e^T(k-\sigma_0), \ldots, e^T(k-\sigma_m)]^T$.

We define the output error of the action network as

$$e_a^{[i]}(k) = \hat{v}^{[i]}(k) - v^{[i]}(k). \tag{72}$$

The weights in the action network are updated to minimize the following performance error measure:

$$E_a^{[i]}(k) = \frac{1}{2}\left(e_a^{[i]}(k)\right)^2. \tag{73}$$

The weight update algorithm is similar to the one for the critic network. By the gradient descent rule, we obtain

$$w_a^{[i+1]}(k) = w_a^{[i]}(k) + \Delta w_a^{[i]}(k), \qquad v_a^{[i+1]}(k) = v_a^{[i]}(k) + \Delta v_a^{[i]}(k) \tag{74}$$

where

$$\Delta w_a^{[i]}(k) = -\alpha_a \frac{\partial E_a^{[i]}(k)}{\partial w_a^{[i]}(k)}, \qquad \Delta v_a^{[i]}(k) = -\alpha_a \frac{\partial E_a^{[i]}(k)}{\partial v_a^{[i]}(k)} \tag{75}$$

and the learning rate $\alpha_a$ of the action network is a positive number.

V. SIMULATION STUDY

In this section, two examples are provided to demonstrate the effectiveness of the proposed iteration HDP algorithm. The first concerns tracking a chaotic signal, and the second concerns the case in which the function $g(\cdot)$ is noninvertible.
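Before the examples, the gradient-descent updates of Section IV can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a scalar network output, a logistic sigmoid, and a single training sample. The names `forward`, `gradient_step`, `critic_target`, and all numerical values (layer sizes, learning rate, `Q`, `R`, the sample error and control) are hypothetical choices made for illustration. The gradients are obtained by applying the chain rule to the three-layer form in (64) and the squared-error objective in (68) and (73).

```python
import math
import random


def sigmoid(z):
    """Logistic sigmoid (assumed activation; the paper only says 'sigmoid')."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, V, W):
    """Scalar-output version of (64): F = W^T sigma(V^T x).

    V is an n-by-l nested list (input-to-hidden), W a list of length l.
    Returns the output and the hidden activations.
    """
    n, l = len(V), len(W)
    hidden = [sigmoid(sum(V[i][j] * x[i] for i in range(n))) for j in range(l)]
    return sum(W[j] * hidden[j] for j in range(l)), hidden


def gradient_step(x, target, V, W, alpha):
    """One gradient-descent step on E = 0.5 * (F(x) - target)^2.

    Mirrors the pattern of (69)-(70) and (74)-(75); returns the updated
    weights and the error evaluated before the update.
    """
    out, hidden = forward(x, V, W)
    err = out - target  # same sign convention as (67) and (72)
    n, l = len(V), len(W)
    # dE/dW_j = err * h_j
    new_W = [W[j] - alpha * err * hidden[j] for j in range(l)]
    # dE/dV_ij = err * W_j * h_j * (1 - h_j) * x_i
    new_V = [
        [V[i][j] - alpha * err * W[j] * hidden[j] * (1.0 - hidden[j]) * x[i]
         for j in range(l)]
        for i in range(n)
    ]
    return new_V, new_W, err


def quad(z, M):
    """Quadratic form z^T M z for a list z and a nested-list matrix M."""
    return sum(z[i] * M[i][j] * z[j]
               for i in range(len(z)) for j in range(len(z)))


def critic_target(e, v, Q, R, j_next):
    """Right-hand side of (66); j_next stands in for J_hat^[i](e(k+1))."""
    return quad(e, Q) + quad(v, R) + j_next


# Illustrative data only (not taken from the paper).
random.seed(0)
n, l = 2, 8
V = [[random.uniform(-0.5, 0.5) for _ in range(l)] for _ in range(n)]
W = [random.uniform(-0.5, 0.5) for _ in range(l)]
e = [0.5, -0.3]
target = critic_target(e, [0.1], [[1.0, 0.0], [0.0, 1.0]], [[1.0]], 0.0)

errors = []
for _ in range(200):
    V, W, err = gradient_step(e, target, V, W, alpha=0.1)
    errors.append(abs(err))
```

The same `gradient_step` could serve as an action-network update by passing the stacked input $Y(k)$ as `x` and a scalar $v^{[i]}(k)$ as `target`. A vector-valued control would need a matrix `W` and a per-component error, which this sketch omits.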