10.07.2015 Views

Optimal Tracking Control for a Class of Nonlinear Discrete-Time ...

Optimal Tracking Control for a Class of Nonlinear Discrete-Time ...

Optimal Tracking Control for a Class of Nonlinear Discrete-Time ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

1858 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 22, NO. 12, DECEMBER 2011Fig. 1.J(e(k))v(k)Critic networkAction networku(k) = v(k) + u s(k)x(k+1) = f(x(k−σ 0),...,x(k−σ m))+g(x(k−σ 0),...,x(k−σ m))u(k)x(k+1)v(k)e(k)e(k)u s(k)e(k+1) = x(k+1) − η(k+1)Structure diagram <strong>of</strong> the algorithm.η(k+1) = s(η(k))η(k+1)IV. NEURAL NETWORK IMPLEMENTATION OF THEITERATION HDP ALGORITHMThe nonlinear optimal control solution relies on solving theHJB equation, exact solution <strong>of</strong> which is generally impossibleto be obtained <strong>for</strong> nonlinear time delay system. So we need touse parametric structures or neural networks to approximateboth control policy and per<strong>for</strong>mance index function. There<strong>for</strong>e,in order to implement the HDP iterations, we employ neuralnetworks <strong>for</strong> approximations in this section.Assume the number <strong>of</strong> hidden layer neurons is denoted by l,the weight matrix between the input layer and hidden layer isdenoted by V , the weight matrix between the hidden layer andoutput layer is denoted by W, then the output <strong>of</strong> three-layerneural network is represented by( )ˆF(X, V, W) = W T σ V T X(64)where σ(V T X) ∈ R l is the sigmoid function. The gradientdescent rule is adopted <strong>for</strong> the weight update rules <strong>of</strong> eachneural network [26].Here, there are two networks, which are critic network andaction network, respectively. Both neural networks are chosenas three-layer BP network. The whole structure diagram isshowninFig.1.A. Critic NetworkThe critic network is used to approximate the per<strong>for</strong>manceindex function J [i+1] (e(k)). The output <strong>of</strong> the critic networkis denoted as follows:( ) T ()Ĵ [i+1] (e(k)) = σ ) T e(k) . (65)w [i+1]c(v [i+1]cThe target function can be written as follows:J [i+1] (e(k)) = e T (k)Qe(k) + ( ˆv [i] (k) ) T R ˆv [i] (k)+Ĵ [i] (e(k + 1)). (66)Then we define the error function <strong>for</strong> the critic network asfollows:e c [i+1] (k) = Ĵ [i+1] (e(k)) − J [i+1] (e(k)). (67)The objective function to be minimized in the critic networkisE c [i+1] (k) = 1 (2e c (k)) [i+1] . (68)2So the gradient-based weights update rule <strong>for</strong> the criticnetwork is given bywherew c[i+2]v [i+2](k) = w c[i+1]c (k) = v c[i+1]w [i+1]cv [i+1]c(k) + w c[i+1] (k),(k) + v [i+1]c (k) (69)∂ E c(k) =−α [i+1] (k)c ,∂w c [i+1] (k)∂ E c(k) =−α [i+1]c∂v [i+1]c(k)(k)(70)and the learning rate α c <strong>of</strong> critic network is positive number.B. Action NetworkIn the action network, the states e(k − σ 0 ),...,e(k − σ m )are used as inputs to create the optimal control, ˆv [i] (k) asthe output <strong>of</strong> the network. The output can be <strong>for</strong>mulated asfollows:( ) T ( )ˆv [i] (k) = w a[i] σ (v a [i] )T Y (k) (71)where Y (k) =[e T (k − σ 0 ),...,e T (k − σ m )] T .We define the output error <strong>of</strong> the action network as follows:e a [i] (k) =ˆv[i] (k) − v [i] (k). (72)The weights in the action network are updated to minimizethe following per<strong>for</strong>mance error measure:E a [i] (k) = 1 ( ) 2e a [i]2(k) . (73)The weights updating algorithm is similar to the one <strong>for</strong> thecritic network. By the gradient descent rule, we can obtainw a[i+1] (k) = w a [i] (k) + w a [i] (k),v a [i+1] (k) = v a [i] (k) + v a [i] (k) (74)wherew [i]a (k) =−α av [i]a (k) =−α a∂ E a [i] (k)∂w a [i] (k) ,∂ E [i]a (k)∂v [i]a (k)(75)and the learning rate α a <strong>of</strong> action network is positive number.V. SIMULATION STUDYIn this section, two examples are provided to demonstratethe effectiveness <strong>of</strong> the proposed iteration HDP algorithm inthis paper. One is about tracking chaotic signal, and the otheris about the situation that the function g(·) is noninvertible.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!