Optimal Tracking Control for a Class of Nonlinear Discrete-Time Systems with Time Delays Based on HDP
ZHANG et al.: OPTIMAL TRACKING CONTROL FOR A CLASS OF NONLINEAR DISCRETE-TIME SYSTEMS WITH TIME DELAYS BASED ON HDP

pseudoinverse of g can be obtained by the MATLAB function G = pinv(g).

2) Least Square Method [43]: As g(·) g^{-1}(·) = I, we have

    g^{-1}(·) = (g^T(·) g(·))^{-1} g^T(·).    (7)

Introducing E(0, r^2) ∈ R^{n×m} into g(·), ḡ(·) can be expressed as follows:

    ḡ(·) = g(·) + E(0, r^2)    (8)

where every element of E(0, r^2) is zero-mean Gaussian noise. So we can get

    g^{-1}(·) = (ḡ^T(·) ḡ(·))^{-1} g^T(·).    (9)

Then we sample K times, taking nm Gaussian points each time, so we have E_i(0, r^2), i ∈ {1, 2, ..., K}, where every element of E_i is a Gaussian sample point. Thus, we have G(·) = [ḡ_1, ..., ḡ_K], where ḡ_i = g + E_i(0, r^2), i ∈ {1, 2, ..., K}. So we can get

    g^{-1}(·) = (G^T(·) G(·))^{-1} g^T(·).    (10)

3) Neural Network Method [44]: First, let the back-propagation (BP) neural network be expressed as F̂(X, V, W) = W^T σ(V^T X), where W and V are the weights of the neural network and σ is the activation function. Second, we use the output x̂(k+1) of the BP neural network to approximate x(k+1). Then we have x̂(k+1) = W^T σ(V^T X), where X = [x(k−σ_0), ..., x(k−σ_m), u(k)]. We know that (1) is an affine nonlinear system, so we can get g = ∂x̂(k+1)/∂u, and the relation g^{-1} = ∂u/∂x̂(k+1) can be established.

So we can see that u_s exists and can be obtained by (6).
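A minimal numpy sketch can illustrate how the first two methods relate. It checks that the Moore-Penrose pseudoinverse (MATLAB's pinv) coincides with the least-squares form (7) when g has full column rank, and forms a noise-averaged Gram matrix as in (8)-(10); the matrix g, the noise level r, and the vertical-stacking reading of G(·) in (10) are illustrative assumptions, not the paper's simulation setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# An assumed full-column-rank "control matrix" g (n x m, n >= m), standing in for g(.)
n, m = 4, 2
g = rng.standard_normal((n, m))

# Moore-Penrose pseudoinverse, as computed by MATLAB's pinv(g)
g_pinv = np.linalg.pinv(g)

# Least-squares form (7): g^{-1} = (g^T g)^{-1} g^T
g_ls = np.linalg.solve(g.T @ g, g.T)

# For full column rank the two coincide
assert np.allclose(g_pinv, g_ls)

# Noise-perturbed variant (8)-(10), under one reading of (10): stack K
# Gaussian-perturbed copies g_bar_i = g + E_i(0, r^2) vertically, so that
# (1/K) G^T G approximates g^T g plus a small Tikhonov-like term n*r^2*I.
K, r = 50, 1e-3
G = np.vstack([g + rng.normal(0.0, r, size=(n, m)) for _ in range(K)])
g_noisy_inv = np.linalg.solve(G.T @ G / K, g.T)
```

For small r the noise-averaged inverse stays close to (7), which suggests the Gaussian perturbation acts mainly as a regularizer when g^T g is ill-conditioned.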
In this paper, we adopt the Moore-Penrose pseudoinverse technique to get g^{-1}(·) in the simulation part.

According to (2) and (3), we can easily obtain the following system:

    e(k+1) = f(e(k−σ_0) + η(k−σ_0), ..., e(k−σ_m) + η(k−σ_m))
             + g(e(k−σ_0) + η(k−σ_0), ..., e(k−σ_m) + η(k−σ_m))
             × (g^{-1}(η(k−σ_0), ..., η(k−σ_m))
                × (S(η(k)) − f(η(k−σ_0), ..., η(k−σ_m))) + v(k)) − S(η(k)),
    e(k) = ε(k),  −σ_m ≤ k ≤ 0    (11)

where ε(k) = ε_1(k) − ε_2(k).

So the aim of this paper becomes obtaining an optimal control policy that not only makes (11) asymptotically stable but also minimizes the performance index function (4). To solve the optimal tracking problem in this paper, the following definition and assumption are required.

Definition 3 (Asymptotic Stability) [1]: An equilibrium state e = 0 of (11) is asymptotically stable if:
1) it is stable, i.e., given any positive numbers k_0 and ɛ, there exists δ > 0 such that every solution of (11) satisfies max_{k_0 ≤ k ≤ k_0+σ_m} |e(k)| ≤ δ and max_{k_0 ≤ k ≤ ∞} |e(k)| ≤ ɛ;
2) for each k_0 > 0 there is a δ > 0 such that every solution of (11) satisfies max_{k_0 ≤ k ≤ k_0+σ_m} |e(k)| ≤ δ and lim_{k→∞} e(k) = 0.

Assumption 1: Given (11), for the infinite-time horizon problem, there exists a control policy v(k) which satisfies:
1) v(k) is continuous on its domain, and if e(k−σ_0) = e(k−σ_1) = ··· = e(k−σ_m) = 0, then v(k) = 0;
2) v(k) stabilizes (11);
3) for all e(−σ_0), e(−σ_1), ..., e(−σ_m) ∈ R^n, J(e(0), v(0)) is finite.

Actually, for nonlinear systems without time delays, if v(k) satisfies Assumption 1, then we can say that v(k) is an admissible control. The definition of admissible control can be found in [36] and [45].

In the following section, we focus on the design of v(k).
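The inner expression of (11), g^{-1}(η)(S(η(k)) − f(η)), is the steady control u_s that makes the reference η(k) an exact trajectory of the plant, with v(k) acting as the stabilizing correction. A minimal simulation sketch, assuming a scalar system with delays σ_0 = 0 and σ_1 = 1 (the functions f, g, S and all constants below are illustrative, not the paper's example):

```python
import numpy as np

# Illustrative scalar plant with delays sigma_0 = 0, sigma_1 = 1:
#   x(k+1) = f(x(k), x(k-1)) + g(x(k), x(k-1)) u(k)
def f(x0, x1):
    return 0.2 * np.sin(x0) + 0.1 * x1        # delayed drift term

def g(x0, x1):
    return 1.0 + 0.1 * np.cos(x1)             # scalar gain, so g^{-1} = 1/g

def S(eta):
    return 0.9 * eta                          # reference dynamics eta(k+1) = S(eta(k))

def u_s(eta0, eta1):
    # Steady control: g^{-1}(eta) (S(eta(k)) - f(eta)), the inner expression of (11)
    return (S(eta0) - f(eta0, eta1)) / g(eta0, eta1)

N = 60
x = np.zeros(N + 2)       # index i holds time k = i - 1, so x[0] = x(-1)
eta = np.zeros(N + 2)
x[0] = x[1] = 0.8         # initial state history epsilon_1
eta[0] = eta[1] = 0.3     # initial reference history epsilon_2

for i in range(1, N + 1):
    eta[i + 1] = S(eta[i])
    u = u_s(eta[i], eta[i - 1])               # v(k) = 0: feedforward only
    x[i + 1] = f(x[i], x[i - 1]) + g(x[i], x[i - 1]) * u

e = x - eta               # tracking error e(k) = x(k) - eta(k)
```

With v(k) = 0 the error evolves purely through the mismatch terms in (11); for this mildly contractive f the error happens to decay on its own, but in general it is the feedback v(k), designed in the following section, that must stabilize (11).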
A. Derivation of the Iterative HDP Algorithm

In (11), for time step k, J*(e(k)) denotes the optimal performance index function, i.e.,

    J*(e(k)) = inf_{v(k)} J(e(k), v(k)),  v*(k) = arg inf_{v(k)} J(e(k), v(k)).

Then u*(k) = v*(k) + u_s(k) is the optimal control for (1). Let e*(k) = x*(k) − η(k), where x*(k) denotes the state under the action of the optimal tracking control policy u*(k). According to Bellman's principle of optimality [1], J*(e(k)) should satisfy the following HJB equation:

    J*(e(k)) = inf_{v(k)} { e^T(k) Q e(k) + v^T(k) R v(k) + J*(e(k+1)) }    (12)

and the optimal controller v*(k) should satisfy

    v*(k) = arg inf_{v(k)} { e^T(k) Q e(k) + v^T(k) R v(k) + J*(e(k+1)) }.    (13)

Here we define e*(k) = x*(k) − η(k) and

    x*(k+1) = f(x*(k−σ_0), ..., x*(k−σ_m)) + g(x*(k−σ_0), ..., x*(k−σ_m)) u*(k),  k = 0, 1, 2, ...,
    x*(k) = ε_1(k),  k = −σ_m, −σ_{m−1}, ..., −σ_0.    (14)

Then the HJB equation is written as follows:

    J*(e*(k)) = (e*(k))^T Q e*(k) + (v*(k))^T R v*(k) + J*(e*(k+1)).    (15)

Remark 2: Of course, one can reduce (1) to a system without time delay by defining a new (σ_m + 1)n-dimensional state vector y(k) = (x(k), x(k−1), ..., x(k−σ_m)). However, Chyung has pointed out two major disadvantages of this method in [46]. First, the resulting system is (σ_m + 1)n-dimensional, increasing the dimension of the system (σ_m + 1)-fold. Second, the new system may not be controllable even if the original system is controllable. This causes the set of attainability to have an empty interior, which, in turn, introduces additional difficulties [46]–[48].
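The HJB recursion (12)-(13) is what the iterative HDP algorithm approximates. A toy tabular sketch, ignoring the delays and replacing the error dynamics with an assumed linear scalar model e(k+1) = a·e(k) + v(k) (the value a, the grids, and the iteration count are all illustrative; the paper's HDP uses neural networks instead of tables), shows the value iteration J_{i+1}(e) = min_v { e^2 Q + v^2 R + J_i(e(k+1)) } converging from J_0 = 0 to the known LQR value function:

```python
import numpy as np

Q, R = 1.0, 1.0
a = 0.8                                   # assumed scalar error dynamics e+ = a*e + v
e_grid = np.linspace(-2.0, 2.0, 201)
v_grid = np.linspace(-2.0, 2.0, 201)
J = np.zeros_like(e_grid)                 # J_0 = 0, the usual HDP initialization

def interp_J(J, e_next):
    """Linear interpolation of the current value table, clipped to the grid."""
    return np.interp(np.clip(e_next, -2.0, 2.0), e_grid, J)

for _ in range(200):
    # One Bellman backup of (12): minimize over the control grid for every e
    e_next = a * e_grid[:, None] + v_grid[None, :]
    cost = Q * e_grid[:, None] ** 2 + R * v_grid[None, :] ** 2 + interp_J(J, e_next)
    J = cost.min(axis=1)

# In this linear-quadratic special case the limit is J(e) = P e^2, where P
# solves the scalar Riccati fixed point P = Q + a^2 R P / (R + P).
P = (0.64 + (0.64 ** 2 + 4) ** 0.5) / 2
```

The greedy minimizer at each backup is the discretized counterpart of (13); for interior grid points J converges to P·e^2 up to discretization error, mirroring how the HDP iteration is expected to approach the HJB solution (15).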
