Lecture Notes - Department of Mathematics and Statistics - Queen's ...
CHAPTER 2. CONTROLLED MARKOV CHAINS
2.1.2 Classes of Control Policies
Admissible Control Policies
Let $H_0 := X$ and $H_t = H_{t-1} \times K$ for $t = 1, 2, \ldots$. We let $h_t$ denote an element of $H_t$, where $h_t = \{x_{[0,t]}, u_{[0,t-1]}\}$.

A deterministic admissible control policy $\Pi$ is a sequence of functions $\{\gamma_t\}$ from $H_t$ to $U$; in this case $u_t = \gamma_t(h_t)$. A randomized control policy is a sequence $\Pi = \{\Pi_t, t \geq 0\}$ of maps $\Pi_t : H_t \to P(U)$ (with $P(U)$ being the space of probability measures on $U$) such that
$$\Pi_t(u_t \in U(x_t) \mid h_t) = 1, \qquad \forall h_t \in H_t.$$
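The admissibility condition above says only that, whatever the history, the policy must place all of its probability mass on the actions allowed in the current state. A minimal sketch (the two-state model and the constraint sets are hypothetical, chosen only for illustration):

```python
import random

# Hypothetical finite model: U(x) lists the actions admissible in state x.
U_of = {0: [0], 1: [0, 1]}

def Pi_t(h_t, rng=random):
    """Randomized admissible policy. The history h_t = (x_0, u_0, ..., x_t)
    may influence the choice, but the sample is always drawn from a
    distribution supported on U(x_t), so Pi_t(u_t in U(x_t) | h_t) = 1."""
    x_t = h_t[-1]
    return rng.choice(U_of[x_t])

# Example history: x_0 = 1, u_0 = 0, x_1 = 0.
h = (1, 0, 0)
u = Pi_t(h)
assert u in U_of[h[-1]]  # admissibility holds by construction
```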
Markov Control Policies
A policy is randomized Markov if
$$P^{\Pi}_{x_0}(u_t \in C \mid h_t) = \Pi_t(u_t \in C \mid x_t), \qquad C \in \mathcal{B}(U).$$
Hence, the control action depends only on the state and the time, and not on the past history. If the control policy is deterministic, that is, if
$$\Pi_t(u_t = f_t(x_t) \mid x_t) = 1$$
for some function $f_t$, the control policy is said to be deterministic Markov.
Stationary Control Policies
A policy is randomized stationary if
$$P^{\Pi}_{x_0}(u_t \in C \mid h_t) = \Pi(u_t \in C \mid x_t), \qquad C \in \mathcal{B}(U).$$
Hence, the control action depends only on the state, and not on the past history or on time. If the control policy is deterministic, that is, if $\Pi(u_t = f(x_t) \mid x_t) = 1$ for some function $f$, the control policy is said to be deterministic stationary.
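The three classes differ only in what the control may condition on: the full history, the current state and time, or the current state alone. A minimal sketch of this distinction (the state/action sets and the particular rules are hypothetical, purely for illustration):

```python
import random

X = [0, 1]  # hypothetical state space
U = [0, 1]  # hypothetical action space

# Admissible (history-dependent) deterministic policy: gamma_t may use the
# whole history h_t = (x_0, u_0, x_1, u_1, ..., x_t).
def gamma_t(history):
    states = history[0::2]  # x_0, x_1, ..., x_t sit at even positions
    return 1 if states.count(1) > len(states) / 2 else 0

# Deterministic Markov policy: a function f_t of the current state and time.
def f_t(t, x):
    return (x + t) % 2

# Deterministic stationary policy: a single function f of the state only.
def f(x):
    return 1 - x

# Randomized stationary policy: one distribution Pi(. | x) on U per state.
def sample_stationary(x, rng=random):
    weights = [0.3, 0.7] if x == 0 else [0.8, 0.2]
    return rng.choices(U, weights=weights)[0]
```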
2.2 Markov Chain Induced by a Markov Policy
The following is an important result:
Theorem 2.2.1 Let the control policy be randomized Markov. Then, the controlled Markov chain becomes a Markov chain in $X$, that is, the state process itself becomes a Markov chain:
$$P^{\Pi}_{x_0}(x_{t+1} \in B \mid x_t, x_{t-1}, \ldots, x_0) = Q^{\Pi_t}(x_{t+1} \in B \mid x_t), \qquad B \in \mathcal{B}(X),\ t \geq 1,\ P\text{-a.s.}$$
Proof: Let us consider the case where $U$ is countable. Let $B \in \mathcal{B}(X)$. It follows that
$$\begin{aligned}
P^{\Pi}_{x_0}(x_{t+1} \in B \mid x_t, x_{t-1}, \ldots, x_0)
&= \sum_{u_t} P^{\Pi}_{x_0}(x_{t+1} \in B, u_t \mid x_t, x_{t-1}, \ldots, x_0) \\
&= \sum_{u_t} P^{\Pi}_{x_0}(x_{t+1} \in B \mid u_t, x_t, x_{t-1}, \ldots, x_0)\, P^{\Pi}_{x_0}(u_t \mid x_t, x_{t-1}, \ldots, x_0) \\
&= \sum_{u_t} Q(x_{t+1} \in B \mid u_t, x_t)\, \pi_t(u_t \mid x_t) \\
&= \sum_{u_t} P^{\Pi}_{x_0}(x_{t+1} \in B, u_t \mid x_t) \\
&= Q^{\Pi_t}(x_{t+1} \in B \mid x_t). \qquad (2.3)
\end{aligned}$$
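The identity in (2.3) can be checked numerically for a finite model: given the transition kernel $Q(x' \mid x, u)$ and a randomized Markov policy $\pi_t(u \mid x)$, the induced state process has kernel $Q^{\Pi_t}(x' \mid x) = \sum_{u} Q(x' \mid x, u)\,\pi_t(u \mid x)$. A sketch with a hypothetical two-state, two-action model (the numbers are illustrative, not from the notes):

```python
import numpy as np

# Hypothetical model: Q[u] is the transition matrix under action u,
# i.e. Q[u][x, y] = Q(x_{t+1} = y | x_t = x, u_t = u).
Q = np.array([
    [[0.9, 0.1],
     [0.4, 0.6]],   # action 0
    [[0.2, 0.8],
     [0.7, 0.3]],   # action 1
])

# Randomized Markov policy at a fixed time t: pi[x, u] = pi_t(u | x).
pi = np.array([
    [0.5, 0.5],
    [0.1, 0.9],
])

# Induced kernel of the state process:
# Q_pi[x, y] = sum_u Q(y | x, u) * pi_t(u | x), as in (2.3).
Q_pi = np.einsum('xu,uxy->xy', pi, Q)

print(Q_pi)
# Each row is a probability distribution over the next state.
print(Q_pi.sum(axis=1))
```

Averaging out the action in this way is exactly the marginalization step in the proof; the resulting $Q^{\Pi_t}$ is an ordinary (time-inhomogeneous, unless the policy is stationary) Markov transition kernel.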