CHAPTER 1. REVIEW OF PROBABILITY

For otherwise there would be a point x which is not in any equivalence class; but x is equivalent to itself at least. Furthermore, since A contains exactly one point from each equivalence class, the sets A ⊕ q are disjoint: otherwise two of the sets, say A ⊕ q and A ⊕ q′ with q ≠ q′, would share a common point x, so that both x − q and x − q′ would lie in A. This is a contradiction, since x − q and x − q′ differ by the rational q′ − q and hence belong to the same equivalence class, while A contains at most one point from each class. We expect the uniform distribution to be shift-invariant, therefore P(A) = P(A ⊕ q). But [0, 1] = ∪_q (A ⊕ q), where the union runs over countably many rationals q. Since a countable sum of identical non-negative terms is either ∞ or 0, while the total measure must be 1, the contradiction follows: we cannot associate a probability with this set.
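The contradiction in the argument above can be compressed into a single chain of equalities, using only shift-invariance and countable additivity:

```latex
1 \;=\; P([0,1]) \;=\; P\Big(\bigcup_{q} (A \oplus q)\Big)
  \;=\; \sum_{q} P(A \oplus q)
  \;=\; \sum_{q} P(A) \;\in\; \{0, \infty\},
```

since a countable sum of the constant P(A) is 0 if P(A) = 0 and ∞ otherwise; neither value equals 1.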
Chapter 2

Controlled Markov Chains

In the following, we discuss controlled Markov models.

2.1 Controlled Markov Models

Consider the following model:

    x_{t+1} = f(x_t, u_t, w_t),    (2.1)

where x_t is an X-valued state variable, u_t is a U-valued control action variable, w_t is a W-valued i.i.d. noise process, and f is a measurable function. We assume that X, U, W are subsets of Polish spaces.

The model in (2.1) contains (see [8]) the class of all stochastic processes which satisfy the following for all Borel sets B ∈ B(X), all t ≥ 0, and all realizations x_{[0,t]}, u_{[0,t]}:

    P(x_{t+1} ∈ B | x_{[0,t]} = a_{[0,t]}, u_{[0,t]} = b_{[0,t]}) = T(x_{t+1} ∈ B | a_t, b_t),    (2.2)

where T(·|x, u) is a stochastic kernel from X × U to X. A stochastic process which satisfies (2.2) is called a controlled Markov chain.

2.1.1 Fully Observed Markov Control Problem Model

A Fully Observed Markov Control Problem is a five-tuple

    (X, U, {U(x), x ∈ X}, T, c),

where
• X is the state space, a subset of a Polish space.
• U is the action space, a subset of a Polish space.
• K = {(x, u) : u ∈ U(x), x ∈ X}, with U(x) ∈ B(U), is the set of feasible state–control pairs; different control actions may be available at different states.
• T is a state transition kernel, that is, T(A|x_t, u_t) = P(x_{t+1} ∈ A | x_t, u_t).
• c : K → R is the cost function.

Consider for now that the objective to be minimized is given by

    J(x_0, Π) := E^Π_{ν_0} [ ∑_{t=0}^{T−1} c(x_t, u_t) ],

where ν_0 is the initial probability measure, that is, x_0 ∼ ν_0. The goal is to find a policy Π* (to be defined next) such that

    J(x_0, Π*) ≤ J(x_0, Π)   for all admissible policies Π,

where admissibility is defined below. Such a Π* is an optimal policy. Here Π can also be called the strategy, or the control law.
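As an illustrative sketch (not from the text), the model (2.1) and the finite-horizon cost J(x_0, Π) can be simulated by Monte Carlo. All specific choices below are hypothetical: a scalar linear map for f, uniform noise for w_t, a stationary linear policy for Π, and a quadratic stage cost for c.

```python
import random

# Hypothetical ingredients of a fully observed Markov control problem.
# None of these specific choices appear in the text; they only
# instantiate the abstract objects (f, w_t, a policy, c):
#   state update  f(x, u, w) = 0.9*x + u + w      -- model (2.1)
#   noise         w_t ~ Uniform(-0.1, 0.1), i.i.d.
#   policy        Pi(x) = -0.5*x                  -- stationary, state-feedback
#   stage cost    c(x, u) = x**2 + u**2

def f(x, u, w):
    return 0.9 * x + u + w

def policy(x):
    return -0.5 * x

def cost(x, u):
    return x ** 2 + u ** 2

def estimate_J(x0, horizon=20, runs=2000, seed=0):
    """Monte Carlo estimate of J(x0, Pi) = E[ sum_{t=0}^{T-1} c(x_t, u_t) ]
    for a fixed initial state x0 (i.e. nu_0 is a point mass at x0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        x = x0
        for _ in range(horizon):
            u = policy(x)          # action chosen by the policy Pi
            total += cost(x, u)    # accumulate the stage cost c(x_t, u_t)
            x = f(x, u, rng.uniform(-0.1, 0.1))  # transition via (2.1)
    return total / runs

print(estimate_J(1.0))
```

Comparing `estimate_J` across candidate policies is exactly the (brute-force) comparison J(x_0, Π*) ≤ J(x_0, Π) that the optimal-policy definition asks for, restricted here to a single hypothetical policy class.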