Lecture Notes - Department of Mathematics and Statistics - Queen's ...
CHAPTER 2. CONTROLLED MARKOV CHAINS
2.1.2 Classes of Control Policies
Admissible Control Policies
Let $H_0 := X$ and $H_t = H_{t-1} \times K$ for $t = 1, 2, \ldots$. We let $h_t$ denote an element of $H_t$, where $h_t = \{x_{[0,t]}, u_{[0,t-1]}\}$.

A deterministic admissible control policy $\Pi$ is a sequence of functions $\{\gamma_t\}$ from $H_t$ to $U$; in this case $u_t = \gamma_t(h_t)$. A randomized control policy is a sequence $\Pi = \{\Pi_t, t \geq 0\}$ of maps $\Pi_t : H_t \to P(U)$ (with $P(U)$ being the space of probability measures on $U$) such that
$$\Pi_t(u_t \in U(x_t) \mid h_t) = 1, \qquad \forall h_t \in H_t.$$
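The admissibility condition above says only that, whatever the history, the policy must place all of its probability mass on the actions allowed in the current state. A minimal sketch (the two-state model and the constraint sets are hypothetical, chosen only for illustration):

```python
import random

# Hypothetical finite model: U(x) lists the actions admissible in state x.
U_of = {0: [0], 1: [0, 1]}

def Pi_t(h_t, rng=random):
    """Randomized admissible policy. The history h_t = (x_0, u_0, ..., x_t)
    may influence the choice, but the sample is always drawn from a
    distribution supported on U(x_t), so Pi_t(u_t in U(x_t) | h_t) = 1."""
    x_t = h_t[-1]
    return rng.choice(U_of[x_t])

# Example history: x_0 = 1, u_0 = 0, x_1 = 0.
h = (1, 0, 0)
u = Pi_t(h)
assert u in U_of[h[-1]]  # admissibility holds by construction
```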
Markov Control Policies
A policy is randomized Markov if
$$P^{\Pi}_{x_0}(u_t \in C \mid h_t) = \Pi_t(u_t \in C \mid x_t), \qquad C \in \mathcal{B}(U).$$
Hence, the control action depends only on the state and the time, and not on the past history. If the control policy is deterministic, that is, if
$$\Pi_t(u_t = f_t(x_t) \mid x_t) = 1$$
for some function $f_t$, the control policy is said to be deterministic Markov.
Stationary Control Policies
A policy is randomized stationary if
$$P^{\Pi}_{x_0}(u_t \in C \mid h_t) = \Pi(u_t \in C \mid x_t), \qquad C \in \mathcal{B}(U).$$
Hence, the control action depends only on the state, and not on the past history or on time. If the control policy is deterministic, that is, if $\Pi(u_t = f(x_t) \mid x_t) = 1$ for some function $f$, the control policy is said to be deterministic stationary.
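The three classes differ only in what the control may condition on: the full history, the current state and time, or the current state alone. A minimal sketch of this distinction (the state/action sets and the particular rules are hypothetical, purely for illustration):

```python
import random

X = [0, 1]  # hypothetical state space
U = [0, 1]  # hypothetical action space

# Admissible (history-dependent) deterministic policy: gamma_t may use the
# whole history h_t = (x_0, u_0, x_1, u_1, ..., x_t).
def gamma_t(history):
    states = history[0::2]  # x_0, x_1, ..., x_t sit at even positions
    return 1 if states.count(1) > len(states) / 2 else 0

# Deterministic Markov policy: a function f_t of the current state and time.
def f_t(t, x):
    return (x + t) % 2

# Deterministic stationary policy: a single function f of the state only.
def f(x):
    return 1 - x

# Randomized stationary policy: one distribution Pi(. | x) on U per state.
def sample_stationary(x, rng=random):
    weights = [0.3, 0.7] if x == 0 else [0.8, 0.2]
    return rng.choices(U, weights=weights)[0]
```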
2.2 Markov Chain Induced by a Markov Policy
The following is an important result:
Theorem 2.2.1 Let the control policy be randomized Markov. Then, the controlled Markov chain becomes a Markov chain in $X$, that is, the state process itself becomes a Markov chain:
$$P^{\Pi}_{x_0}(x_{t+1} \in B \mid x_t, x_{t-1}, \ldots, x_0) = Q^{\Pi_t}(x_{t+1} \in B \mid x_t), \qquad B \in \mathcal{B}(X),\ t \geq 1,\ P\text{-a.s.}$$
Proof: Let us consider the case where $U$ is countable. Let $B \in \mathcal{B}(X)$. It follows that
$$\begin{aligned}
P^{\Pi}_{x_0}(x_{t+1} \in B \mid x_t, x_{t-1}, \ldots, x_0)
&= \sum_{u_t} P^{\Pi}_{x_0}(x_{t+1} \in B, u_t \mid x_t, x_{t-1}, \ldots, x_0) \\
&= \sum_{u_t} P^{\Pi}_{x_0}(x_{t+1} \in B \mid u_t, x_t, x_{t-1}, \ldots, x_0)\, P^{\Pi}_{x_0}(u_t \mid x_t, x_{t-1}, \ldots, x_0) \\
&= \sum_{u_t} Q(x_{t+1} \in B \mid u_t, x_t)\, \pi_t(u_t \mid x_t) \\
&= \sum_{u_t} P^{\Pi}_{x_0}(x_{t+1} \in B, u_t \mid x_t) \\
&= Q^{\Pi_t}(x_{t+1} \in B \mid x_t). \qquad (2.3)
\end{aligned}$$
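The identity in (2.3) can be checked numerically for a finite model: given the transition kernel $Q(x' \mid x, u)$ and a randomized Markov policy $\pi_t(u \mid x)$, the induced state process has kernel $Q^{\Pi_t}(x' \mid x) = \sum_{u} Q(x' \mid x, u)\,\pi_t(u \mid x)$. A sketch with a hypothetical two-state, two-action model (the numbers are illustrative, not from the notes):

```python
import numpy as np

# Hypothetical model: Q[u] is the transition matrix under action u,
# i.e. Q[u][x, y] = Q(x_{t+1} = y | x_t = x, u_t = u).
Q = np.array([
    [[0.9, 0.1],
     [0.4, 0.6]],   # action 0
    [[0.2, 0.8],
     [0.7, 0.3]],   # action 1
])

# Randomized Markov policy at a fixed time t: pi[x, u] = pi_t(u | x).
pi = np.array([
    [0.5, 0.5],
    [0.1, 0.9],
])

# Induced kernel of the state process:
# Q_pi[x, y] = sum_u Q(y | x, u) * pi_t(u | x), as in (2.3).
Q_pi = np.einsum('xu,uxy->xy', pi, Q)

print(Q_pi)
# Each row is a probability distribution over the next state.
print(Q_pi.sum(axis=1))
```

Averaging out the action in this way is exactly the marginalization step in the proof; the resulting $Q^{\Pi_t}$ is an ordinary (time-inhomogeneous, unless the policy is stationary) Markov transition kernel.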