CHAPTER 1. REVIEW OF PROBABILITY

For otherwise there would be a point x which is not in any equivalence class; but x is equivalent to itself at least. Furthermore, since A contains exactly one point from each equivalence class, the sets A ⊕ q are disjoint: otherwise two of the sets, say A ⊕ q and A ⊕ q′ with q ≠ q′, would share a common point x, so that both x − q and x − q′ would lie in A. This is a contradiction, since x − q and x − q′ differ by the rational q′ − q and hence belong to the same equivalence class, while A contains at most one point from each class. We expect the uniform distribution to be shift-invariant, therefore P(A) = P(A ⊕ q). But [0, 1] = ∪_q (A ⊕ q), where the union runs over countably many rationals q. Since a countable sum of identical non-negative terms is either ∞ or 0, while the total measure must be 1, the contradiction follows: we cannot associate a probability with this set.
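The contradiction in the argument above can be compressed into a single chain of equalities, using only shift-invariance and countable additivity:

```latex
1 \;=\; P([0,1]) \;=\; P\Big(\bigcup_{q} (A \oplus q)\Big)
  \;=\; \sum_{q} P(A \oplus q)
  \;=\; \sum_{q} P(A) \;\in\; \{0, \infty\},
```

since a countable sum of the constant P(A) is 0 if P(A) = 0 and ∞ otherwise; neither value equals 1.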
Chapter 2

Controlled Markov Chains

In the following, we discuss controlled Markov models.

2.1 Controlled Markov Models

Consider the following model:

    x_{t+1} = f(x_t, u_t, w_t),    (2.1)

where x_t is an X-valued state variable, u_t is a U-valued control action variable, w_t is a W-valued i.i.d. noise process, and f is a measurable function. We assume that X, U, W are subsets of Polish spaces.

The model in (2.1) contains (see [8]) the class of all stochastic processes which satisfy the following for all Borel sets B ∈ B(X), all t ≥ 0, and all realizations x_{[0,t]}, u_{[0,t]}:

    P(x_{t+1} ∈ B | x_{[0,t]} = a_{[0,t]}, u_{[0,t]} = b_{[0,t]}) = T(x_{t+1} ∈ B | a_t, b_t),    (2.2)

where T(·|x, u) is a stochastic kernel from X × U to X. A stochastic process which satisfies (2.2) is called a controlled Markov chain.

2.1.1 Fully Observed Markov Control Problem Model

A Fully Observed Markov Control Problem is a five-tuple

    (X, U, {U(x), x ∈ X}, T, c),

where
• X is the state space, a subset of a Polish space.
• U is the action space, a subset of a Polish space.
• K = {(x, u) : u ∈ U(x), x ∈ X}, with U(x) ∈ B(U), is the set of feasible state–control pairs; different control actions may be available at different states.
• T is a state transition kernel, that is, T(A|x_t, u_t) = P(x_{t+1} ∈ A | x_t, u_t).
• c : K → R is the cost function.

Consider for now that the objective to be minimized is given by

    J(x_0, Π) := E^Π_{ν_0} [ ∑_{t=0}^{T−1} c(x_t, u_t) ],

where ν_0 is the initial probability measure, that is, x_0 ∼ ν_0. The goal is to find a policy Π* (to be defined next) such that

    J(x_0, Π*) ≤ J(x_0, Π)   for all admissible policies Π,

where admissibility is defined below. Such a Π* is an optimal policy. Here Π can also be called the strategy, or the control law.
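As an illustrative sketch (not from the text), the model (2.1) and the finite-horizon cost J(x_0, Π) can be simulated by Monte Carlo. All specific choices below are hypothetical: a scalar linear map for f, uniform noise for w_t, a stationary linear policy for Π, and a quadratic stage cost for c.

```python
import random

# Hypothetical ingredients of a fully observed Markov control problem.
# None of these specific choices appear in the text; they only
# instantiate the abstract objects (f, w_t, a policy, c):
#   state update  f(x, u, w) = 0.9*x + u + w      -- model (2.1)
#   noise         w_t ~ Uniform(-0.1, 0.1), i.i.d.
#   policy        Pi(x) = -0.5*x                  -- stationary, state-feedback
#   stage cost    c(x, u) = x**2 + u**2

def f(x, u, w):
    return 0.9 * x + u + w

def policy(x):
    return -0.5 * x

def cost(x, u):
    return x ** 2 + u ** 2

def estimate_J(x0, horizon=20, runs=2000, seed=0):
    """Monte Carlo estimate of J(x0, Pi) = E[ sum_{t=0}^{T-1} c(x_t, u_t) ]
    for a fixed initial state x0 (i.e. nu_0 is a point mass at x0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        x = x0
        for _ in range(horizon):
            u = policy(x)          # action chosen by the policy Pi
            total += cost(x, u)    # accumulate the stage cost c(x_t, u_t)
            x = f(x, u, rng.uniform(-0.1, 0.1))  # transition via (2.1)
    return total / runs

print(estimate_J(1.0))
```

Comparing `estimate_J` across candidate policies is exactly the (brute-force) comparison J(x_0, Π*) ≤ J(x_0, Π) that the optimal-policy definition asks for, restricted here to a single hypothetical policy class.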