Slides of Marcus Hutter

Sample Complexity

An algorithm A is (ε, δ)-correct with sample complexity N if for all
M ∈ 𝓜 := {(S, A, p, r, γ) : p transition probabilities},

\[
\Pr\left[ \sum_{t=1}^{\infty} \big[\, V^*_M(s_t) - V^A_M(s_{1:t}) > \varepsilon \,\big] > N \right] < \delta
\]

where [·] is the Iverson bracket, so the sum counts the number of time-steps
at which A is not ε-optimal. Informally: "The probability that I am 'badly'
suboptimal for more than N time-steps is at most δ!"
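
To make the definition concrete, here is a minimal Python sketch of how one
might check it empirically over simulated runs; the function names and the
gap-sequence input format are illustrative choices, not anything from the
slides.

def count_suboptimal_steps(gaps, eps):
    # Number of time-steps at which the value gap V*(s_t) - V^A(s_{1:t})
    # exceeds eps, i.e. the inner sum in the definition.
    return sum(1 for g in gaps if g > eps)

def is_empirically_correct(runs, eps, delta, N):
    # runs: one gap sequence per simulated run of the algorithm.
    # Empirical analogue of the (eps, delta) criterion: the fraction of
    # runs with more than N eps-suboptimal steps should be below delta.
    bad = sum(1 for gaps in runs if count_suboptimal_steps(gaps, eps) > N)
    return bad / len(runs) < delta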


Algorithm Sketch

1: loop
2:     Compute empirical estimate p̂ of transition matrix p
3:     Compute confidence interval about p̂
4:     Act according to the most optimistic plausible MDP
5: end loop
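
A minimal runnable sketch of this loop in Python, assuming a finite MDP with
known rewards and an L1-ball confidence region around the empirical
transition estimates; the interval construction and the planner are
simplified UCRL-style stand-ins, not the exact algorithm analysed on these
slides.

import numpy as np

def optimistic_values(counts, rewards, gamma, delta, iters=200):
    # Value iteration in the most optimistic plausible MDP: at each backup,
    # each empirical distribution p̂(·|s,a) may be tilted toward high-value
    # states within an L1 confidence ball of radius radius[s, a].
    S, A = rewards.shape
    n = counts.sum(axis=2)                                   # visits to (s,a)
    p_hat = np.where(n[:, :, None] > 0,
                     counts / np.maximum(n, 1)[:, :, None],
                     1.0 / S)                                # uniform if unvisited
    radius = np.sqrt(2 * np.log(1 / delta) / np.maximum(n, 1))
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                p = optimistic_shift(p_hat[s, a], radius[s, a], V)
                Q[s, a] = rewards[s, a] + gamma * p @ V
        V = Q.max(axis=1)
    return Q

def optimistic_shift(p, eps, V):
    # Standard L1-ball trick: move up to eps/2 probability mass onto the
    # highest-value state, draining it from the lowest-value states.
    p = p.copy()
    best = int(np.argmax(V))
    add = min(eps / 2, 1 - p[best])
    p[best] += add
    for s in np.argsort(V):
        if add <= 0:
            break
        take = min(p[s] if s != best else 0.0, add)
        p[s] -= take
        add -= take
    return p

def run(env, rewards, gamma=0.95, delta=0.05, steps=10_000):
    # The loop from the slide; `env` is a hypothetical interface with
    # reset() -> state and step(action) -> next state.
    S, A = rewards.shape
    counts = np.zeros((S, A, S))
    s = env.reset()
    for _ in range(steps):
        Q = optimistic_values(counts, rewards, gamma, delta)
        a = int(np.argmax(Q[s]))      # act greedily in the optimistic MDP
        s2 = env.step(a)
        counts[s, a, s2] += 1
        s = s2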


Analysis

◮ Key component is bounding |V^A_{M̂}(s_{1:t}) − V^A_M(s_{1:t})| < ε
◮ Require each state to be visited sufficiently often
◮ States that are expected to be visited often need better estimates

With w(s) the discounted future state distribution:

\[
\begin{aligned}
V^A_{\hat M}(s_t) - V^A_M(s_t)
  &= \gamma \sum_s w(s)\,(p_s - \hat p_s) \cdot \hat V \\
  &\le \sum_s w(s) \sqrt{\frac{|S|\,\sigma^2(s)\,L}{n(s)}}
      &&\text{(Bernstein's inequality)} \\
  &= \sum_s \sqrt{\frac{L\,|S|\,w(s)\,\sigma^2(s)}{m}}
      &&\text{(substitute } m := n(s)/w(s)\text{)} \\
  &\le \sqrt{\frac{L\,|S|^2 \sum_s w(s)\,\sigma^2(s)}{m}}
      &&\text{(Cauchy--Schwarz)} \\
  &\le \sqrt{\frac{L\,|S|^2}{m\,(1-\gamma)^2}}
      &&\text{(variance bound below)}
\end{aligned}
\]

where σ²(s) := Var V(s′|s), L := log 1/δ, and the last step uses
∑_s w(s)σ²(s) ≤ Var ∑_{k=t}^∞ γ^{k−t} r_k ≤ 1/(1−γ)².

Setting the error to ε and solving for m gives m ≈ L|S|²/(ε²(1−γ)²); therefore

\[
n(s) \approx \frac{w(s)\,L\,|S|^2}{\varepsilon^2 (1-\gamma)^2}
\]

visits to state s are needed.
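
A quick numerical illustration of the Bernstein step (a sanity check with
arbitrary numbers of my own, not from the slides): the deviation
|(p_s − p̂_s)·V̂| should concentrate at roughly √(2σ²(s)L/n(s)).

import numpy as np

rng = np.random.default_rng(0)
S, n, delta = 20, 500, 0.05
p = rng.dirichlet(np.ones(S))              # a true next-state distribution p_s
V = rng.uniform(0, 10, size=S)             # a fixed value vector V̂
L = np.log(1 / delta)
sigma2 = p @ (V - p @ V) ** 2              # sigma^2(s) = Var V(s'|s)

errors = [abs((p - rng.multinomial(n, p) / n) @ V) for _ in range(2000)]

# Leading Bernstein term (ignoring the lower-order range/n correction):
bound = np.sqrt(2 * sigma2 * L / n)
print(f"empirical 1-delta quantile: {np.quantile(errors, 1 - delta):.4f}")
print(f"Bernstein-style bound:      {bound:.4f}")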


Analysis

\[
n(s) \approx \frac{w(s)\,L\,|S|^2}{\varepsilon^2 (1-\gamma)^2}, \qquad
H := \frac{1}{1-\gamma} \log \frac{1}{\varepsilon(1-\gamma)}
\]

◮ w(s) is the discounted future state distribution
◮ Expect to visit s at least w(s) times within H time-steps
◮ Expect to "know" a state after L|S|²H / (ε²(1−γ)²) time-steps
◮ Once we "know" all states we are optimal
◮ Expect sample complexity bounded by (|S|²|A|H / (ε²(1−γ)²)) log(1/δ)
◮ The analysis is harder since w(s) changes over time and we want results
with high probability
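
Plugging illustrative numbers (my own choice, not from the slides) into
these expressions shows how large the bound is in absolute terms:

from math import log

S, A = 10, 4                     # hypothetical small MDP
eps, gamma, delta = 0.1, 0.9, 0.05

L = log(1 / delta)
H = (1 / (1 - gamma)) * log(1 / (eps * (1 - gamma)))     # effective horizon
know_one = L * S**2 * H / (eps**2 * (1 - gamma)**2)      # "know" one state
bound = S**2 * A * H / (eps**2 * (1 - gamma)**2) * log(1 / delta)

print(f"H ≈ {H:.1f}")
print(f"time-steps to 'know' a state ≈ {know_one:.2e}")
print(f"sample-complexity bound ≈ {bound:.2e}")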


Questions
