4 — The Source Coding Theorem

Chebyshev's inequality 2. Let $x$ be a random variable, and let $\alpha$ be a positive real number. Then
$$P\big( (x - \bar{x})^2 \ge \alpha \big) \;\le\; \sigma_x^2/\alpha. \qquad (4.31)$$
Proof: Take $t = (x - \bar{x})^2$ and apply the previous proposition. $\Box$

Weak law of large numbers. Take $x$ to be the average of $N$ independent random variables $h_1, \ldots, h_N$, having common mean $\bar{h}$ and common variance $\sigma_h^2$: $x = \frac{1}{N} \sum_{n=1}^N h_n$. Then
$$P\big( (x - \bar{h})^2 \ge \alpha \big) \;\le\; \sigma_h^2/\alpha N. \qquad (4.32)$$
Proof: obtained by showing that $\bar{x} = \bar{h}$ and that $\sigma_x^2 = \sigma_h^2/N$. $\Box$

We are interested in $x$ being very close to the mean ($\alpha$ very small). No matter how large $\sigma_h^2$ is, and no matter how small the required $\alpha$ is, and no matter how small the desired probability that $(x - \bar{h})^2 \ge \alpha$, we can always achieve it by taking $N$ large enough.

Proof of theorem 4.1 (p.78)

We apply the law of large numbers to the random variable $\frac{1}{N} \log_2 \frac{1}{P(\mathbf{x})}$ defined for $\mathbf{x}$ drawn from the ensemble $X^N$. This random variable can be written as the average of $N$ information contents $h_n = \log_2 (1/P(x_n))$, each of which is a random variable with mean $H = H(X)$ and variance $\sigma^2 \equiv \mathrm{var}[\log_2(1/P(x_n))]$. (Each term $h_n$ is the Shannon information content of the $n$th outcome.)

We again define the typical set with parameters $N$ and $\beta$ thus:
$$T_{N\beta} = \left\{ \mathbf{x} \in \mathcal{A}_X^N : \left[ \frac{1}{N} \log_2 \frac{1}{P(\mathbf{x})} - H \right]^2 < \beta^2 \right\}. \qquad (4.33)$$
For all $\mathbf{x} \in T_{N\beta}$, the probability of $\mathbf{x}$ satisfies
$$2^{-N(H+\beta)} < P(\mathbf{x}) < 2^{-N(H-\beta)}. \qquad (4.34)$$
And by the law of large numbers,
$$P(\mathbf{x} \in T_{N\beta}) \;\ge\; 1 - \frac{\sigma^2}{\beta^2 N}. \qquad (4.35)$$
We have thus proved the 'asymptotic equipartition' principle. As $N$ increases, the probability that $\mathbf{x}$ falls in $T_{N\beta}$ approaches 1, for any $\beta$. How does this result relate to source coding?

We must relate $T_{N\beta}$ to $H_\delta(X^N)$. We will show that for any given $\delta$ there is a sufficiently big $N$ such that $H_\delta(X^N) \simeq NH$.

Part 1: $\frac{1}{N} H_\delta(X^N) < H + \epsilon$.

The set $T_{N\beta}$ is not the best subset for compression. So the size of $T_{N\beta}$ gives an upper bound on $H_\delta$. We show how small $H_\delta(X^N)$ must be by calculating how big $T_{N\beta}$ could possibly be. We are free to set $\beta$ to any convenient value. The smallest possible probability that a member of $T_{N\beta}$ can have is $2^{-N(H+\beta)}$, and the total probability contained by $T_{N\beta}$ can't be any bigger than 1. So
$$|T_{N\beta}| \, 2^{-N(H+\beta)} < 1, \qquad (4.36)$$
that is, the size of the typical set is bounded by
$$|T_{N\beta}| < 2^{N(H+\beta)}. \qquad (4.37)$$
If we set $\beta = \epsilon$ and $N_0$ such that $\frac{\sigma^2}{\epsilon^2 N_0} \le \delta$, then $P(T_{N\beta}) \ge 1 - \delta$, and the set $T_{N\beta}$ becomes a witness to the fact that $H_\delta(X^N) \le \log_2 |T_{N\beta}| < N(H + \epsilon)$.

[Figure 4.13. Schematic illustration of the two parts of the theorem. Given any $\delta$ and $\epsilon$, we show that for large enough $N$, $\frac{1}{N} H_\delta(X^N)$ lies (1) below the line $H + \epsilon$ and (2) above the line $H - \epsilon$.]
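The weak law of large numbers is exactly the tool the proof leans on, so a quick numerical check can make the bound (4.32) concrete. The sketch below is my own illustration, not code from the book: it draws symbols from a Bernoulli source with an assumed parameter $p = 0.1$, forms the average $x$ of the $N$ information contents $h_n = \log_2(1/P(x_n))$, and compares the empirical frequency of $(x - H)^2 \ge \alpha$ (for an arbitrary $\alpha = 0.01$) against the Chebyshev bound $\sigma_h^2/\alpha N$.

```python
# Minimal sketch (not from the book): weak law of large numbers applied to the
# average of N Shannon information contents h_n = log2(1/P(x_n)) of symbols
# drawn from a Bernoulli(p) source.  The empirical frequency of
# (x - H)^2 >= alpha should stay below the bound sigma_h^2/(alpha*N)
# and shrink as N grows.  p, alpha and the trial count are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
p = 0.1
H = -p * np.log2(p) - (1 - p) * np.log2(1 - p)                      # mean of h_n
sigma_h2 = p * np.log2(1/p)**2 + (1 - p) * np.log2(1/(1-p))**2 - H**2  # var of h_n
alpha = 0.01
trials = 10_000

for N in [10, 100, 1000]:
    symbols = rng.random((trials, N)) < p                  # Bernoulli(p) samples
    h = np.where(symbols, np.log2(1/p), np.log2(1/(1-p)))  # information contents
    x = h.mean(axis=1)                                     # x = (1/N) sum_n h_n
    empirical = np.mean((x - H)**2 >= alpha)
    bound = min(sigma_h2 / (alpha * N), 1.0)
    print(f"N={N:5d}  P[(x-H)^2 >= alpha] ~ {empirical:.4f}   Chebyshev bound {bound:.4f}")
```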
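The typical set itself can be checked the same way. The sketch below, again an illustration under assumed parameters ($p = 0.1$, $\beta = 0.05$, neither taken from the text), enumerates the length-$N$ strings of a Bernoulli($p$) source grouped by their number of ones, accumulates $P(\mathbf{x} \in T_{N\beta})$, and computes $\frac{1}{N}\log_2 |T_{N\beta}|$, which should stay below $H + \beta$ as in (4.37) while the captured probability approaches 1 with growing $N$, as in (4.35).

```python
# Minimal sketch (my own illustration) of the typical set T_{N,beta} for a
# Bernoulli(p) source.  Every string x with k ones has the same probability
# P(x) = p^k (1-p)^(N-k), so we only need to loop over k.  Counting is done
# in log space to avoid huge integers and floating-point overflow.
from math import lgamma, log, log2

def log2_comb(n, k):
    # log2 of the binomial coefficient C(n, k), via lgamma
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(2)

p = 0.1                                       # illustrative source parameter
H = -p * log2(p) - (1 - p) * log2(1 - p)      # entropy, about 0.469 bits/symbol
beta = 0.05                                   # illustrative half-width of the typical set

for N in [100, 1000, 10000]:
    prob_typical = 0.0     # accumulates P(x in T_{N,beta})
    log2_counts = []       # log2 of the number of typical strings with k ones
    for k in range(N + 1):
        # per-symbol information content of any x with k ones
        info = (k * log2(1 / p) + (N - k) * log2(1 / (1 - p))) / N
        if abs(info - H) < beta:                        # x belongs to T_{N,beta}
            lc = log2_comb(N, k)
            prob_typical += 2.0 ** (lc - N * info)      # C(N,k) * P(x), in log space
            log2_counts.append(lc)
    m = max(log2_counts)
    log2_size = m + log2(sum(2.0 ** (l - m) for l in log2_counts))  # log2 |T_{N,beta}|
    print(f"N={N:6d}  P(x in T)={prob_typical:.4f}  "
          f"(1/N) log2|T|={log2_size / N:.4f}  H+beta={H + beta:.4f}")
```

As the sketch iterates over larger $N$, the probability captured by $T_{N\beta}$ rises toward 1 while $\frac{1}{N}\log_2 |T_{N\beta}|$ stays below $H + \beta$, which is precisely the pair of facts the proof of Part 1 combines.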
