Part 2: $\frac{1}{N} H_\delta(X^N) > H - \epsilon$.

Imagine that someone claims this second part is not so – that, for any $N$, the smallest $\delta$-sufficient subset $S_\delta$ is smaller than the above inequality would allow. We can make use of our typical set to show that they must be mistaken. Remember that we are free to set $\beta$ to any value we choose. We will set $\beta = \epsilon/2$, so that our task is to prove that a subset $S'$ having $|S'| \leq 2^{N(H-2\beta)}$ and achieving $P(\mathbf{x} \in S') \geq 1 - \delta$ cannot exist (for $N$ greater than an $N_0$ that we will specify).

So, let us consider the probability of falling in this rival smaller subset $S'$. The probability of the subset $S'$ is

$$P(\mathbf{x} \in S') = P(\mathbf{x} \in S' \cap T_{N\beta}) + P(\mathbf{x} \in S' \cap \overline{T_{N\beta}}), \qquad (4.38)$$

where $\overline{T_{N\beta}}$ denotes the complement $\{\mathbf{x} \notin T_{N\beta}\}$. The maximum value of the first term is found if $S' \cap T_{N\beta}$ contains $2^{N(H-2\beta)}$ outcomes all with the maximum probability, $2^{-N(H-\beta)}$. The maximum value the second term can have is $P(\mathbf{x} \notin T_{N\beta})$. So:

[Figure: Venn diagram of the typical set $T_{N\beta}$ and the rival subset $S'$, showing its two pieces $S' \cap T_{N\beta}$ and $S' \cap \overline{T_{N\beta}}$.]

$$P(\mathbf{x} \in S') \leq 2^{N(H-2\beta)}\, 2^{-N(H-\beta)} + \frac{\sigma^2}{\beta^2 N} = 2^{-N\beta} + \frac{\sigma^2}{\beta^2 N}. \qquad (4.39)$$

We can now set $\beta = \epsilon/2$ and $N_0$ such that $P(\mathbf{x} \in S') < 1 - \delta$, which shows that $S'$ cannot satisfy the definition of a sufficient subset $S_\delta$. Thus any subset $S'$ with size $|S'| \leq 2^{N(H-\epsilon)}$ has probability less than $1 - \delta$, so by the definition of $H_\delta$, $H_\delta(X^N) > N(H - \epsilon)$.

Thus for large enough $N$, the function $\frac{1}{N} H_\delta(X^N)$ is essentially a constant function of $\delta$, for $0 < \delta < 1$, as illustrated in figures 4.9 and 4.13. $\Box$

4.6 Comments

The source coding theorem (p.78) has two parts, $\frac{1}{N} H_\delta(X^N) < H + \epsilon$ and $\frac{1}{N} H_\delta(X^N) > H - \epsilon$. Both results are interesting.

The first part tells us that even if the probability of error $\delta$ is extremely small, the number of bits per symbol $\frac{1}{N} H_\delta(X^N)$ needed to specify a long $N$-symbol string $\mathbf{x}$ with vanishingly small error probability does not have to exceed $H + \epsilon$ bits. We need to have only a tiny tolerance for error, and the number of bits required drops significantly from $H_0(X)$ to $(H + \epsilon)$.

What happens if we are yet more tolerant to compression errors? Part 2 tells us that even if $\delta$ is very close to 1, so that errors are made most of the time, the average number of bits per symbol needed to specify $\mathbf{x}$ must still be at least $H - \epsilon$ bits. These two extremes tell us that regardless of our specific allowance for error, the number of bits per symbol needed to specify $\mathbf{x}$ is $H$ bits; no more and no less.

Caveat regarding 'asymptotic equipartition'

I put the words 'asymptotic equipartition' in quotes because it is important not to think that the elements of the typical set $T_{N\beta}$ really do have roughly the same probability as each other. They are similar in probability only in the sense that their values of $\log_2 \frac{1}{P(\mathbf{x})}$ are within $2N\beta$ of each other. Now, as $\beta$ is decreased, how does $N$ have to increase, if we are to keep our bound on the mass of the typical set, $P(\mathbf{x} \in T_{N\beta}) \geq 1 - \frac{\sigma^2}{\beta^2 N}$, constant? $N$ must grow as $1/\beta^2$, so if we write $\beta$ in terms of $N$ as $\alpha/\sqrt{N}$, for some constant $\alpha$, then the bound on the mass of the typical set becomes $P(\mathbf{x} \in T_{N\beta}) \geq 1 - \sigma^2/\alpha^2$, which is independent of $N$.
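To get a feel for how large $N$ must be before the bound (4.39) above falls below $1 - \delta$, here is a minimal numerical sketch. It is not part of the text: the binary source with probabilities $(0.9, 0.1)$, the choice $\epsilon = 0.1$, and all variable names are illustrative assumptions. It evaluates $2^{-N\beta} + \sigma^2/(\beta^2 N)$, where $\sigma^2$ is the variance of $\log_2 \frac{1}{P(x)}$ for a single symbol.

```python
import numpy as np

# Illustrative source (an assumption, not from the text): a binary ensemble
# with probabilities (0.9, 0.1).
p = np.array([0.9, 0.1])
info = np.log2(1.0 / p)                  # information content of each outcome
H = np.sum(p * info)                     # entropy, about 0.47 bits
sigma2 = np.sum(p * info**2) - H**2      # variance of log2(1/P(x)) per symbol

def bound_4_39(N, beta):
    """Right-hand side of (4.39): upper bound on P(x in S') when |S'| <= 2^{N(H - 2*beta)}."""
    return 2.0 ** (-N * beta) + sigma2 / (beta**2 * N)

epsilon = 0.1
beta = epsilon / 2
for N in (100, 1_000, 10_000, 100_000):
    print(f"N = {N:>6}: bound = {bound_4_39(N, beta):.4f}")
# The first term vanishes exponentially and the second falls like 1/N, so for
# any delta in (0, 1) there is an N_0 beyond which the bound is below 1 - delta,
# which is all the proof requires.
```

For small $N$ the bound exceeds 1 and says nothing; it only becomes binding once $N$ is large, which is why the proof needs an $N_0$.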
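The caveat can be restated in one line (my own summary of the point just made, not a quotation from the text): with $\beta = \alpha/\sqrt{N}$ the mass of the typical set stays bounded, yet the permitted spread of information contents within it still grows with $N$,

$$P(\mathbf{x} \in T_{N\beta}) \;\geq\; 1 - \frac{\sigma^2}{\alpha^2},
\qquad\text{while}\qquad
\max_{\mathbf{x},\mathbf{x}' \in T_{N\beta}} \left| \log_2 \tfrac{1}{P(\mathbf{x})} - \log_2 \tfrac{1}{P(\mathbf{x}')} \right| \;\leq\; 2N\beta \;=\; 2\alpha\sqrt{N}.$$

So two typical strings may differ in probability by a factor as large as $2^{2\alpha\sqrt{N}}$, which grows without bound; the 'equipartition' is only approximate.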
