[Figure 4.6. (a) The outcomes of X (from example 4.6 (p.75)), ranked by their probability, plotted against log_2 P(x). (b) The essential bit content H_δ(X) against δ. The labels on the graph show the smallest sufficient set as a function of δ. Note H_0(X) = 3 bits and H_{1/16}(X) = 2 bits.]

Extended ensembles

Is this compression method any more useful if we compress blocks of symbols from a source?

We now turn to examples where the outcome x = (x_1, x_2, ..., x_N) is a string of N independent identically distributed random variables from a single ensemble X. We will denote by X^N the ensemble (X_1, X_2, ..., X_N). Remember that entropy is additive for independent variables (exercise 4.2 (p.68)), so H(X^N) = N H(X).

Example 4.7. Consider a string of N flips of a bent coin, x = (x_1, x_2, ..., x_N), where x_n ∈ {0, 1}, with probabilities p_0 = 0.9, p_1 = 0.1. The most probable strings x are those with most 0s. If r(x) is the number of 1s in x then

    P(x) = p_0^{N - r(x)} p_1^{r(x)}.    (4.20)

To evaluate H_δ(X^N) we must find the smallest sufficient subset S_δ. This subset will contain all x with r(x) = 0, 1, 2, ..., up to some r_max(δ) − 1, and some of the x with r(x) = r_max(δ). Figures 4.7 and 4.8 show graphs of H_δ(X^N) against δ for the cases N = 4 and N = 10. The steps are the values of δ at which |S_δ| changes by 1, and the cusps where the slope of the staircase changes are the points where r_max changes by 1.

Exercise 4.8. [2, p.86] What are the mathematical shapes of the curves between the cusps?

For the examples shown in figures 4.6–4.8, H_δ(X^N) depends strongly on the value of δ, so it might not seem a fundamental or useful definition of information content. But we will consider what happens as N, the number of independent variables in X^N, increases. We will find the remarkable result that H_δ(X^N) becomes almost independent of δ – and for all δ it is very close to N H(X), where H(X) is the entropy of one of the random variables.

Figure 4.9 illustrates this asymptotic tendency for the binary ensemble of example 4.7. As N increases, (1/N) H_δ(X^N) becomes an increasingly flat function.
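To make the smallest sufficient subset and H_δ(X^N) concrete for example 4.7, here is a minimal Python sketch (my own illustration, not code from the book; the function name essential_bit_content and the particular δ values are arbitrary choices). It ranks the bent-coin strings by probability using equation (4.20), adds them one at a time until probability 1 − δ is captured, and reports log_2 of the count; running it for N = 4 traces out the staircase of H_δ(X^4) described above.

```python
from math import comb, log2

def essential_bit_content(N, delta, p1=0.1):
    """H_delta(X^N) in bits, for N flips of a bent coin with P(x_n = 1) = p1 < 0.5.

    Strings are ranked by probability: a string with r ones has probability
    p0^(N-r) * p1^r (equation 4.20), so strings with fewer ones come first.
    S_delta is the smallest set whose total probability is at least 1 - delta,
    and H_delta(X^N) = log2 |S_delta|.  Assumes 0 < delta < 1.
    """
    p0 = 1.0 - p1
    size, cumulative = 0, 0.0
    for r in range(N + 1):                  # r = number of ones in the string
        p_string = p0 ** (N - r) * p1 ** r  # probability of each string with r ones
        for _ in range(comb(N, r)):         # there are C(N, r) such strings
            if cumulative >= 1.0 - delta:   # enough probability mass already captured
                return log2(size)
            size += 1
            cumulative += p_string
    return log2(size)                       # delta so small that all 2^N strings are kept

# Staircase of H_delta(X^4) against delta (compare figure 4.7):
for delta in (0.01, 0.05, 0.1, 0.2, 0.3, 0.4):
    print(f"delta = {delta:<5}  H_delta(X^4) = {essential_bit_content(4, delta):.2f} bits")
```

Because the strings are enumerated one at a time, this brute-force version is only practical for small N.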

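To check the claimed flattening of (1/N) H_δ(X^N) numerically for larger N, the same calculation can be organised around r_max(δ): whole groups of strings with the same number of 1s are added at once, plus a partial group at r_max. Again only an illustrative sketch under the same assumptions (the name bits_per_flip and the chosen values of N and δ are mine, not the book's).

```python
from math import ceil, comb, log2

def bits_per_flip(N, delta, p1=0.1):
    """(1/N) * H_delta(X^N) for N flips of a bent coin with P(x_n = 1) = p1 < 0.5.

    All C(N, r) strings with exactly r ones share the probability
    p0^(N-r) * p1^r, so whole groups are added to S_delta at once
    (r = 0, 1, ..., r_max - 1), plus just enough strings from the
    r_max group to bring the total probability up to 1 - delta.
    """
    p0 = 1.0 - p1
    target = 1.0 - delta
    size, cumulative = 0, 0.0
    for r in range(N + 1):
        group = comb(N, r)                  # number of strings with exactly r ones
        p_string = p0 ** (N - r) * p1 ** r  # probability of each of them (eq. 4.20)
        if cumulative + group * p_string >= target:
            size += ceil((target - cumulative) / p_string)  # partial group at r_max
            break
        size += group
        cumulative += group * p_string
    return log2(size) / N

H = 0.1 * log2(1 / 0.1) + 0.9 * log2(1 / 0.9)  # entropy of a single flip, about 0.47 bits
for N in (10, 100, 1000):
    print(f"N = {N:<4}  1/N H_0.01 = {bits_per_flip(N, 0.01):.3f}  "
          f"1/N H_0.5 = {bits_per_flip(N, 0.5):.3f}  H(X) = {H:.3f}")
```

With δ = 0.01 and δ = 0.5, the two values of (1/N) H_δ(X^N) should move closer to each other and towards H(X) ≈ 0.47 bits as N grows, which is the behaviour figure 4.9 illustrates.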