Information Theory, Inference, and Learning ... - Inference Group
4 — The Source Coding Theorem

If $N = 1000$ then
$$r \sim 100 \pm 10. \tag{4.26}$$

Notice that as $N$ gets bigger, the probability distribution of $r$ becomes more concentrated, in the sense that while the range of possible values of $r$ grows as $N$, the standard deviation of $r$ grows only as $\sqrt{N}$. That $r$ is most likely to fall in a small range of values implies that the outcome $\mathbf{x}$ is also most likely to fall in a corresponding small subset of outcomes that we will call the typical set.

Definition of the typical set

Let us define typicality for an arbitrary ensemble $X$ with alphabet $\mathcal{A}_X$. Our definition of a typical string will involve the string's probability. A long string of $N$ symbols will usually contain about $p_1 N$ occurrences of the first symbol, $p_2 N$ occurrences of the second, etc. Hence the probability of this string is roughly
$$P(\mathbf{x})_{\text{typ}} = P(x_1)P(x_2)P(x_3)\cdots P(x_N) \simeq p_1^{(p_1 N)} p_2^{(p_2 N)} \cdots p_I^{(p_I N)} \tag{4.27}$$
so that the information content of a typical string is
$$\log_2 \frac{1}{P(\mathbf{x})} \simeq N \sum_i p_i \log_2 \frac{1}{p_i} = NH. \tag{4.28}$$

So the random variable $\log_2 1/P(\mathbf{x})$, which is the information content of $\mathbf{x}$, is very likely to be close in value to $NH$. We build our definition of typicality on this observation.

We define the typical elements of $\mathcal{A}_X^N$ to be those elements that have probability close to $2^{-NH}$. (Note that the typical set, unlike the smallest sufficient subset, does not include the most probable elements of $\mathcal{A}_X^N$, but we will show that these most probable elements contribute negligible probability.)

We introduce a parameter $\beta$ that defines how close the probability has to be to $2^{-NH}$ for an element to be 'typical'. We call the set of typical elements the typical set, $T_{N\beta}$:
$$T_{N\beta} \equiv \left\{ \mathbf{x} \in \mathcal{A}_X^N : \left| \frac{1}{N} \log_2 \frac{1}{P(\mathbf{x})} - H \right| < \beta \right\}. \tag{4.29}$$

We will show that whatever value of $\beta$ we choose, the typical set contains almost all the probability as $N$ increases.

This important result is sometimes called the 'asymptotic equipartition' principle.

'Asymptotic equipartition' principle. For an ensemble of $N$ independent identically distributed (i.i.d.) random variables $X^N \equiv (X_1, X_2, \ldots, X_N)$, with $N$ sufficiently large, the outcome $\mathbf{x} = (x_1, x_2, \ldots, x_N)$ is almost certain to belong to a subset of $\mathcal{A}_X^N$ having only $2^{NH(X)}$ members, each having probability 'close to' $2^{-NH(X)}$.

Notice that if $H(X) < H_0(X)$ then $2^{NH(X)}$ is a tiny fraction of the number of possible outcomes $|\mathcal{A}_X^N| = |\mathcal{A}_X|^N = 2^{N H_0(X)}$.

The term equipartition is chosen to describe the idea that the members of the typical set have roughly equal probability. [This should not be taken too literally, hence my use of quotes around 'asymptotic equipartition'; see page 83.]

A second meaning for equipartition, in thermal physics, is the idea that each degree of freedom of a classical system has equal average energy, $\frac{1}{2}kT$. This second meaning is not intended here.
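The concentration claimed in (4.28) and (4.29) is easy to check numerically. The following is a minimal sketch, not from the book: the four-symbol ensemble and the values of $\beta$ and $N$ are arbitrary choices for illustration. It samples i.i.d. strings, computes the per-symbol information content $\frac{1}{N}\log_2 \frac{1}{P(\mathbf{x})}$, and estimates the fraction of strings that land in $T_{N\beta}$.

```python
import numpy as np

# A hypothetical ensemble (not from the book): |A_X| = 4 symbols
p = np.array([0.5, 0.25, 0.125, 0.125])
H = float(-(p * np.log2(p)).sum())        # entropy H(X) = 1.75 bits

rng = np.random.default_rng(0)

def per_symbol_info(N, samples=5000):
    """Draw `samples` i.i.d. strings of length N; return (1/N) log2 1/P(x) for each."""
    x = rng.choice(len(p), size=(samples, N), p=p)   # each row is one string x
    # log2 1/P(x) = sum over symbols of log2 1/p_i, by independence
    return -np.log2(p[x]).sum(axis=1) / N

beta = 0.05                               # arbitrary tolerance defining T_{N beta}
for N in (100, 300, 1000):
    h = per_symbol_info(N)
    frac = np.mean(np.abs(h - H) < beta)  # estimated probability of T_{N beta}
    print(f"N={N:5d}  mean={h.mean():.3f}  std={h.std():.4f}  P(T_Nbeta)~{frac:.3f}")
```

For every $N$ the mean stays at $H = 1.75$ bits, the standard deviation of the per-symbol information content shrinks roughly as $1/\sqrt{N}$, and the fraction of sampled strings inside $T_{N\beta}$ climbs toward 1, which is exactly the behaviour the 'asymptotic equipartition' principle asserts.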
