5.4 How much can we compress?

So, we can’t compress below the entropy. How close can we expect to get to the entropy?

Theorem 5.1 Source coding theorem for symbol codes. For an ensemble X there exists a prefix code C with expected length satisfying

    H(X) \le L(C, X) < H(X) + 1.    (5.19)

Proof. We set the codelengths to integers slightly larger than the optimum lengths:

    l_i = \lceil \log_2(1/p_i) \rceil    (5.20)

where \lceil l^* \rceil denotes the smallest integer greater than or equal to l^*. [We are not asserting that the optimal code necessarily uses these lengths, we are simply choosing these lengths because we can use them to prove the theorem.]

We check that there is a prefix code with these lengths by confirming that the Kraft inequality is satisfied:

    \sum_i 2^{-l_i} = \sum_i 2^{-\lceil \log_2(1/p_i) \rceil} \le \sum_i 2^{-\log_2(1/p_i)} = \sum_i p_i = 1.    (5.21)

Then we confirm

    L(C, X) = \sum_i p_i \lceil \log_2(1/p_i) \rceil < \sum_i p_i (\log_2(1/p_i) + 1) = H(X) + 1. \qquad \Box    (5.22)

The cost of using the wrong codelengths

If we use a code whose lengths are not equal to the optimal codelengths, the average message length will be larger than the entropy.

If the true probabilities are {p_i} and we use a complete code with lengths l_i, we can view those lengths as defining implicit probabilities q_i = 2^{-l_i}. Continuing from equation (5.14), the average length is

    L(C, X) = H(X) + \sum_i p_i \log_2(p_i/q_i),    (5.23)

i.e., it exceeds the entropy by the relative entropy D_KL(p||q) (as defined on p. 34).

5.5 Optimal source coding with symbol codes: Huffman coding

Given a set of probabilities P, how can we design an optimal prefix code? For example, what is the best symbol code for the English language ensemble shown in figure 5.3? When we say ‘optimal’, let’s assume our aim is to minimize the expected length L(C, X).

Figure 5.3. An ensemble in need of a symbol code.

    x   P(x)      x   P(x)      x   P(x)
    a   0.0575    j   0.0006    s   0.0567
    b   0.0128    k   0.0084    t   0.0706
    c   0.0263    l   0.0335    u   0.0334
    d   0.0285    m   0.0235    v   0.0069
    e   0.0913    n   0.0596    w   0.0119
    f   0.0173    o   0.0689    x   0.0073
    g   0.0133    p   0.0192    y   0.0164
    h   0.0313    q   0.0008    z   0.0007
    i   0.0599    r   0.0508    −   0.1928

How not to do it

One might try to roughly split the set A_X in two, and continue bisecting the subsets so as to define a binary tree from the root. This construction has the right spirit, as in the weighing problem, but it is not necessarily optimal; it achieves L(C, X) \le H(X) + 2.
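To make the proof of Theorem 5.1 and equation (5.23) concrete, here is a minimal Python sketch (the four-symbol probabilities, the mismatched distribution q, and the helper names are made up for illustration; none of it is from the book). It assigns the codelengths l_i = ⌈log₂(1/p_i)⌉, checks the Kraft inequality, confirms H(X) ≤ L(C, X) < H(X) + 1, and then measures the extra cost of a complete code whose lengths are matched to the wrong distribution q, which comes out equal to D_KL(p||q).

```python
import math

def shannon_codelengths(p):
    """Codelengths l_i = ceil(log2(1/p_i)), as used in the proof of Theorem 5.1."""
    return [math.ceil(math.log2(1.0 / p_i)) for p_i in p]

def kraft_sum(lengths):
    """Sum_i 2^(-l_i); a prefix code with these lengths exists iff this is <= 1."""
    return sum(2.0 ** -l for l in lengths)

def entropy(p):
    """H(X) = sum_i p_i log2(1/p_i), in bits."""
    return sum(p_i * math.log2(1.0 / p_i) for p_i in p if p_i > 0)

def expected_length(p, lengths):
    """L(C, X) = sum_i p_i l_i."""
    return sum(p_i * l for p_i, l in zip(p, lengths))

# Hypothetical four-symbol ensemble (not the figure 5.3 probabilities).
p = [0.4, 0.3, 0.2, 0.1]
l = shannon_codelengths(p)
print(kraft_sum(l) <= 1.0)     # True: a prefix code with these lengths exists
H = entropy(p)
L = expected_length(p, l)
print(H <= L < H + 1)          # True: the bound of Theorem 5.1

# Equation (5.23): a complete code with lengths matched to q, used on a source
# with true probabilities p, exceeds H(X) by the relative entropy D_KL(p||q).
q = [0.25, 0.25, 0.25, 0.25]
l_q = [int(round(math.log2(1.0 / q_i))) for q_i in q]
D_kl = sum(p_i * math.log2(p_i / q_i) for p_i, q_i in zip(p, q))
print(abs(expected_length(p, l_q) - (H + D_kl)) < 1e-9)   # True
```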
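The answer the remainder of section 5.5 develops is Huffman's algorithm: instead of bisecting from the root, repeatedly merge the two least probable symbols, building the code tree from the leaves upwards. The following is a rough sketch of that construction, using the same made-up four-symbol ensemble as above rather than the figure 5.3 probabilities; it is not the book's own presentation.

```python
import heapq
import itertools

def huffman_code(probs):
    """Build an optimal prefix code bottom-up: repeatedly merge the two least
    probable (sub)trees, prefixing '0'/'1' to the codewords inside each."""
    tie = itertools.count()  # tie-breaker so equal probabilities never compare the dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)   # least probable subtree
        p1, _, code1 = heapq.heappop(heap)   # second least probable subtree
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

# Same hypothetical ensemble as above (not the figure 5.3 probabilities).
probs = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
code = huffman_code(probs)
L = sum(probs[s] * len(code[s]) for s in probs)
print(code)  # e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
print(L)     # 1.9 bits: below the 2.4 bits of the ceil(log2(1/p)) lengths above,
             # and within one bit of H(X) ~= 1.85, as Theorem 5.1 guarantees
```

The counter in each heap entry is only a tie-breaker: when two subtrees have equal probability, it keeps the tuple comparison from ever reaching the codeword dictionaries.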
