5
Symbol Codes

In this chapter, we discuss variable-length symbol codes, which encode one source symbol at a time, instead of encoding huge strings of N source symbols. These codes are lossless: unlike the last chapter's block codes, they are guaranteed to compress and decompress without any errors; but there is a chance that the codes may sometimes produce encoded strings longer than the original source string.

The idea is that we can achieve compression, on average, by assigning shorter encodings to the more probable outcomes and longer encodings to the less probable.

The key issues are:

What are the implications if a symbol code is lossless? If some codewords are shortened, by how much do other codewords have to be lengthened?

Making compression practical. How can we ensure that a symbol code is easy to decode?

Optimal symbol codes. How should we assign codelengths to achieve the best compression, and what is the best achievable compression?

We again verify the fundamental status of the Shannon information content and the entropy, proving:

Source coding theorem (symbol codes). There exists a variable-length encoding C of an ensemble X such that the average length of an encoded symbol, L(C, X), satisfies L(C, X) ∈ [H(X), H(X) + 1).

The average length is equal to the entropy H(X) only if the codelength for each outcome is equal to its Shannon information content.

We will also define a constructive procedure, the Huffman coding algorithm, that produces optimal symbol codes.

Notation for alphabets. A^N denotes the set of ordered N-tuples of elements from the set A, i.e., all strings of length N. The symbol A^+ will denote the set of all strings of finite length composed of elements from the set A.

Example 5.1. {0, 1}^3 = {000, 001, 010, 011, 100, 101, 110, 111}.

Example 5.2. {0, 1}^+ = {0, 1, 00, 01, 10, 11, 000, 001, ...}.
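As a quick illustration of this notation, the following Python sketch (an addition of ours, not from the text) enumerates {0, 1}^3 and reproduces Example 5.1:

```python
from itertools import product

# {0, 1}^3: all ordered 3-tuples over the alphabet {0, 1} (Example 5.1).
A, N = '01', 3
print([''.join(t) for t in product(A, repeat=N)])
# ['000', '001', '010', '011', '100', '101', '110', '111']
```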
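Looking ahead to the Huffman algorithm announced above, here is a minimal Python sketch of one possible implementation; the ensemble probabilities are assumed for illustration and do not come from the text. It builds a binary prefix code by repeatedly merging the two least probable subtrees, then checks the source coding theorem's bound H(X) ≤ L(C, X) < H(X) + 1:

```python
import heapq
from math import log2

def huffman_code(probs):
    """Build a binary Huffman code for a dict {symbol: probability}."""
    # Heap entries: (probability, tiebreak id, {symbol: codeword-so-far}).
    # The tiebreak id keeps tuple comparison away from the dicts.
    heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # least probable subtree
        p1, _, c1 = heapq.heappop(heap)  # second least probable subtree
        # Merge: prefix a distinguishing bit onto each subtree's codewords.
        merged = {s: '0' + w for s, w in c0.items()}
        merged.update({s: '1' + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next_id, merged))
        next_id += 1
    return heap[0][2]

# Assumed example ensemble (dyadic probabilities, chosen for illustration).
px = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}
code = huffman_code(px)

H = -sum(p * log2(p) for p in px.values())        # entropy H(X)
L = sum(px[s] * len(w) for s, w in code.items())  # average length L(C, X)
print(code)
print(f'H(X) = {H:.3f} bits, L(C, X) = {L:.3f} bits')
assert H <= L < H + 1  # source coding theorem (symbol codes)
```

Because the probabilities chosen here are dyadic, every codelength equals the Shannon information content log2(1/p), so L(C, X) meets the entropy exactly, matching the theorem's equality condition.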
