10.07.2015 Views

Information Theory, Inference, and Learning ... - Inference Group


Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981
You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links.

7. Codes for Integers

Unary code. An integer $n$ is encoded by sending a string of $n-1$ 0s followed by a 1.

    n     $c_U(n)$
    1     1
    2     01
    3     001
    4     0001
    5     00001
    ...
    45    000000000000000000000000000000000000000000001

The unary code has length $l_U(n) = n$.

The unary code is the optimal code for integers if the probability distribution over $n$ is $p_U(n) = 2^{-n}$.

Self-delimiting codes

We can use the unary code to encode the length of the binary encoding of $n$ and make a self-delimiting code:

Code $C_\alpha$. We send the unary code for $l_b(n)$, followed by the headless binary representation of $n$.

$$c_\alpha(n) = c_U[l_b(n)]\, c_B(n). \tag{7.1}$$

Table 7.1 shows the codes for some integers. In the $c_\alpha(n)$ column, a space marks the division of each string into the parts $c_U[l_b(n)]$ and $c_B(n)$. We might equivalently view $c_\alpha(n)$ as consisting of a string of $(l_b(n) - 1)$ zeroes followed by the standard binary representation of $n$, $c_b(n)$.

    n     $c_b(n)$   $l_b(n)$   $c_\alpha(n)$
    1     1          1          1
    2     10         2          01 0
    3     11         2          01 1
    4     100        3          001 00
    5     101        3          001 01
    6     110        3          001 10
    ...
    45    101101     6          000001 01101

    Table 7.1. $C_\alpha$.

The codeword $c_\alpha(n)$ has length $l_\alpha(n) = 2 l_b(n) - 1$.

The implicit probability distribution over $n$ for the code $C_\alpha$ is separable into the product of a probability distribution over the length $l$,

$$P(l) = 2^{-l}, \tag{7.2}$$

and a uniform distribution over integers having that length,

$$P(n \mid l) = \begin{cases} 2^{-l+1} & l_b(n) = l \\ 0 & \text{otherwise.} \end{cases} \tag{7.3}$$

Now, for the above code, the header that communicates the length always occupies the same number of bits as the standard binary representation of the integer (give or take one). If we are expecting to encounter large integers (large files) then this representation seems suboptimal, since it leads to all files occupying a size that is double their original uncoded size. Instead of using the unary code to encode the length $l_b(n)$, we could use $C_\alpha$.

Code $C_\beta$. We send the length $l_b(n)$ using $C_\alpha$, followed by the headless binary representation of $n$.

$$c_\beta(n) = c_\alpha[l_b(n)]\, c_B(n). \tag{7.4}$$

Iterating this procedure, we can define a sequence of codes.

Code $C_\gamma$.

$$c_\gamma(n) = c_\beta[l_b(n)]\, c_B(n). \tag{7.5}$$

Code $C_\delta$.

$$c_\delta(n) = c_\gamma[l_b(n)]\, c_B(n). \tag{7.6}$$

    n     $c_\beta(n)$   $c_\gamma(n)$
    1     1              1
    2     0100           01000
    3     0101           01001
    4     01100          010100
    5     01101          010101
    6     01110          010110
    ...
    45    0011001101     0111001101

    Table 7.2. $C_\beta$ and $C_\gamma$.
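The definitions (7.1) and (7.4) to (7.6) translate almost directly into code. The following is a minimal Python sketch, not taken from the book: the function names (`unary`, `headless`, `c_alpha`, and so on) are illustrative choices, and `decode_alpha` is an added helper, assumed here only to show why $C_\alpha$ is self-delimiting. Codewords are represented as strings of the characters 0 and 1 for readability.

```python
from typing import Tuple


def binary(n: int) -> str:
    """Standard binary representation c_b(n) of a positive integer."""
    if n < 1:
        raise ValueError("these codes are defined for positive integers")
    return format(n, "b")


def headless(n: int) -> str:
    """Headless binary c_B(n): c_b(n) with its leading 1 removed."""
    return binary(n)[1:]


def unary(n: int) -> str:
    """Unary code c_U(n): n - 1 zeros followed by a 1."""
    if n < 1:
        raise ValueError("these codes are defined for positive integers")
    return "0" * (n - 1) + "1"


def c_alpha(n: int) -> str:
    """Eq. (7.1): unary code for l_b(n), then headless binary of n."""
    return unary(len(binary(n))) + headless(n)


def c_beta(n: int) -> str:
    """Eq. (7.4): l_b(n) sent with C_alpha, then headless binary of n."""
    return c_alpha(len(binary(n))) + headless(n)


def c_gamma(n: int) -> str:
    """Eq. (7.5): l_b(n) sent with C_beta, then headless binary of n."""
    return c_beta(len(binary(n))) + headless(n)


def c_delta(n: int) -> str:
    """Eq. (7.6): l_b(n) sent with C_gamma, then headless binary of n."""
    return c_gamma(len(binary(n))) + headless(n)


def decode_alpha(bits: str) -> Tuple[int, str]:
    """Read one C_alpha codeword from the front of bits.

    Returns the decoded integer and the unread remainder of the string.
    (Hypothetical helper, added for illustration.)
    """
    length = bits.index("1") + 1           # unary header gives l_b(n)
    body = bits[length:2 * length - 1]     # the l_b(n) - 1 headless bits
    return int("1" + body, 2), bits[2 * length - 1:]


if __name__ == "__main__":
    # Rows for n = 45 in Tables 7.1 and 7.2.
    print(c_alpha(45))  # 00000101101
    print(c_beta(45))   # 0011001101
    print(c_gamma(45))  # 0111001101
```

Because the unary header announces how many bits follow, two codewords can be concatenated without a separator and still be split apart: `decode_alpha(c_alpha(5) + c_alpha(45))` returns 5 together with the untouched codeword for 45.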
