8  Dependent Random Variables

In the last three chapters on data compression we concentrated on random vectors x coming from an extremely simple probability distribution, namely the separable distribution in which each component x_n is independent of the others.

In this chapter, we consider joint ensembles in which the random variables are dependent. This material has two motivations. First, data from the real world have interesting correlations, so to do data compression well, we need to know how to work with models that include dependences. Second, a noisy channel with input x and output y defines a joint ensemble in which x and y are dependent – if they were independent, it would be impossible to communicate over the channel – so communication over noisy channels (the topic of chapters 9–11) is described in terms of the entropy of joint ensembles.

8.1 More about entropy

This section gives definitions and exercises to do with entropy, carrying on from section 2.4.

The joint entropy of X, Y is:

    H(X, Y) = \sum_{xy \in \mathcal{A}_X \mathcal{A}_Y} P(x, y) \log \frac{1}{P(x, y)}.    (8.1)

Entropy is additive for independent random variables:

    H(X, Y) = H(X) + H(Y)  iff  P(x, y) = P(x) P(y).    (8.2)

The conditional entropy of X given y = b_k is the entropy of the probability distribution P(x | y = b_k):

    H(X | y = b_k) \equiv \sum_{x \in \mathcal{A}_X} P(x | y = b_k) \log \frac{1}{P(x | y = b_k)}.    (8.3)

The conditional entropy of X given Y is the average, over y, of the conditional entropy of X given y:

    H(X | Y) \equiv \sum_{y \in \mathcal{A}_Y} P(y) \left[ \sum_{x \in \mathcal{A}_X} P(x | y) \log \frac{1}{P(x | y)} \right]
             = \sum_{xy \in \mathcal{A}_X \mathcal{A}_Y} P(x, y) \log \frac{1}{P(x | y)}.    (8.4)

This measures the average uncertainty that remains about x when y is known.
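The short Python sketch below is not from the book; it simply evaluates definitions (8.1)–(8.4) numerically for a small, made-up joint distribution. The array P_xy, the function H, and all variable names are illustrative assumptions chosen for this example.

    # A minimal numerical sketch of equations (8.1)-(8.4) for a small,
    # hypothetical joint distribution P(x, y) over A_X = {0,1}, A_Y = {0,1,2}.
    import numpy as np

    # Hypothetical joint distribution: rows index x, columns index y.
    # Entries sum to 1.
    P_xy = np.array([[0.25, 0.10, 0.15],
                     [0.05, 0.30, 0.15]])

    def H(p):
        """Entropy in bits of a distribution given as an array of probabilities."""
        p = p[p > 0]                      # treat 0 log(1/0) as 0
        return float(np.sum(p * np.log2(1.0 / p)))

    # Joint entropy H(X, Y), equation (8.1).
    H_XY = H(P_xy.ravel())

    # Marginals, and the additivity property of equation (8.2):
    # H(X, Y) = H(X) + H(Y) iff P(x, y) = P(x) P(y).
    P_x = P_xy.sum(axis=1)
    P_y = P_xy.sum(axis=0)
    print(H_XY, H(P_x) + H(P_y))          # equal only for an independent joint

    # Conditional entropy H(X | y = b_k), equation (8.3), for each value of y.
    H_X_given_y = [H(P_xy[:, k] / P_y[k]) for k in range(len(P_y))]

    # Conditional entropy H(X | Y), equation (8.4): average of the above over P(y).
    H_X_given_Y = float(np.dot(P_y, H_X_given_y))
    print(H_X_given_Y)                    # equals H(X, Y) - H(Y) (chain rule)

For the distribution above, H(X, Y) is strictly less than H(X) + H(Y), because the joint is not a product of its marginals; with an independent joint the two printed values would coincide.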
