Information Theory, Inference, and Learning ... - Inference Group
8 — Dependent Random Variables

Inference and information measures

Exercise 8.10.[2] The three cards.

(a) One card is white on both faces; one is black on both faces; and one is white on one side and black on the other. The three cards are shuffled and their orientations randomized. One card is drawn and placed on the table. The upper face is black. What is the colour of its lower face? (Solve the inference problem.)

(b) Does seeing the top face convey information about the colour of the bottom face? Discuss the information contents and entropies in this situation. Let the value of the upper face's colour be u and the value of the lower face's colour be l. Imagine that we draw a random card and learn both u and l. What is the entropy of u, H(U)? What is the entropy of l, H(L)? What is the mutual information between U and L, I(U; L)?

Entropies of Markov processes

⊲ Exercise 8.11.[3] In the guessing game, we imagined predicting the next letter in a document starting from the beginning and working towards the end. Consider the task of predicting the reversed text, that is, predicting the letter that precedes those already known. Most people find this a harder task. Assuming that we model the language using an N-gram model (which says the probability of the next character depends only on the N − 1 preceding characters), is there any difference between the average information contents of the reversed language and the forward language?

8.4 Solutions

Solution to exercise 8.2 (p.140). See exercise 8.6 (p.140) for an example where H(X | y) exceeds H(X) (set y = 3).

We can prove the inequality H(X | Y) ≤ H(X) by turning the expression into a relative entropy (using Bayes' theorem) and invoking Gibbs' inequality (exercise 2.26 (p.37)):

$$
\begin{aligned}
H(X \mid Y) &\equiv \sum_{y \in \mathcal{A}_Y} P(y) \left[ \sum_{x \in \mathcal{A}_X} P(x \mid y) \log \frac{1}{P(x \mid y)} \right] \\
&= \sum_{xy \in \mathcal{A}_X \mathcal{A}_Y} P(x, y) \log \frac{1}{P(x \mid y)} \qquad (8.16) \\
&= \sum_{xy} P(x)\, P(y \mid x) \log \frac{P(y)}{P(y \mid x)\, P(x)} \qquad (8.17) \\
&= \sum_{x} P(x) \log \frac{1}{P(x)} + \sum_{x} P(x) \sum_{y} P(y \mid x) \log \frac{P(y)}{P(y \mid x)}. \qquad (8.18)
\end{aligned}
$$

The last expression is a sum of relative entropies between the distributions P(y | x) and P(y). So

$$
H(X \mid Y) \le H(X) + 0, \qquad (8.19)
$$

with equality only if P(y | x) = P(y) for all x and y (that is, only if X and Y are independent).
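As a quick numerical sanity check of the decomposition in (8.16)–(8.19), the sketch below evaluates H(X), H(X | Y), and the averaged relative entropies for a small joint distribution. The joint P(x, y) used here is an arbitrary example chosen for illustration only, not a distribution from the book.

```python
import numpy as np

# Hypothetical joint distribution P(x, y), chosen only for illustration
# (not a distribution from the book). Rows index x, columns index y.
P_xy = np.array([[0.30, 0.10, 0.05],
                 [0.05, 0.20, 0.30]])

P_x = P_xy.sum(axis=1)             # marginal P(x)
P_y = P_xy.sum(axis=0)             # marginal P(y)
P_y_given_x = P_xy / P_x[:, None]  # conditional P(y | x), one row per x
P_x_given_y = P_xy / P_y[None, :]  # conditional P(x | y), one column per y

def H(p):
    """Entropy in bits of a distribution with strictly positive entries."""
    return -np.sum(p * np.log2(p))

# Left-hand side, equation (8.16): H(X|Y) = sum_{x,y} P(x,y) log 1/P(x|y).
H_X_given_Y = -np.sum(P_xy * np.log2(P_x_given_y))

# Right-hand side, equation (8.18): H(X) minus an average of relative
# entropies D( P(y|x) || P(y) ), each of which is >= 0 by Gibbs' inequality.
D_kl = np.array([np.sum(row * np.log2(row / P_y)) for row in P_y_given_x])
rhs = H(P_x) - np.sum(P_x * D_kl)

print(f"H(X)                 = {H(P_x):.4f} bits")
print(f"H(X|Y)  via (8.16)   = {H_X_given_Y:.4f} bits")
print(f"H(X) - avg KL (8.18) = {rhs:.4f} bits")
assert H_X_given_Y <= H(P_x) + 1e-12   # the inequality (8.19)
```

Because every relative entropy in the average is non-negative, the printed H(X | Y) cannot exceed H(X), and the two middle lines agree to numerical precision.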
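For exercise 8.10, one way to check a pencil-and-paper answer is to enumerate the six equally likely (card, orientation) outcomes directly. The following sketch is only an illustrative enumeration; the variable names are mine, not the book's.

```python
from fractions import Fraction
from math import log2

# The six equally likely outcomes: 3 cards x 2 orientations,
# recorded as (upper colour, lower colour).
cards = [("W", "W"), ("B", "B"), ("W", "B")]
outcomes = [(a, b) for a, b in cards] + [(b, a) for a, b in cards]

# (a) Posterior over the lower face, given that the upper face is black.
black_up = [o for o in outcomes if o[0] == "B"]
p_lower_black = Fraction(sum(o[1] == "B" for o in black_up), len(black_up))
print("P(lower = B | upper = B) =", p_lower_black)

# (b) Entropies and mutual information of the joint (U, L).
def H(dist):
    """Entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

joint = {}
for u, l in outcomes:
    joint[(u, l)] = joint.get((u, l), 0) + Fraction(1, len(outcomes))
P_u = {c: sum(p for (u, _), p in joint.items() if u == c) for c in "WB"}
P_l = {c: sum(p for (_, l), p in joint.items() if l == c) for c in "WB"}

H_U, H_L, H_UL = H(P_u), H(P_l), H(joint)
print(f"H(U) = {H_U:.3f}  H(L) = {H_L:.3f}  I(U;L) = {H_U + H_L - H_UL:.3f} bits")
```

Running it reproduces the counting argument for part (a) and gives numerical values to discuss in part (b).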
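Exercise 8.11 can also be explored numerically in the simplest bigram case (N = 2), where the language is a first-order Markov chain over characters. The sketch below uses a made-up 3-symbol transition matrix (my choice, not from the book), computes its stationary distribution, builds the time-reversed chain via Bayes' theorem, and compares the two per-character information contents.

```python
import numpy as np

# A made-up 3-symbol transition matrix for the bigram (N = 2) case;
# rows sum to 1, with P[i, j] = P(next symbol = j | current symbol = i).
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])

# Stationary distribution pi, i.e. the left eigenvector with pi = pi P.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

def entropy_rate(T, pi):
    """Average bits per symbol, sum_i pi_i sum_j T_ij log2(1/T_ij),
    assuming all entries of T are strictly positive."""
    return -np.sum(pi[:, None] * T * np.log2(T))

# Time-reversed chain: Q[j, i] = pi_i P[i, j] / pi_j (Bayes' theorem
# applied to the stationary joint distribution of adjacent symbols).
Q = (pi[:, None] * P).T / pi[:, None]

print(f"forward  entropy rate: {entropy_rate(P, pi):.6f} bits per character")
print(f"reversed entropy rate: {entropy_rate(Q, pi):.6f} bits per character")
```

Comparing the two printed numbers is one way to form a conjecture about the exercise before attempting a proof.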
