Information Theory, Inference, and Learning ... - Inference Group
Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981
You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links.

36                                    2 — Probability, Entropy, and Inference

Convexity and concavity also relate to maximization

If f(x) is concave ⌢ and there exists a point at which

\[ \frac{\partial f}{\partial x_k} = 0 \quad \text{for all } k, \tag{2.49} \]

then f(x) has its maximum value at that point.

The converse does not hold: if a concave ⌢ f(x) is maximized at some x, it is not necessarily true that the gradient ∇f(x) is equal to zero there. For example, f(x) = −|x| is maximized at x = 0, where its derivative is undefined; and f(p) = log(p), for a probability p ∈ (0, 1), is maximized on the boundary of the range, at p = 1, where the gradient df(p)/dp = 1.

2.8 Exercises

Sums of random variables

Exercise 2.16.[3, p.41]
  (a) Two ordinary dice with faces labelled 1, . . . , 6 are thrown. What is the probability distribution of the sum of the values? What is the probability distribution of the absolute difference between the values?
  (b) One hundred ordinary dice are thrown. What, roughly, is the probability distribution of the sum of the values? Sketch the probability distribution and estimate its mean and standard deviation.
  (c) How can two cubical dice be labelled using the numbers {0, 1, 2, 3, 4, 5, 6} so that when the two dice are thrown the sum has a uniform probability distribution over the integers 1–12?
  (d) Is there any way that one hundred dice could be labelled with integers such that the probability distribution of the sum is uniform?

[Margin note: This exercise is intended to help you think about the central-limit theorem, which says that if independent random variables x_1, x_2, . . . , x_N have means µ_n and finite variances σ_n^2, then, in the limit of large N, the sum Σ_n x_n has a distribution that tends to a normal (Gaussian) distribution with mean Σ_n µ_n and variance Σ_n σ_n^2.]

Inference problems

Exercise 2.17.[2, p.41] If q = 1 − p and a = ln(p/q), show that

\[ p = \frac{1}{1 + \exp(-a)}. \tag{2.50} \]

Sketch this function and find its relationship to the hyperbolic tangent function \( \tanh(u) = \frac{e^u - e^{-u}}{e^u + e^{-u}} \).

It will be useful to be fluent in base-2 logarithms also. If b = log_2(p/q), what is p as a function of b?

⊲ Exercise 2.18.[2, p.42] Let x and y be dependent random variables with x a binary variable taking values in A_X = {0, 1}. Use Bayes' theorem to show that the log posterior probability ratio for x given y is

\[ \log \frac{P(x=1 \mid y)}{P(x=0 \mid y)} = \log \frac{P(y \mid x=1)}{P(y \mid x=0)} + \log \frac{P(x=1)}{P(x=0)}. \tag{2.51} \]

⊲ Exercise 2.19.[2, p.42] Let x, d_1 and d_2 be random variables such that d_1 and d_2 are conditionally independent given a binary variable x. Use Bayes' theorem to show that the posterior probability ratio for x given {d_i} is

\[ \frac{P(x=1 \mid \{d_i\})}{P(x=0 \mid \{d_i\})} = \frac{P(d_1 \mid x=1)}{P(d_1 \mid x=0)} \, \frac{P(d_2 \mid x=1)}{P(d_2 \mid x=0)} \, \frac{P(x=1)}{P(x=0)}. \tag{2.52} \]
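As a sketch of the kind of reasoning Exercise 2.16 asks for (not the book's own solution), the distribution of the sum of two dice can be found by enumerating all 36 equally likely outcomes, and the 100-dice case can be summarized via the central-limit theorem using the mean and variance of a single die:

```python
from itertools import product
from collections import Counter
from math import sqrt

# Part (a): enumerate all 36 equally likely outcomes of two fair dice
# and count each value of the sum and the absolute difference.
outcomes = list(product(range(1, 7), repeat=2))
prob_sum = {s: n / 36 for s, n in Counter(a + b for a, b in outcomes).items()}
prob_diff = {d: n / 36 for d, n in Counter(abs(a - b) for a, b in outcomes).items()}
# prob_sum is triangular, peaking at 7 with probability 6/36.

# Part (b): one die has mean 3.5 and variance 35/12, so by the
# central-limit theorem the sum of 100 dice is roughly Gaussian with
# mean 100 * 3.5 = 350 and standard deviation sqrt(100 * 35/12) ≈ 17.1.
mean_100 = 100 * 3.5
std_100 = sqrt(100 * 35 / 12)
```

This mirrors the margin note: the exact triangular distribution for two dice already hints at the Gaussian shape that emerges for 100 dice.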
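The logistic function of equation (2.50) and its relation to tanh (the subject of Exercise 2.17) can be checked numerically; the identity sigmoid(a) = (1 + tanh(a/2))/2 and the base-2 variant below are the standard answers one can verify this way, not quotations from the book's solutions:

```python
from math import exp, tanh

def sigmoid(a):
    """Equation (2.50): p = 1 / (1 + exp(-a)), where a = ln(p/q)."""
    return 1.0 / (1.0 + exp(-a))

# Relationship to the hyperbolic tangent: sigmoid(a) = (1 + tanh(a/2)) / 2,
# since tanh(a/2) = (e^{a/2} - e^{-a/2}) / (e^{a/2} + e^{-a/2}).
for a in (-3.0, -0.5, 0.0, 1.7, 4.2):
    assert abs(sigmoid(a) - (1 + tanh(a / 2)) / 2) < 1e-12

def sigmoid_base2(b):
    """Base-2 version: if b = log2(p/q), then p = 1 / (1 + 2**(-b))."""
    return 1.0 / (1.0 + 2.0 ** (-b))
```

For instance, b = 1 means p is twice q, giving p = 2/3.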
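Equations (2.51) and (2.52) say the same thing in two forms: with conditionally independent data, the posterior log odds for a binary x are the prior log odds plus a sum of log likelihood ratios. A minimal sketch, with likelihood values invented purely for illustration (they are not from the book):

```python
from math import log, exp

def posterior_log_odds(prior_log_odds, likelihood_ratios):
    """Eqs. (2.51)-(2.52) in log form: prior log odds plus the sum of
    log P(d_i | x=1) / P(d_i | x=0) over the independent data d_i."""
    return prior_log_odds + sum(log(r) for r in likelihood_ratios)

# Hypothetical example: uniform prior on x, and two observations with
# made-up likelihood ratios P(d_i | x=1) / P(d_i | x=0).
prior = log(0.5 / 0.5)                 # = 0: no prior preference
lratios = [0.8 / 0.2, 0.3 / 0.7]       # d_1 favours x=1, d_2 favours x=0
a = posterior_log_odds(prior, lratios)
p_x1 = 1.0 / (1.0 + exp(-a))           # eq. (2.50) converts log odds to p
```

Note how the conversion back to a probability reuses equation (2.50), which is why fluency with the logistic function pays off in inference problems.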
