an a or a b. [Predictions are always expressed as probabilities. So 'predicting whether the next character is an a' is the same as computing the probability that the next character is an a.]

Assuming \(\mathcal{H}_1\) to be true, the posterior probability of \(p_a\), given a string \(s\) of length \(F\) that has counts \(\{F_a, F_b\}\), is, by Bayes' theorem,
\[
P(p_a \mid s, F, \mathcal{H}_1) = \frac{P(s \mid p_a, F, \mathcal{H}_1)\, P(p_a \mid \mathcal{H}_1)}{P(s \mid F, \mathcal{H}_1)} . \tag{3.10}
\]
The factor \(P(s \mid p_a, F, \mathcal{H}_1)\), which, as a function of \(p_a\), is known as the likelihood function, was given in equation (3.8); the prior \(P(p_a \mid \mathcal{H}_1)\) was given in equation (3.9). Our inference of \(p_a\) is thus:
\[
P(p_a \mid s, F, \mathcal{H}_1) = \frac{p_a^{F_a} (1 - p_a)^{F_b}}{P(s \mid F, \mathcal{H}_1)} . \tag{3.11}
\]
The normalizing constant is given by the beta integral
\[
P(s \mid F, \mathcal{H}_1) = \int_0^1 \! dp_a \; p_a^{F_a} (1 - p_a)^{F_b}
 = \frac{\Gamma(F_a + 1)\,\Gamma(F_b + 1)}{\Gamma(F_a + F_b + 2)}
 = \frac{F_a!\, F_b!}{(F_a + F_b + 1)!} . \tag{3.12}
\]

Exercise 3.5. [2, p.59] Sketch the posterior probability \(P(p_a \mid s = aba, F = 3)\). What is the most probable value of \(p_a\) (i.e., the value that maximizes the posterior probability density)? What is the mean value of \(p_a\) under this distribution?

Answer the same questions for the posterior probability \(P(p_a \mid s = bbb, F = 3)\).

From inferences to predictions

Our prediction about the next toss, the probability that the next toss is an a, is obtained by integrating over \(p_a\). This has the effect of taking into account our uncertainty about \(p_a\) when making predictions. By the sum rule,
\[
P(a \mid s, F) = \int \! dp_a \; P(a \mid p_a)\, P(p_a \mid s, F) . \tag{3.13}
\]
The probability of an a given \(p_a\) is simply \(p_a\), so
\[
P(a \mid s, F) = \int \! dp_a \; p_a \, \frac{p_a^{F_a} (1 - p_a)^{F_b}}{P(s \mid F)} \tag{3.14}
\]
\[
= \int \! dp_a \; \frac{p_a^{F_a + 1} (1 - p_a)^{F_b}}{P(s \mid F)} \tag{3.15}
\]
\[
= \left[ \frac{(F_a + 1)!\, F_b!}{(F_a + F_b + 2)!} \right] \Big/ \left[ \frac{F_a!\, F_b!}{(F_a + F_b + 1)!} \right]
 = \frac{F_a + 1}{F_a + F_b + 2} , \tag{3.16}
\]
which is known as Laplace's rule.

3.3 The bent coin and model comparison

Imagine that a scientist introduces another theory for our data. He asserts that the source is not really a bent coin but is really a perfectly formed die with one face painted heads ('a') and the other five painted tails ('b'). Thus the parameter \(p_a\), which in the original model, \(\mathcal{H}_1\), could take any value between 0 and 1, is according to the new hypothesis, \(\mathcal{H}_0\), not a free parameter at all; rather, it is equal to \(1/6\). [This hypothesis is termed \(\mathcal{H}_0\) so that the suffix of each model indicates its number of free parameters.]

How can we compare these two models in the light of data? We wish to infer how probable \(\mathcal{H}_1\) is relative to \(\mathcal{H}_0\).
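Before pursuing the model-comparison question, the quantities derived above are easy to check numerically. The following is a minimal sketch (in Python, which is not used in the text) of the posterior density of equations (3.11)-(3.12) under the uniform prior of equation (3.9), together with the Laplace-rule prediction of equation (3.16); the counts F_a = 2, F_b = 1 correspond to the string s = aba of Exercise 3.5, and the grid size is an arbitrary illustrative choice.

    # Numerical sketch of eqs (3.11), (3.12) and (3.16), assuming the
    # uniform prior P(p_a | H1) = 1 of eq. (3.9).
    from math import factorial

    def posterior(pa, Fa, Fb):
        """P(p_a | s, F, H1): Beta(Fa+1, Fb+1) density, eqs (3.11)-(3.12)."""
        norm = factorial(Fa) * factorial(Fb) / factorial(Fa + Fb + 1)  # beta integral (3.12)
        return pa**Fa * (1.0 - pa)**Fb / norm

    def laplace_rule(Fa, Fb):
        """P(a | s, F) = (Fa + 1) / (Fa + Fb + 2), eq. (3.16)."""
        return (Fa + 1) / (Fa + Fb + 2)

    if __name__ == "__main__":
        Fa, Fb = 2, 1                      # counts for the string s = aba
        grid = [i / 1000 for i in range(1001)]
        # crude check that the density integrates to 1
        integral = sum(posterior(p, Fa, Fb) for p in grid) / len(grid)
        # most probable value (maximum over the grid) and posterior mean
        map_pa = max(grid, key=lambda p: posterior(p, Fa, Fb))
        mean_pa = sum(p * posterior(p, Fa, Fb) for p in grid) / len(grid)
        print("integral of posterior ~", round(integral, 3))
        print("MAP p_a ~", round(map_pa, 3), " mean p_a ~", round(mean_pa, 3))
        print("P(next char is a) =", laplace_rule(Fa, Fb))  # Laplace's rule

Running the sketch with other counts (for example those of s = bbb) shows how the posterior shifts and how the Laplace-rule prediction stays strictly between 0 and 1 even when one symbol has never been observed.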
