Information Theory, Inference, and Learning ... - Inference Group

Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981 You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links.

3.3: The bent coin and model comparison    53

Model comparison as inference

In order to perform model comparison, we write down Bayes' theorem again, but this time with a different argument on the left-hand side. We wish to know how probable H_1 is given the data. By Bayes' theorem,

    P(H_1 | s, F) = P(s | F, H_1) P(H_1) / P(s | F).                          (3.17)

Similarly, the posterior probability of H_0 is

    P(H_0 | s, F) = P(s | F, H_0) P(H_0) / P(s | F).                          (3.18)

The normalizing constant in both cases is P(s | F), which is the total probability of getting the observed data. If H_1 and H_0 are the only models under consideration, this probability is given by the sum rule:

    P(s | F) = P(s | F, H_1) P(H_1) + P(s | F, H_0) P(H_0).                   (3.19)

To evaluate the posterior probabilities of the hypotheses we need to assign values to the prior probabilities P(H_1) and P(H_0); in this case, we might set these to 1/2 each. And we need to evaluate the data-dependent terms P(s | F, H_1) and P(s | F, H_0). We can give names to these quantities. The quantity P(s | F, H_1) is a measure of how much the data favour H_1, and we call it the evidence for model H_1. We already encountered this quantity in equation (3.10) where it appeared as the normalizing constant of the first inference we made – the inference of p_a given the data.

How model comparison works: The evidence for a model is usually the normalizing constant of an earlier Bayesian inference.

We evaluated the normalizing constant for model H_1 in (3.12). The evidence for model H_0 is very simple because this model has no parameters to infer. Defining p_0 to be 1/6, we have

    P(s | F, H_0) = p_0^{F_a} (1 − p_0)^{F_b}.                                (3.20)

Thus the posterior probability ratio of model H_1 to model H_0 is

    P(H_1 | s, F) / P(H_0 | s, F) = P(s | F, H_1) P(H_1) / [P(s | F, H_0) P(H_0)]   (3.21)

                                  = [F_a! F_b! / (F_a + F_b + 1)!] / [p_0^{F_a} (1 − p_0)^{F_b}].   (3.22)

Some values of this posterior probability ratio are illustrated in table 3.5. The first five lines illustrate that some outcomes favour one model, and some favour the other. No outcome is completely incompatible with either model. With small amounts of data (six tosses, say) it is typically not the case that one of the two models is overwhelmingly more probable than the other. But with more data, the evidence against H_0 given by any data set with the ratio F_a : F_b differing from 1:5 mounts up. You can't predict in advance how much data are needed to be pretty sure which theory is true. It depends what p_a is.

The simpler model, H_0, since it has no adjustable parameters, is able to lose out by the biggest margin. The odds may be hundreds to one against it. The more complex model can never lose out by a large margin; there's no data set that is actually unlikely given model H_1.
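The evidence ratio (3.22) is simple enough to evaluate directly. A minimal sketch in Python, assuming the equal priors P(H_1) = P(H_0) = 1/2 used above (function names are illustrative, not from the book):

```python
from math import factorial

def evidence_h1(fa, fb):
    # Evidence for H1 from (3.12): the Beta integral of
    # p^fa (1-p)^fb over a uniform prior on p.
    return factorial(fa) * factorial(fb) / factorial(fa + fb + 1)

def evidence_h0(fa, fb, p0=1/6):
    # Evidence for H0 from (3.20): no free parameters, p fixed at p0.
    return p0**fa * (1 - p0)**fb

def posterior_ratio(fa, fb):
    # With equal priors, the posterior ratio (3.21) reduces to
    # the evidence ratio (3.22).
    return evidence_h1(fa, fb) / evidence_h0(fa, fb)

# Six tosses with F_a : F_b = 1 : 5 favour H_0 (ratio < 1);
# a 3 : 3 outcome favours H_1 (ratio > 1).
print(posterior_ratio(1, 5))
print(posterior_ratio(3, 3))
```

Note that neither ratio is extreme: as the text says, with only six tosses neither model is overwhelmingly more probable than the other.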
