Information Theory, Inference, and Learning ... - Inference Group

37 Bayesian Inference and Sampling Theory

How strongly does this data set favour $H_1$ over $H_0$?

We answer this question by computing the evidence for each hypothesis. Let's assume uniform priors over the unknown parameters of the models. The first hypothesis, $H_0$: $p_{A+} = p_{B+}$, has just one unknown parameter, let's call it $p$.
$$
P(p \mid H_0) = 1, \qquad p \in (0, 1). \tag{37.17}
$$
We'll use the uniform prior over the two parameters of model $H_1$ that we used before:
$$
P(p_{A+}, p_{B+} \mid H_1) = 1, \qquad p_{A+} \in (0, 1),\ p_{B+} \in (0, 1). \tag{37.18}
$$
Now, the probability of the data $D$ under model $H_0$ is the normalizing constant from the inference of $p$ given $D$:
\begin{align}
P(D \mid H_0) &= \int \mathrm{d}p \; P(D \mid p)\, P(p \mid H_0) \tag{37.19} \\
&= \int \mathrm{d}p \; p(1 - p) \times 1 \tag{37.20} \\
&= 1/6. \tag{37.21}
\end{align}
The probability of the data $D$ under model $H_1$ is given by a simple two-dimensional integral:
\begin{align}
P(D \mid H_1) &= \int\!\!\int \mathrm{d}p_{A+}\, \mathrm{d}p_{B+} \; P(D \mid p_{A+}, p_{B+})\, P(p_{A+}, p_{B+} \mid H_1) \tag{37.22} \\
&= \int \mathrm{d}p_{A+}\; p_{A+} \int \mathrm{d}p_{B+}\; (1 - p_{B+}) \tag{37.23} \\
&= 1/2 \times 1/2 \tag{37.24} \\
&= 1/4. \tag{37.25}
\end{align}
Thus the evidence ratio in favour of model $H_1$, which asserts that the two effectivenesses are unequal, is
$$
\frac{P(D \mid H_1)}{P(D \mid H_0)} = \frac{1/4}{1/6} = \frac{0.6}{0.4}. \tag{37.26}
$$
So if the prior probability over the two hypotheses was 50:50, the posterior probability is 60:40 in favour of $H_1$. ✷

Is it not easy to get sensible answers to well-posed questions using Bayesian methods?

[The sampling theory answer to this question would involve the identical significance test that was used in the preceding problem; that test would yield a 'not significant' result. I think it is greatly preferable to acknowledge what is obvious to the intuition, namely that the data $D$ do give weak evidence in favour of $H_1$. Bayesian methods quantify how weak the evidence is.]

37.2 Dependence of p-values on irrelevant information

In an expensive laboratory, Dr. Bloggs tosses a coin labelled a and b twelve times and the outcome is the string

aaabaaaabaab,

which contains three bs and nine as.

What evidence do these data give that the coin is biased in favour of a?
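The evidence integrals in (37.19)–(37.25) are easy to check numerically. The Python sketch below (not from the book; it assumes SciPy is available) reproduces the values 1/6, 1/4 and the odds 0.6 : 0.4 of (37.26), and then, purely as an illustration of the same uniform-prior evidence calculation, applies it to Dr. Bloggs's coin data by comparing a model with unknown P(a) against a fixed fair coin. The coin part is an added example of the technique, not the analysis the text goes on to give.

```python
# Sketch only: numerical check of the evidence integrals above, plus the
# same style of calculation for the coin data (an illustration, not the
# book's own treatment).  Assumes SciPy is installed.
from scipy import integrate

# Treatment example: under H0 the likelihood of D is p(1 - p); under H1 it
# factorizes into p_A+ times (1 - p_B+), each with a uniform prior on (0, 1).
evidence_h0, _ = integrate.quad(lambda p: p * (1 - p), 0.0, 1.0)   # 1/6
int_a, _ = integrate.quad(lambda pa: pa, 0.0, 1.0)                 # 1/2
int_b, _ = integrate.quad(lambda pb: 1.0 - pb, 0.0, 1.0)           # 1/2
evidence_h1 = int_a * int_b                                        # 1/4

print(evidence_h1 / evidence_h0)  # 1.5, i.e. posterior odds 0.6 : 0.4 as in (37.26)

# Coin example: the string aaabaaaabaab has nine a's and three b's.  Compare
# a model with unknown P(a) under a uniform prior against a fixed fair coin.
n_a, n_b = 9, 3
evidence_unknown, _ = integrate.quad(lambda f: f**n_a * (1.0 - f)**n_b, 0.0, 1.0)
evidence_fair = 0.5 ** (n_a + n_b)

print(evidence_unknown / evidence_fair)  # about 1.4: only weak evidence of bias
```

Both integrals could equally be done analytically with the Beta function; quadrature is used here only to keep the check self-contained.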
