Information Theory, Inference, and Learning ... - Inference Group


Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981 You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links.

Chapter 3: More about Inference (page 64)

can be increased a little. It is shown for several values of \(\alpha\) in figure 3.11. Even the most favourable choice of \(\alpha\) (\(\alpha \simeq 50\)) can yield a likelihood ratio of only two to one in favour of \(H_1\).

In conclusion, the data are not ‘very suspicious’. They can be construed as giving at most two-to-one evidence in favour of one or other of the two hypotheses.

Are these wimpy likelihood ratios the fault of over-restrictive priors? Is there any way of producing a ‘very suspicious’ conclusion? The prior that is best-matched to the data, in terms of likelihood, is the prior that sets \(p\) to \(f \equiv 140/250\) with probability one. Let’s call this model \(H^*\). The likelihood ratio is

\[ \frac{P(D \mid H^*)}{P(D \mid H_0)} = 2^{250} f^{140} (1-f)^{110} = 6.1. \]

So the strongest evidence that these data can possibly muster against the hypothesis that there is no bias is six-to-one.

While we are noticing the absurdly misleading answers that ‘sampling theory’ statistics produces, such as the p-value of 7% in the exercise we just solved, let’s stick the boot in. If we make a tiny change to the data set, increasing the number of heads in 250 tosses from 140 to 141, we find that the p-value goes below the mystical value of 0.05 (the p-value is 0.0497). The sampling theory statistician would happily squeak ‘the probability of getting a result as extreme as 141 heads is smaller than 0.05 – we thus reject the null hypothesis at a significance level of 5%’. The correct answer is shown for several values of \(\alpha\) in figure 3.12. The values worth highlighting from this table are, first, the likelihood ratio when \(H_1\) uses the standard uniform prior, which is 1:0.61 in favour of the null hypothesis \(H_0\).
Second, the most favourable choice of \(\alpha\), from the point of view of \(H_1\), can only yield a likelihood ratio of about 2.3:1 in favour of \(H_1\).

Be warned! A p-value of 0.05 is often interpreted as implying that the odds are stacked about twenty-to-one against the null hypothesis. But the truth in this case is that the evidence either slightly favours the null hypothesis, or disfavours it by at most 2.3 to one, depending on the choice of prior.

The p-values and ‘significance levels’ of classical statistics should be treated with extreme caution. Shun them! Here ends the sermon.

| \(\alpha\) | \(P(D \mid H_1, \alpha)/P(D \mid H_0)\) |
|---|---|
| .37 | .25 |
| 1.0 | .48 |
| 2.7 | .82 |
| 7.4 | 1.3 |
| 20 | 1.8 |
| 55 | 1.9 |
| 148 | 1.7 |
| 403 | 1.3 |
| 1096 | 1.1 |

Figure 3.11. Likelihood ratio for various choices of the prior distribution’s hyperparameter \(\alpha\).

| \(\alpha\) | \(P(D' \mid H_1, \alpha)/P(D' \mid H_0)\) |
|---|---|
| .37 | .32 |
| 1.0 | .61 |
| 2.7 | 1.0 |
| 7.4 | 1.6 |
| 20 | 2.2 |
| 55 | 2.3 |
| 148 | 1.9 |
| 403 | 1.4 |
| 1096 | 1.2 |

Figure 3.12. Likelihood ratio for various choices of the prior distribution’s hyperparameter \(\alpha\), when the data are \(D' = 141\) heads in 250 trials.
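The likelihood ratios in figures 3.11 and 3.12, and the best-matched ratio for \(H^*\), can be checked numerically. The Python sketch below assumes, since this page does not restate it, that \(H_1\) gives the bias a symmetric beta prior \(P(p \mid H_1, \alpha) \propto p^{\alpha-1}(1-p)^{\alpha-1}\) while \(H_0\) fixes \(p = 1/2\). For one particular sequence with \(k\) heads in \(N\) tosses the ratio is then \(2^N B(k+\alpha, N-k+\alpha)/B(\alpha, \alpha)\). The function names are illustrative, not taken from the book.

```python
import math

# Hyperparameter values as listed in figures 3.11 and 3.12.
ALPHAS = [0.37, 1.0, 2.7, 7.4, 20, 55, 148, 403, 1096]


def log_beta(a: float, b: float) -> float:
    """Natural log of the beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def ratio_beta_prior(heads: int, n: int, alpha: float) -> float:
    """P(D | H1, alpha) / P(D | H0) for one particular sequence of n tosses.

    Assumption: H1 gives the bias p a Beta(alpha, alpha) prior, so
    P(D | H1, alpha) = B(heads + alpha, n - heads + alpha) / B(alpha, alpha);
    H0 fixes p = 1/2, so P(D | H0) = 2**(-n).
    """
    log_h1 = log_beta(heads + alpha, n - heads + alpha) - log_beta(alpha, alpha)
    log_h0 = -n * math.log(2)
    return math.exp(log_h1 - log_h0)


def ratio_best_fit(heads: int, n: int) -> float:
    """P(D | H*) / P(D | H0), where H* sets p to f = heads / n with probability one."""
    f = heads / n
    log_ratio = n * math.log(2) + heads * math.log(f) + (n - heads) * math.log(1 - f)
    return math.exp(log_ratio)


if __name__ == "__main__":
    n = 250
    print(f"{'alpha':>7}  {'140 heads':>9}  {'141 heads':>9}")
    for alpha in ALPHAS:
        print(f"{alpha:>7g}  {ratio_beta_prior(140, n, alpha):>9.2f}  "
              f"{ratio_beta_prior(141, n, alpha):>9.2f}")
    print(f"best-matched H*, 140 heads: {ratio_best_fit(140, n):.1f}")
```

Under this assumption, \(\alpha = 1\) (the uniform prior) gives about 0.48 for 140 heads and 0.61 for 141 heads, and \(\alpha = 55\) gives about 1.9 and 2.3, in line with the tables. Working in log space through `math.lgamma` keeps \(2^{250}\) and the tiny beta-function values from overflowing or underflowing.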

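The sampling-theory numbers in the sermon can be reproduced as well. This sketch assumes the usual two-sided convention for a fair-coin null: the p-value is the probability, under \(p = 1/2\), of a head count at least as far from 125 as the one observed. It uses exact integer arithmetic via `math.comb` (Python 3.8+) rather than a normal approximation; the function name is hypothetical.

```python
import math


def two_sided_p_value(heads: int, n: int) -> float:
    """Exact two-sided p-value for `heads` heads in `n` tosses of a fair coin.

    Binomial(n, 1/2) is symmetric, so the two tails are equal: double the
    probability of seeing at least max(heads, n - heads) heads.
    """
    extreme = max(heads, n - heads)
    tail_count = sum(math.comb(n, j) for j in range(extreme, n + 1))
    # Integer numerator and denominator; Python rounds the quotient correctly.
    # Clamp at 1 for the case heads == n / 2, where the two tails overlap.
    return min(1.0, 2 * tail_count / 2**n)


if __name__ == "__main__":
    for heads in (140, 141):
        print(f"{heads} heads in 250: p = {two_sided_p_value(heads, 250):.4f}")
```

Under this convention, 140 heads gives a p-value of roughly 0.066 (the 7% mentioned above) and 141 heads gives about 0.0497, just under 0.05, even though the likelihood ratios for the same data stay modest.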