Given M separate experiments, in each of which a colony of size N is created, and where the measured numbers of resistant bacteria are {n_m}_{m=1}^M, what can we infer about the mutation rate, a?

Make the inference given the following dataset from Luria and Delbrück, for N = 2.4 × 10^8:
{n_m} = {1, 0, 3, 0, 0, 5, 0, 5, 0, 6, 107, 0, 0, 0, 1, 0, 0, 64, 0, 35}.

[A small amount of computation is required to solve this problem.]
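The following is a minimal sketch of one way to carry out that computation, under a deliberately simplified model that is an assumption of this sketch, not necessarily the model intended in the text: the colony grows from a single cell by successive doublings, a mutation arising j doublings before the end occurs with Poisson rate aN/2^j and leaves a clone of 2^j resistant cells, and resistant cells breed true. The number of doublings, the grid over a, and the flat prior on log a are likewise choices made here.

import numpy as np
from scipy.stats import poisson

# Data quoted in the exercise (Luria and Delbruck), colony size N = 2.4e8
data = [1, 0, 3, 0, 0, 5, 0, 5, 0, 6, 107, 0, 0, 0, 1, 0, 0, 64, 0, 35]
N = 2.4e8
G = int(np.ceil(np.log2(N)))   # ~28 doublings from a single cell (an assumption)
nmax = max(data)

def log_likelihood(a):
    """log P(data | a) under the simplified model: a mutation arising j
    doublings before the end occurs with Poisson rate a*N/2**j and leaves
    a clone of 2**j resistant cells in the final colony."""
    p = np.zeros(nmax + 1)     # p[n] = P(exactly n resistant cells), n <= nmax
    p[0] = 1.0
    log_keep = 0.0             # log P(no mutation so early that n would exceed nmax)
    for j in range(G):
        rate = a * N / 2 ** j  # expected number of mutations at this stage
        clone = 2 ** j         # final size of each resulting resistant clone
        if clone > nmax:
            log_keep += -rate  # a single such mutation would give n > nmax
            continue
        kmax = nmax // clone
        pk = poisson.pmf(np.arange(kmax + 1), rate)
        q = np.zeros(nmax + 1)
        for k in range(kmax + 1):          # convolve in this stage's contribution
            shift = k * clone
            q[shift:] += pk[k] * p[:nmax + 1 - shift]
        p = q
    logp = np.log(p) + log_keep
    return float(sum(logp[n] for n in data))

# Posterior over a on a grid, with a flat prior on log(a) (another assumption)
a_grid = np.logspace(-10, -7.5, 200)
loglik = np.array([log_likelihood(a) for a in a_grid])
posterior = np.exp(loglik - loglik.max())
posterior /= posterior.sum()
print("posterior mode: a ~= %.2g" % a_grid[np.argmax(posterior)])

The likelihood is built up by convolving the contributions of mutations from each doubling; mutations arising so early that even one of them would exceed the largest observed count are simply required to be absent, which is what the log_keep term accounts for.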
35.3 Inferring causation

Exercise 35.5.[2, p.450] In the Bayesian graphical model community, the task of inferring which way the arrows point – that is, which nodes are parents, and which children – is one on which much has been written.

Inferring causation is tricky because of 'likelihood equivalence'. Two graphical models are likelihood-equivalent if for any setting of the parameters of either, there exists a setting of the parameters of the other such that the two joint probability distributions of all observables are identical. An example of a pair of likelihood-equivalent models are A → B and B → A. The model A → B asserts that A is the parent of B, or, in very sloppy terminology, 'A causes B'. An example of a situation where 'B → A' is true is the case where B is the variable 'burglar in house' and A is the variable 'alarm is ringing'. Here it is literally true that B causes A. But this choice of words is confusing if applied to another example, R → D, where R denotes 'it rained this morning' and D denotes 'the pavement is dry'. 'R causes D' is confusing. I'll therefore use the words 'B is a parent of A' to denote causation. Some statistical methods that use the likelihood alone are unable to use data to distinguish between likelihood-equivalent models. In a Bayesian approach, on the other hand, two likelihood-equivalent models may nevertheless be somewhat distinguished, in the light of data, since likelihood-equivalence does not force a Bayesian to use priors that assign equivalent densities over the two parameter spaces of the models.

However, many Bayesian graphical modelling folks, perhaps out of sympathy for their non-Bayesian colleagues, or from a latent urge not to appear different from them, deliberately discard this potential advantage of Bayesian methods – the ability to infer causation from data – by skewing their models so that the ability goes away; a widespread orthodoxy holds that one should identify the choices of prior for which 'prior equivalence' holds, i.e., the priors such that models that are likelihood-equivalent also have identical posterior probabilities; and then one should use one of those priors in inference and prediction. This argument motivates the use, as the prior over all probability vectors, of specially-constructed Dirichlet distributions.

In my view it is a philosophical error to use only those priors such that causation cannot be inferred. Priors should be set to describe one's assumptions; when this is done, it's likely that interesting inferences about causation can be made from data.

In this exercise, you'll make an example of such an inference. Consider the toy problem where A and B are binary variables. The two models are H_{A→B} and H_{B→A}. H_{A→B} asserts that the marginal probability of A comes from a beta distribution with parameters (1, 1), i.e., the uniform distribution, and that the two conditional distributions P(b | a = 0) and P(b | a = 1) also come independently from beta distributions with parameters (1, 1). The other model assigns similar priors to the marginal probability of B and the conditional distributions of A given B. Data are gathered, and the
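Whatever the data turn out to be, the evidence for each model is a product of Beta–Bernoulli marginal likelihoods: with a Beta(1, 1) prior, a Bernoulli parameter integrates out to n1! n0! / (n1 + n0 + 1)!, where n1 and n0 are the counts of ones and zeros. The sketch below compares the two evidences for a table of joint counts; the counts used are hypothetical placeholders chosen only to exercise the code, not the data referred to in the exercise.

from math import lgamma

def log_beta_bernoulli_evidence(n1, n0):
    """log marginal likelihood of a Bernoulli sequence with n1 ones and n0
    zeros, when the success probability has a Beta(1,1) (uniform) prior:
    integral of p^n1 (1-p)^n0 dp = n1! n0! / (n1 + n0 + 1)!."""
    return lgamma(n1 + 1) + lgamma(n0 + 1) - lgamma(n1 + n0 + 2)

def log_evidence_A_to_B(counts):
    """counts[(a, b)] = number of observations with A = a and B = b."""
    na1 = counts[(1, 0)] + counts[(1, 1)]
    na0 = counts[(0, 0)] + counts[(0, 1)]
    return (log_beta_bernoulli_evidence(na1, na0)                            # marginal of A
            + log_beta_bernoulli_evidence(counts[(0, 1)], counts[(0, 0)])    # B given A = 0
            + log_beta_bernoulli_evidence(counts[(1, 1)], counts[(1, 0)]))   # B given A = 1

def log_evidence_B_to_A(counts):
    # Same structure with the roles of A and B swapped.
    swapped = {(b, a): n for (a, b), n in counts.items()}
    return log_evidence_A_to_B(swapped)

# Hypothetical joint counts, for illustration only
counts = {(0, 0): 40, (0, 1): 3, (1, 0): 12, (1, 1): 25}
logZ_ab = log_evidence_A_to_B(counts)
logZ_ba = log_evidence_B_to_A(counts)
print("log [ P(D | H_A->B) / P(D | H_B->A) ] =", logZ_ab - logZ_ba)

Because the two models are likelihood-equivalent, any difference between the two evidences comes entirely from the priors; with these Beta(1, 1) priors the difference is generally nonzero, which is exactly the point the exercise is driving at.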
