29 — Monte Carlo Methods

But $P(x)$ is too complicated a function for us to be able to sample from it directly. We now assume that we have a simpler density $Q(x)$ from which we can generate samples and which we can evaluate to within a multiplicative constant (that is, we can evaluate $Q^*(x)$, where $Q(x) = Q^*(x)/Z_Q$). An example of the functions $P^*$, $Q^*$ and $\phi$ is shown in figure 29.5. We call $Q$ the sampler density.

[Figure 29.5: Functions involved in importance sampling. We wish to estimate the expectation of $\phi(x)$ under $P(x) \propto P^*(x)$. We can generate samples from the simpler distribution $Q(x) \propto Q^*(x)$. We can evaluate $Q^*$ and $P^*$ at any point.]

In importance sampling, we generate $R$ samples $\{x^{(r)}\}_{r=1}^{R}$ from $Q(x)$. If these points were samples from $P(x)$ then we could estimate $\Phi$ by equation (29.6). But when we generate samples from $Q$, values of $x$ where $Q(x)$ is greater than $P(x)$ will be over-represented in this estimator, and points where $Q(x)$ is less than $P(x)$ will be under-represented. To take into account the fact that we have sampled from the wrong distribution, we introduce weights

$$w_r \equiv \frac{P^*(x^{(r)})}{Q^*(x^{(r)})} \tag{29.21}$$

which we use to adjust the 'importance' of each point in our estimator thus:

$$\hat{\Phi} \equiv \frac{\sum_r w_r\,\phi(x^{(r)})}{\sum_r w_r}. \tag{29.22}$$

⊲ Exercise 29.1.[2, p.384] Prove that, if $Q(x)$ is non-zero for all $x$ where $P(x)$ is non-zero, the estimator $\hat{\Phi}$ converges to $\Phi$, the mean value of $\phi(x)$, as $R$ increases. What is the variance of this estimator, asymptotically? Hint: consider the statistics of the numerator and the denominator separately. Is the estimator $\hat{\Phi}$ an unbiased estimator for small $R$?

A practical difficulty with importance sampling is that it is hard to estimate how reliable the estimator $\hat{\Phi}$ is. The variance of the estimator is unknown beforehand, because it depends on an integral over $x$ of a function involving $P^*(x)$. And the variance of $\hat{\Phi}$ is hard to estimate, because the empirical variances of the quantities $w_r$ and $w_r \phi(x^{(r)})$ are not necessarily a good guide to the true variances of the numerator and denominator in equation (29.22). If the proposal density $Q(x)$ is small in a region where $|\phi(x)P^*(x)|$ is large then it is quite possible, even after many points $x^{(r)}$ have been generated, that none of them will have fallen in that region. In this case the estimate of $\Phi$ would be drastically wrong, and there would be no indication in the empirical variance that the true variance of the estimator $\hat{\Phi}$ is large.
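As a concrete rendering of equations (29.21)–(29.22), here is a minimal sketch in Python (not from the book; the target, sampler, and $\phi$ used in the check below are illustrative stand-ins). Note that only the unnormalized $P^*$ and $Q^*$ are ever evaluated; the normalizing constants $Z_P$ and $Z_Q$ cancel in the ratio (29.22).

```python
import numpy as np

def importance_sampling_estimate(phi, p_star, q_star, q_sample, R, rng):
    """Estimate Phi = E_P[phi(x)] from R samples drawn from the sampler
    density Q.  p_star and q_star are the *unnormalized* densities
    P*(x) and Q*(x); q_sample draws samples from Q(x)."""
    x = q_sample(rng, R)                   # x^(1), ..., x^(R) ~ Q(x)
    w = p_star(x) / q_star(x)              # importance weights, eq. (29.21)
    return np.sum(w * phi(x)) / np.sum(w)  # weighted estimator, eq. (29.22)

# Illustrative check: target P(x) proportional to exp(-x^2/2) (a standard
# normal), phi(x) = x^2, sampler Q = Normal(0, 2^2).  True value: 1.
rng = np.random.default_rng(1)
print(importance_sampling_estimate(
    phi=lambda x: x**2,
    p_star=lambda x: np.exp(-x**2 / 2.0),
    q_star=lambda x: np.exp(-x**2 / 8.0),          # Normal(0, 4), unnormalized
    q_sample=lambda rng, R: 2.0 * rng.standard_normal(R),
    R=100_000, rng=rng))                           # prints roughly 1.0
```

Here the sampler deliberately has heavier tails than the target; the cautionary example below shows why that is the safe arrangement.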
[Figure 29.6: Importance sampling in action: (a) using a Gaussian sampler density; (b) using a Cauchy sampler density. Vertical axis shows the estimate $\hat{\Phi}$; the horizontal line indicates the true value of $\Phi$. Horizontal axis shows the number of samples on a log scale.]

Cautionary illustration of importance sampling

In a toy problem related to the modelling of amino acid probability distributions with a one-dimensional variable $x$, I evaluated a quantity of interest using importance sampling. The results using a Gaussian sampler and a Cauchy sampler are shown in figure 29.6. The horizontal axis shows the number of samples on a log scale.
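The figure's pathology is easy to reproduce. In the following sketch (an invented stand-in: the book does not spell out the amino-acid toy problem, so a heavy-tailed Student-t target with $\phi(x) = x^2$ is used instead), the Gaussian sampler has lighter tails than $P^*$, so the weights $w_r$ are unbounded and occasional tail samples dominate the estimate, while the Cauchy sampler has heavier tails than $P^*$ and bounded weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in target: unnormalized Student-t density with 3 degrees of freedom.
# With phi(x) = x^2 the true value is the t_3 variance, Phi = 3.
p_star = lambda x: (1.0 + x**2 / 3.0) ** -2
phi = lambda x: x**2

samplers = {
    # (a) Gaussian Q: tails lighter than P*, so w = P*/Q* is unbounded.
    "Gaussian": (lambda R: rng.standard_normal(R),
                 lambda x: np.exp(-x**2 / 2.0)),
    # (b) Cauchy Q: tails heavier than P*, so w is bounded.
    "Cauchy": (lambda R: rng.standard_cauchy(R),
               lambda x: 1.0 / (1.0 + x**2)),
}

for name, (q_sample, q_star) in samplers.items():
    for R in (100, 1_000, 10_000, 100_000):  # log-spaced R, as in figure 29.6
        x = q_sample(R)
        w = p_star(x) / q_star(x)                        # eq. (29.21)
        print(name, R, np.sum(w * phi(x)) / np.sum(w))   # eq. (29.22)
```

The Cauchy runs typically settle near the true value 3, whereas the Gaussian runs can sit at a confidently wrong value until a rare tail sample arrives and the estimate lurches, with nothing in the empirical variance of the weights to warn of this.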
