10.07.2015 Views

Information Theory, Inference, and Learning ... - Inference Group



Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981
You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links.

Now the exciting bit. Should we prospect? Once we have prospected at site \(n_p\), we will choose the site using the decision rule (36.9) with the value of mean \(\mu_{n_p}\) replaced by the updated value \(\mu'_n\) given by (36.8). What makes the problem exciting is that we don’t yet know the value of \(d_n\), so we don’t know what our action \(n_a\) will be; indeed the whole value of doing the prospecting comes from the fact that the outcome \(d_n\) may alter the action from the one that we would have taken in the absence of the experimental information.

From the expression for the new mean in terms of \(d_n\) (36.8), and the known variance of \(d_n\) (36.6), we can compute the probability distribution of the key quantity, \(\mu'_n\), and can work out the expected utility by integrating over all possible outcomes and their associated actions.

Exercise 36.3. [2] Show that the probability distribution of the new mean \(\mu'_n\) (36.8) is Gaussian with mean \(\mu_n\) and variance
\[
s^2 \equiv \sigma_n^2 \, \frac{\sigma_n^2}{\sigma^2 + \sigma_n^2}. \tag{36.11}
\]

Consider prospecting at site \(n\). Let the biggest mean of the other sites be \(\mu_1\).
When we obtain the new value of the mean, \(\mu'_n\), we will choose site \(n\) and get an expected return of \(\mu'_n\) if \(\mu'_n > \mu_1\), and we will choose site 1 and get an expected return of \(\mu_1\) if \(\mu'_n < \mu_1\).

So the expected utility of prospecting at site \(n\), then picking the best site, is
\[
E[U \mid \text{prospect at } n] = -c_n + P(\mu'_n < \mu_1)\,\mu_1 + \int_{\mu_1}^{\infty} d\mu'_n \; \mu'_n \, \text{Normal}(\mu'_n; \mu_n, s^2). \tag{36.12}
\]

The difference in utility between prospecting and not prospecting is the quantity of interest, and it depends on what we would have done without prospecting; and that depends on whether \(\mu_1\) is bigger than \(\mu_n\).
\[
E[U \mid \text{no prospecting}] =
\begin{cases}
\mu_1 & \text{if } \mu_1 \ge \mu_n \\
\mu_n & \text{if } \mu_1 \le \mu_n .
\end{cases} \tag{36.13}
\]
So
\[
E[U \mid \text{prospect at } n] - E[U \mid \text{no prospecting}] =
\begin{cases}
\displaystyle -c_n + \int_{\mu_1}^{\infty} d\mu'_n \, (\mu'_n - \mu_1) \, \text{Normal}(\mu'_n; \mu_n, s^2) & \text{if } \mu_1 \ge \mu_n \\[2ex]
\displaystyle -c_n + \int_{-\infty}^{\mu_1} d\mu'_n \, (\mu_1 - \mu'_n) \, \text{Normal}(\mu'_n; \mu_n, s^2) & \text{if } \mu_1 \le \mu_n .
\end{cases} \tag{36.14}
\]

We can plot the change in expected utility due to prospecting (omitting \(c_n\)) as a function of the difference \((\mu_n - \mu_1)\) (horizontal axis) and the initial standard deviation \(\sigma_n\) (vertical axis). In the figure the noise variance is \(\sigma^2 = 1\).

[Figure 36.1: horizontal axis \((\mu_n - \mu_1)\), from −6 to 6; vertical axis \(\sigma_n\), from 0 to 3.5.]

Figure 36.1. Contour plot of the gain in expected utility due to prospecting. The contours are equally spaced from 0.1 to 1.2 in steps of 0.1.
To decide whether it is worth prospecting at site \(n\), find the contour equal to \(c_n\) (the cost of prospecting); all points \([(\mu_n - \mu_1), \sigma_n]\) above that contour are worthwhile.

36.2 Further reading

If the world in which we act is a little more complicated than the prospecting problem – for example, if multiple iterations of prospecting are possible, and the cost of prospecting is uncertain – then finding the optimal balance between exploration and exploitation becomes a much harder computational problem. Reinforcement learning addresses approximate methods for this problem (Sutton and Barto, 1998).
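
Returning to the prospecting calculation of (36.11) to (36.14): the two truncated Gaussian integrals in (36.14) have a standard closed form, so the quantity plotted in Figure 36.1 (the gain with the cost omitted) can be evaluated directly. The Python sketch below is an illustration only, not code from the book. The function names, and the shorthand `delta` for the difference between the mean at site n and the biggest mean of the other sites, are assumptions introduced here. Writing z for |delta| divided by s, it uses the identity that both cases of (36.14) reduce to s times the standard normal density at z, minus |delta| times the standard normal upper-tail probability beyond z.

```python
import math


def new_mean_variance(sigma_n, noise_var=1.0):
    """Variance s^2 of the updated mean, equation (36.11)."""
    prior_var = sigma_n ** 2
    if prior_var == 0.0:
        return 0.0  # a site with no prior uncertainty has a fixed mean
    return prior_var * prior_var / (noise_var + prior_var)


def prospecting_gain(delta, sigma_n, noise_var=1.0):
    """Gain in expected utility from prospecting: (36.14) with c_n omitted.

    delta is (mu_n - mu_1). Both cases of (36.14) reduce to
    s * phi(z) - |delta| * Phi(-z), with z = |delta| / s.
    """
    s = math.sqrt(new_mean_variance(sigma_n, noise_var))
    if s == 0.0:
        return 0.0  # the datum cannot change the action
    z = abs(delta) / s
    density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # phi(z)
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))             # Phi(-z)
    return s * density - abs(delta) * upper_tail


def worth_prospecting(delta, sigma_n, cost, noise_var=1.0):
    """True if the gain exceeds the prospecting cost c_n."""
    return prospecting_gain(delta, sigma_n, noise_var) > cost


if __name__ == "__main__":
    # Tabulate the gain on a coarse grid, with noise variance 1 as in the figure.
    for sigma_n in (0.5, 1.0, 2.0, 3.5):
        row = [round(prospecting_gain(d, sigma_n), 3) for d in (-4, -2, 0, 2, 4)]
        print(sigma_n, row)
```

The gain depends on `delta` only through its absolute value, which matches the left-right symmetry of the contour plot, and it grows with the initial standard deviation: prospecting is most valuable when the site is uncertain and its mean is close to the best alternative.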
