Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981 You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links.

538    45 — Gaussian Processes

Nonparametric approaches

In nonparametric methods, predictions are obtained without explicitly parameterizing the unknown function y(x); y(x) lives in the infinite-dimensional space of all continuous functions of x. One well known nonparametric approach to the regression problem is the spline smoothing method (Kimeldorf and Wahba, 1970). A spline solution to a one-dimensional regression problem can be described as follows: we define the estimator of y(x) to be the function ŷ(x) that minimizes the functional

  M(y(x)) = \frac{1}{2} \beta \sum_{n=1}^{N} (y(x^{(n)}) - t_n)^2 + \frac{1}{2} \alpha \int dx\, [y^{(p)}(x)]^2,   (45.9)

where y^{(p)} is the pth derivative of y and p is a positive number. If p is set to 2 then the resulting function ŷ(x) is a cubic spline, that is, a piecewise cubic function that has ‘knots’ – discontinuities in its second derivative – at the data points {x^{(n)}}.

This estimation method can be interpreted as a Bayesian method by identifying the prior for the function y(x) as:

  \ln P(y(x) \mid \alpha) = -\frac{1}{2} \alpha \int dx\, [y^{(p)}(x)]^2 + \text{const},   (45.10)

and the probability of the data measurements t_N = \{t_n\}_{n=1}^{N}, assuming independent Gaussian noise, as:

  \ln P(t_N \mid y(x), \beta) = -\frac{1}{2} \beta \sum_{n=1}^{N} (y(x^{(n)}) - t_n)^2 + \text{const}.   (45.11)

[The constants in equations (45.10) and (45.11) are functions of α and β respectively. Strictly the prior (45.10) is improper since addition of an arbitrary polynomial of degree (p − 1) to y(x) is not constrained. This impropriety is easily rectified by the addition of (p − 1) appropriate terms to (45.10).]
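The minimization in (45.9) can be illustrated numerically. The sketch below is a discrete analogue (a Whittaker-style smoother evaluated at the data points themselves, with p = 2), not MacKay's exact continuous formulation: the integral of [y''(x)]² is replaced by the squared second differences of y, and the quadratic objective is minimized by solving one linear system. The function name `smooth_spline` and all parameter values are illustrative assumptions.

```python
import numpy as np

def smooth_spline(t, alpha=1.0, beta=1.0):
    """Minimize the discrete analogue of (45.9):
       (beta/2) * sum_n (y_n - t_n)^2 + (alpha/2) * ||D2 y||^2,
    where D2 is the second-difference operator (discrete y'')."""
    n = len(t)
    # Second-difference matrix, shape (n-2, n): rows are [1, -2, 1].
    D2 = np.zeros((n - 2, n))
    for i in range(n - 2):
        D2[i, i:i + 3] = [1.0, -2.0, 1.0]
    # Setting the gradient of the quadratic objective to zero gives
    # (beta I + alpha D2^T D2) y = beta t.
    A = beta * np.eye(n) + alpha * D2.T @ D2
    return np.linalg.solve(A, beta * t)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
t = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(50)
y_hat = smooth_spline(t, alpha=10.0, beta=1.0)
```

As α → 0 the penalty vanishes and ŷ interpolates the data exactly; as α grows, the estimate is pulled toward a function with small second derivative, i.e. toward a straight line.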
Given this interpretation of the functions in equation (45.9), M(y(x)) is equal to minus the log of the posterior probability P(y(x) | t_N, α, β), within an additive constant, and the splines estimation procedure can be interpreted as yielding a Bayesian MAP estimate. The Bayesian perspective allows us additionally to put error bars on the splines estimate and to draw typical samples from the posterior distribution, and it gives an automatic method for inferring the hyperparameters α and β.

Comments

Splines priors are Gaussian processes

The prior distribution defined in equation (45.10) is our first example of a Gaussian process. Throwing mathematical precision to the winds, a Gaussian process can be defined as a probability distribution on a space of functions y(x) that can be written in the form

  P(y(x) \mid \mu(x), \mathbf{A}) = \frac{1}{Z} \exp\!\left[ -\frac{1}{2} (y(x) - \mu(x))^{\mathsf{T}} \mathbf{A} (y(x) - \mu(x)) \right],   (45.12)

where µ(x) is the mean function and A is a linear operator, and where the inner product of two functions y(x)^T z(x) is defined by, for example, ∫ dx y(x)z(x).
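A finite-dimensional sketch of this idea: evaluating a Gaussian process at a grid of inputs gives an ordinary multivariate Gaussian, from which typical sample functions can be drawn via a Cholesky factorization of the covariance matrix. The squared-exponential covariance used here is a common illustrative choice, not the splines prior of (45.10), and the function name and lengthscale are assumptions.

```python
import numpy as np

def sample_gp_prior(x, lengthscale=0.2, n_samples=3, seed=0):
    """Draw sample functions from a zero-mean GP prior evaluated at
    the points x, using a squared-exponential covariance."""
    diff = x[:, None] - x[None, :]
    K = np.exp(-0.5 * (diff / lengthscale) ** 2)       # covariance matrix
    # Small jitter on the diagonal keeps the Cholesky factorization stable.
    L = np.linalg.cholesky(K + 1e-9 * np.eye(len(x)))
    rng = np.random.default_rng(seed)
    # If z ~ N(0, I) then L z ~ N(0, K): samples of the function values.
    return L @ rng.standard_normal((len(x), n_samples))

x = np.linspace(0.0, 1.0, 100)
samples = sample_gp_prior(x)
```

Each column of `samples` is one random function from the prior; because the covariance decays smoothly with |x − x′|, nearby function values are strongly correlated and the samples look like smooth curves.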
