10.07.2015

Information Theory, Inference, and Learning ... - Inference Group

Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981
You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links.

45 Gaussian Processes

45.2 From parametric models to Gaussian processes

Linear models

Let us consider a regression problem using $H$ fixed basis functions, for example one-dimensional radial basis functions as defined in equation (45.3).

Let us assume that a list of $N$ input points $\{\mathbf{x}^{(n)}\}$ has been specified and define the $N \times H$ matrix $\mathbf{R}$ to be the matrix of values of the basis functions $\{\phi_h(\mathbf{x})\}_{h=1}^{H}$ at the points $\{\mathbf{x}^{(n)}\}$,

$$R_{nh} \equiv \phi_h(\mathbf{x}^{(n)}). \tag{45.17}$$

We define the vector $\mathbf{y}_N$ to be the vector of values of $y(\mathbf{x})$ at the $N$ points,

$$y_n \equiv \sum_h R_{nh} w_h. \tag{45.18}$$

If the prior distribution of $\mathbf{w}$ is Gaussian with zero mean,

$$P(\mathbf{w}) = \mathrm{Normal}(\mathbf{w}; \mathbf{0}, \sigma_w^2 \mathbf{I}), \tag{45.19}$$

then $\mathbf{y}$, being a linear function of $\mathbf{w}$, is also Gaussian distributed, with mean zero. The covariance matrix of $\mathbf{y}$ is

$$\mathbf{Q} = \langle \mathbf{y}\mathbf{y}^{\mathsf T} \rangle = \langle \mathbf{R}\mathbf{w}\mathbf{w}^{\mathsf T}\mathbf{R}^{\mathsf T} \rangle = \mathbf{R}\,\langle \mathbf{w}\mathbf{w}^{\mathsf T} \rangle\,\mathbf{R}^{\mathsf T} \tag{45.20}$$

$$\phantom{\mathbf{Q}} = \sigma_w^2\,\mathbf{R}\mathbf{R}^{\mathsf T}. \tag{45.21}$$

So the prior distribution of $\mathbf{y}$ is

$$P(\mathbf{y}) = \mathrm{Normal}(\mathbf{y}; \mathbf{0}, \mathbf{Q}) = \mathrm{Normal}(\mathbf{y}; \mathbf{0}, \sigma_w^2\,\mathbf{R}\mathbf{R}^{\mathsf T}). \tag{45.22}$$

This result, that the vector of $N$ function values $\mathbf{y}$ has a Gaussian distribution, is true for any selected points $\mathbf{X}_N$. This is the defining property of a Gaussian process. The probability distribution of a function $y(\mathbf{x})$ is a Gaussian process if for any finite selection of points $\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(N)}$, the density $P(y(\mathbf{x}^{(1)}), y(\mathbf{x}^{(2)}), \ldots, y(\mathbf{x}^{(N)}))$ is a Gaussian.

Now, if the number of basis functions $H$ is smaller than the number of data points $N$, then the matrix $\mathbf{Q}$ will not have full rank. In this case the probability distribution of $\mathbf{y}$ might be thought of as a flat elliptical pancake confined to an $H$-dimensional subspace in the $N$-dimensional space in which $\mathbf{y}$ lives.
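The construction from equation (45.17) to (45.22) can be checked numerically. The sketch below is a minimal pure-Python illustration, not code from the book. It assumes one-dimensional inputs and Gaussian basis functions $\phi_h(x) = \exp(-(x - c_h)^2 / (2r^2))$ with hypothetical centres $c_h$ and width $r$; equation (45.3) gives the exact definition. The code builds $\mathbf{R}$ and forms $\mathbf{Q} = \sigma_w^2 \mathbf{R}\mathbf{R}^{\mathsf T}$. It then draws one sample $\mathbf{y} = \mathbf{R}\mathbf{w}$ with $\mathbf{w} \sim \mathrm{Normal}(\mathbf{0}, \sigma_w^2 \mathbf{I})$ and measures the rank of $\mathbf{Q}$ when $H = 3 < N = 9$. All function names and numerical values are illustrative.

```python
import math
import random


def design_matrix(xs, centres, width):
    """N x H matrix R with R[n][h] = phi_h(x^(n)) (eq. 45.17).

    Assumes Gaussian radial basis functions sharing one width (illustrative choice).
    """
    return [[math.exp(-(x - c) ** 2 / (2 * width ** 2)) for c in centres] for x in xs]


def prior_covariance(R, sigma_w):
    """Q = sigma_w^2 R R^T (eq. 45.21): entry (n, m) is sigma_w^2 * row_n . row_m."""
    return [[sigma_w ** 2 * sum(a * b for a, b in zip(row_n, row_m)) for row_m in R]
            for row_n in R]


def sample_function_values(R, sigma_w, rng):
    """Draw w ~ Normal(0, sigma_w^2 I) (eq. 45.19) and return y = R w (eq. 45.18)."""
    w = [rng.gauss(0.0, sigma_w) for _ in R[0]]
    return [sum(r * w_h for r, w_h in zip(row, w)) for row in R]


def numerical_rank(A, rel_tol=1e-9):
    """Rank of A by Gaussian elimination with partial pivoting and a relative tolerance."""
    M = [row[:] for row in A]
    n_rows, n_cols = len(M), len(M[0])
    tol = rel_tol * max(abs(v) for row in M for v in row)
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda i: abs(M[i][col]))
        if abs(M[pivot][col]) <= tol:
            continue  # column is (numerically) dependent on earlier ones
        M[rank], M[pivot] = M[pivot], M[rank]
        for i in range(rank + 1, n_rows):
            factor = M[i][col] / M[rank][col]
            for j in range(col, n_cols):
                M[i][j] -= factor * M[rank][j]
        rank += 1
    return rank


xs = [-2.0 + 0.5 * i for i in range(9)]  # N = 9 input points
centres = [-1.0, 0.0, 1.0]               # H = 3 hypothetical centres
R = design_matrix(xs, centres, width=1.0)
Q = prior_covariance(R, sigma_w=1.0)
y = sample_function_values(R, 1.0, random.Random(0))
print(numerical_rank(Q))  # 3, equal to H and smaller than N = 9
```

Every sample has the form $\mathbf{R}\mathbf{w}$, so it lies in the $H$-dimensional column space of $\mathbf{R}$. A rank of 3 rather than 9 is the numerical counterpart of the flat elliptical pancake. The rank test uses a relative tolerance because rounding leaves tiny nonzero residues where exact arithmetic would give zeros.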
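If each target $t_n$ is assumed to differ by additive Gaussian noise of variance $\sigma_\nu^2$ from the corresponding function value $y_n$, then $\mathbf{t}$ also has a Gaussian prior distribution,

$$P(\mathbf{t}) = \mathrm{Normal}(\mathbf{t}; \mathbf{0}, \mathbf{Q} + \sigma_\nu^2 \mathbf{I}). \tag{45.23}$$

We will denote the covariance matrix of $\mathbf{t}$ by $\mathbf{C}$:

$$\mathbf{C} = \mathbf{Q} + \sigma_\nu^2 \mathbf{I} = \sigma_w^2\,\mathbf{R}\mathbf{R}^{\mathsf T} + \sigma_\nu^2 \mathbf{I}. \tag{45.24}$$

Whether or not $\mathbf{Q}$ has full rank, the covariance matrix $\mathbf{C}$ has full rank, since $\sigma_\nu^2 \mathbf{I}$ is full rank. More precisely, $\mathbf{Q}$ is positive semidefinite and $\sigma_\nu^2 \mathbf{I}$ is positive definite for $\sigma_\nu^2 > 0$, so their sum is positive definite.

What does the covariance matrix $\mathbf{Q}$ look like? In general, the $(n, n')$ entry of $\mathbf{Q}$ is

$$Q_{nn'} = [\sigma_w^2\,\mathbf{R}\mathbf{R}^{\mathsf T}]_{nn'} = \sigma_w^2 \sum_h \phi_h(\mathbf{x}^{(n)})\,\phi_h(\mathbf{x}^{(n')}). \tag{45.25}$$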
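Equations (45.23) to (45.25) can be illustrated in the same way. The following sketch is again pure Python. It uses assumed Gaussian basis functions and hypothetical constants (`SIGMA_W`, `SIGMA_NU`, `CENTRES`, `WIDTH`). It builds $\mathbf{C}$ entry by entry from the sum in equation (45.25), plus $\sigma_\nu^2$ on the diagonal. It then factorizes $\mathbf{C}$ with a hand-written Cholesky decomposition; a factor with a strictly positive diagonal exists only for positive definite matrices. Finally it draws a target vector $\mathbf{t} \sim \mathrm{Normal}(\mathbf{0}, \mathbf{C})$ as $\mathbf{t} = \mathbf{L}\mathbf{z}$ with $\mathbf{z} \sim \mathrm{Normal}(\mathbf{0}, \mathbf{I})$.

```python
import math
import random

SIGMA_W = 1.0               # assumed prior std of each weight w_h
SIGMA_NU = 0.1              # assumed noise std sigma_nu
CENTRES = [-1.0, 0.0, 1.0]  # hypothetical centres of H = 3 basis functions
WIDTH = 1.0                 # assumed common basis-function width


def phi(h, x):
    """Gaussian radial basis function phi_h(x) (assumed form)."""
    return math.exp(-(x - CENTRES[h]) ** 2 / (2 * WIDTH ** 2))


def q_entry(x, x_prime):
    """Q_{nn'} = sigma_w^2 * sum_h phi_h(x^(n)) phi_h(x^(n')) (eq. 45.25)."""
    return SIGMA_W ** 2 * sum(phi(h, x) * phi(h, x_prime) for h in range(len(CENTRES)))


def target_covariance(xs):
    """C = Q + sigma_nu^2 I (eq. 45.24)."""
    n = len(xs)
    return [[q_entry(xs[i], xs[j]) + (SIGMA_NU ** 2 if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def cholesky(A):
    """Return lower-triangular L with A = L L^T; raise if A is not positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] - s  # pivot; positive for a positive definite A
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(d)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def sample_targets(xs, rng):
    """Draw t ~ Normal(0, C) (eq. 45.23) as t = L z with z ~ Normal(0, I)."""
    L = cholesky(target_covariance(xs))
    z = [rng.gauss(0.0, 1.0) for _ in xs]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(xs))]


xs = [-2.0 + 0.5 * i for i in range(9)]  # N = 9 > H = 3, so Q alone is singular
C = target_covariance(xs)
L = cholesky(C)  # succeeds: the noise term makes C positive definite
t = sample_targets(xs, random.Random(0))
```

With $N > H$ the smallest eigenvalue of $\mathbf{C}$ is exactly $\sigma_\nu^2$, and every Cholesky pivot $L_{ii}^2$ is at least that large, so the factorization is well behaved here. Setting `SIGMA_NU = 0.0` leaves $\mathbf{C} = \mathbf{Q}$ singular. The same routine then either raises or produces pivots near zero, depending on rounding.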
