matrix for the vector $\mathbf{t}_{N+1} \equiv (t_1, \ldots, t_{N+1})^{\mathsf T}$. We define submatrices of $\mathbf{C}_{N+1}$ as follows:
\[
\mathbf{C}_{N+1} \equiv
\begin{bmatrix}
\mathbf{C}_N & \mathbf{k} \\
\mathbf{k}^{\mathsf T} & \kappa
\end{bmatrix} .
\tag{45.35}
\]
The posterior distribution (45.34) is given by
\[
P(t_{N+1} \mid \mathbf{t}_N) \propto
\exp\!\left[ -\tfrac{1}{2}
\begin{bmatrix} \mathbf{t}_N^{\mathsf T} & t_{N+1} \end{bmatrix}
\mathbf{C}_{N+1}^{-1}
\begin{bmatrix} \mathbf{t}_N \\ t_{N+1} \end{bmatrix}
\right] .
\tag{45.36}
\]
We can evaluate the mean and standard deviation of the posterior distribution of $t_{N+1}$ by brute-force inversion of $\mathbf{C}_{N+1}$. There is a more elegant expression for the predictive distribution, however, which is useful whenever predictions are to be made at a number of new points on the basis of the data set of size $N$. We can write $\mathbf{C}_{N+1}^{-1}$ in terms of $\mathbf{C}_N$ and $\mathbf{C}_N^{-1}$ using the partitioned inverse equations (Barnett, 1979):
\[
\mathbf{C}_{N+1}^{-1} =
\begin{bmatrix}
\mathbf{M} & \mathbf{m} \\
\mathbf{m}^{\mathsf T} & m
\end{bmatrix}
\tag{45.37}
\]
where
\begin{align}
m &= \left( \kappa - \mathbf{k}^{\mathsf T} \mathbf{C}_N^{-1} \mathbf{k} \right)^{-1} \tag{45.38}\\
\mathbf{m} &= -m\, \mathbf{C}_N^{-1} \mathbf{k} \tag{45.39}\\
\mathbf{M} &= \mathbf{C}_N^{-1} + \frac{1}{m}\, \mathbf{m}\mathbf{m}^{\mathsf T} . \tag{45.40}
\end{align}
When we substitute this matrix into equation (45.36) we find
\[
P(t_{N+1} \mid \mathbf{t}_N) = \frac{1}{Z}
\exp\!\left[ -\,\frac{(t_{N+1} - \hat{t}_{N+1})^2}{2 \sigma^2_{\hat{t}_{N+1}}} \right]
\tag{45.41}
\]
where
\begin{align}
\hat{t}_{N+1} &= \mathbf{k}^{\mathsf T} \mathbf{C}_N^{-1} \mathbf{t}_N \tag{45.42}\\
\sigma^2_{\hat{t}_{N+1}} &= \kappa - \mathbf{k}^{\mathsf T} \mathbf{C}_N^{-1} \mathbf{k} . \tag{45.43}
\end{align}
The predictive mean at the new point is given by $\hat{t}_{N+1}$, and $\sigma_{\hat{t}_{N+1}}$ defines the error bars on this prediction. Notice that we do not need to invert $\mathbf{C}_{N+1}$ in order to make predictions at $x^{(N+1)}$. Only $\mathbf{C}_N$ needs to be inverted. Thus Gaussian processes allow one to implement a model with a number of basis functions $H$ much larger than the number of data points $N$, with the computational requirement being of order $N^3$, independent of $H$. [We'll discuss ways of reducing this cost later.]

The predictions produced by a Gaussian process depend entirely on the covariance matrix $\mathbf{C}$. We now discuss the sorts of covariance functions one might choose to define $\mathbf{C}$, and how we can automate the selection of the covariance function in response to data.

45.4 Examples of covariance functions

The only constraint on our choice of covariance function is that it must generate a non-negative-definite covariance matrix for any set of points $\{x_n\}_{n=1}^N$.
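To make equations (45.42) and (45.43) concrete, here is a minimal numerical sketch in Python (not from the book). It assumes a squared-exponential covariance function and a noise variance added to the diagonal of $\mathbf{C}_N$ purely for illustration; the names `sq_exp_cov`, `gp_predict`, `length_scale` and `noise_var` are hypothetical. The point it demonstrates is that the predictive mean and error bar at a new point require only linear solves with $\mathbf{C}_N$, never an inversion of $\mathbf{C}_{N+1}$.

```python
import numpy as np

def sq_exp_cov(xa, xb, length_scale=1.0, amplitude=1.0):
    # Squared-exponential covariance: an illustrative choice that yields
    # a non-negative-definite matrix for any set of input points.
    diff = xa[:, None] - xb[None, :]
    return amplitude * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_predict(x_train, t_train, x_new, noise_var=0.01):
    # Predictive mean (45.42) and variance (45.43) at one new point,
    # using linear solves with C_N only.
    C_N = sq_exp_cov(x_train, x_train) + noise_var * np.eye(len(x_train))
    k = sq_exp_cov(x_train, np.atleast_1d(x_new)).ravel()          # covariances with x_new
    kappa = sq_exp_cov(np.atleast_1d(x_new), np.atleast_1d(x_new))[0, 0] + noise_var
    mean = k @ np.linalg.solve(C_N, t_train)                       # k^T C_N^{-1} t_N
    var = kappa - k @ np.linalg.solve(C_N, k)                      # kappa - k^T C_N^{-1} k
    return mean, var

# Toy usage on synthetic data.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 8)
t = np.sin(x) + 0.1 * rng.standard_normal(x.size)
mean, var = gp_predict(x, t, x_new=2.5)
print(mean, np.sqrt(var))   # predictive mean and one-standard-deviation error bar
```

Solving the linear systems rather than forming $\mathbf{C}_N^{-1}$ explicitly is a standard numerical choice; the cost remains of order $N^3$, independent of $H$, as stated above.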
