Figure 45.1. Samples drawn from Gaussian process priors (four panels; the plots themselves are not reproduced here). Each panel shows two functions drawn from a Gaussian process prior. The four corresponding covariance functions are:

(a) $2 \exp\!\left( -\frac{(x-x')^2}{2(1.5)^2} \right)$
(b) $2 \exp\!\left( -\frac{(x-x')^2}{2(0.35)^2} \right)$
(c) $2 \exp\!\left( -\frac{\sin^2(\pi(x-x')/3.0)}{2(0.5)^2} \right)$
(d) $2 \exp\!\left( -\frac{(x-x')^2}{2(1.5)^2} \right) + xx'$

The decrease in lengthscale from (a) to (b) produces more rapidly fluctuating functions. The periodic properties of the covariance function in (c) can be seen. The covariance function in (d) contains the non-stationary term $xx'$ corresponding to the covariance of a straight line, so that typical functions include linear trends. From Gibbs (1997).

Multilayer neural networks and Gaussian processes

Figures 44.2 and 44.3 show some random samples from the prior distribution over functions defined by a selection of standard multilayer perceptrons with large numbers of hidden units. Those samples don't seem a million miles away from the Gaussian process samples of figure 45.1. And indeed Neal (1996) showed that the properties of a neural network with one hidden layer (as in equation (45.4)) converge to those of a Gaussian process as the number of hidden neurons tends to infinity, if standard 'weight decay' priors are assumed. The covariance function of this Gaussian process depends on the details of the priors assumed for the weights in the network and the activation functions of the hidden units.

45.3 Using a given Gaussian process model in regression

We have spent some time talking about priors. We now return to our data and the problem of prediction. How do we make predictions with a Gaussian process?

Having formed the covariance matrix $C$ defined in equation (45.32), our task is to infer $t_{N+1}$ given the observed vector $\mathbf{t}_N$. The joint density $P(t_{N+1}, \mathbf{t}_N)$ is a Gaussian; so the conditional distribution

$P(t_{N+1} \mid \mathbf{t}_N) = \frac{P(t_{N+1}, \mathbf{t}_N)}{P(\mathbf{t}_N)}$        (45.34)

is also a Gaussian. We now distinguish between different sizes of covariance matrix $C$ with a subscript, such that $C_{N+1}$ is the $(N+1)\times(N+1)$ covariance matrix.
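The conditional distribution (45.34) can be evaluated in closed form via the standard partitioned-Gaussian result: writing $C_{N+1} = \begin{pmatrix} C_N & \mathbf{k} \\ \mathbf{k}^{\mathsf T} & \kappa \end{pmatrix}$, the predictive mean is $\mathbf{k}^{\mathsf T} C_N^{-1} \mathbf{t}_N$ and the predictive variance is $\kappa - \mathbf{k}^{\mathsf T} C_N^{-1} \mathbf{k}$. The sketch below illustrates this computation; the choice of kernel (panel (a) of figure 45.1), the noise variance, and the toy data are assumptions made for the example, not values from the text.

import numpy as np

# Covariance function from panel (a) of figure 45.1 (lengthscale 1.5).
def cov(x1, x2, lengthscale=1.5):
    return 2.0 * np.exp(-(x1 - x2) ** 2 / (2.0 * lengthscale ** 2))

def gp_predict(x_train, t_train, x_star, sigma_nu=0.1):
    """Predictive mean and variance of t_{N+1} at x_star, given t_N.

    Uses the partitioned-Gaussian result: with
    C_{N+1} = [[C_N, k], [k^T, kappa]], the conditional P(t_{N+1} | t_N)
    is Gaussian with mean k^T C_N^{-1} t_N and variance kappa - k^T C_N^{-1} k.
    """
    # C_N: covariance of the observed targets (kernel plus assumed noise variance).
    C_N = cov(x_train[:, None], x_train[None, :]) + sigma_nu ** 2 * np.eye(len(x_train))
    # k: covariance between the new input and the training inputs.
    k = cov(x_train, x_star)
    # kappa: prior variance at the new input (plus noise).
    kappa = cov(x_star, x_star) + sigma_nu ** 2

    alpha = np.linalg.solve(C_N, t_train)   # C_N^{-1} t_N
    v = np.linalg.solve(C_N, k)             # C_N^{-1} k
    return k @ alpha, kappa - k @ v

# Toy example: three noisy observations, prediction at x = 2.0.
x_train = np.array([-1.0, 0.5, 3.0])
t_train = np.array([0.3, 1.1, -0.8])
mean, var = gp_predict(x_train, t_train, x_star=2.0)
print(mean, var)

Solving the linear systems with C_N, rather than forming its inverse explicitly, is the usual numerically stable choice.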

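Returning to figure 45.1, samples from a Gaussian process prior can be drawn by building the covariance matrix on a grid of input points and multiplying its Cholesky factor by a vector of standard normal variates. The sketch below, a minimal illustration rather than the procedure used for the figure, uses the four covariance functions quoted in the caption; the grid, jitter term, and sample count are assumptions.

import numpy as np

# The four covariance functions below the panels of figure 45.1.
def cov_a(x, xp): return 2.0 * np.exp(-(x - xp) ** 2 / (2.0 * 1.5 ** 2))
def cov_b(x, xp): return 2.0 * np.exp(-(x - xp) ** 2 / (2.0 * 0.35 ** 2))
def cov_c(x, xp): return 2.0 * np.exp(-np.sin(np.pi * (x - xp) / 3.0) ** 2 / (2.0 * 0.5 ** 2))
def cov_d(x, xp): return 2.0 * np.exp(-(x - xp) ** 2 / (2.0 * 1.5 ** 2)) + x * xp

def sample_prior(cov_fn, x, n_samples=2, jitter=1e-6):
    """Draw functions from a zero-mean Gaussian process prior on the grid x."""
    C = cov_fn(x[:, None], x[None, :]) + jitter * np.eye(len(x))  # covariance matrix
    L = np.linalg.cholesky(C)                                     # C = L L^T
    return L @ np.random.randn(len(x), n_samples)                 # each column is one sample

x = np.linspace(-3.0, 5.0, 100)
samples = {name: sample_prior(fn, x) for name, fn in
           [('a', cov_a), ('b', cov_b), ('c', cov_c), ('d', cov_d)]}

The small jitter added to the diagonal keeps the Cholesky factorization stable when the covariance matrix is nearly singular, as it is for smooth kernels on a dense grid.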