Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981. You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links.

544                                                45 — Gaussian Processes

We will denote the parameters of a covariance function by θ. The covariance matrix of t has entries given by

    C_{mn} = C(x^{(m)}, x^{(n)}; \theta) + \delta_{mn} N(x^{(n)}; \theta)                (45.44)

where C is the covariance function and N is a noise model, which might be stationary or spatially varying, for example,

    N(x; \theta) = \begin{cases}
        \theta_3 & \text{for input-independent noise} \\
        \exp\Bigl( \sum_{j=1}^{J} \beta_j \phi_j(x) \Bigr) & \text{for input-dependent noise.}
    \end{cases}                (45.45)

The continuity properties of C determine the continuity properties of typical samples from the Gaussian process prior. An encyclopaedic paper on Gaussian processes giving many valid covariance functions has been written by Abrahamsen (1997).

Stationary covariance functions

A stationary covariance function is one that is translation invariant, in that it satisfies

    C(x, x'; \theta) = D(x - x'; \theta)                (45.46)

for some function D, i.e., the covariance is a function of separation only, also known as the autocovariance function. If additionally C depends only on the magnitude of the distance between x and x' then the covariance function is said to be homogeneous. Stationary covariance functions may also be described in terms of the Fourier transform of the function D, which is known as the power spectrum of the Gaussian process. This Fourier transform is necessarily a positive function of frequency. One way of constructing a valid stationary covariance function is to invent a positive function of frequency and define D to be its inverse Fourier transform.

Example 45.5. Let the power spectrum be a Gaussian function of frequency. Since the Fourier transform of a Gaussian is a Gaussian, the autocovariance function corresponding to this power spectrum is a Gaussian function of separation.
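Equation (45.44) can be sketched in code: the covariance matrix is the covariance function evaluated at every pair of inputs, plus a noise term on the diagonal only (the Kronecker delta δ_mn). The sketch below assumes the simple input-independent noise model N(x; θ) = θ₃ from (45.45), and uses a hypothetical unit-lengthscale squared-exponential covariance purely for illustration.

```python
import numpy as np

# Equation (45.44): C_mn = C(x^(m), x^(n); theta) + delta_mn N(x^(n); theta),
# with the input-independent noise model N(x; theta) = theta3 of (45.45).
def covariance_matrix(X, cov_fn, theta3):
    n = X.shape[0]
    C = np.array([[cov_fn(X[m], X[p]) for p in range(n)] for m in range(n)])
    return C + theta3 * np.eye(n)  # delta_mn puts noise on the diagonal only

# Hypothetical covariance function for illustration (unit lengthscale).
def sq_exp(x, xp):
    return np.exp(-0.5 * np.sum((x - xp) ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))   # five 2-dimensional inputs
C = covariance_matrix(X, sq_exp, theta3=0.1)
```

Because the noise enters only through δ_mn, the resulting matrix is symmetric and, for a valid covariance function with θ₃ > 0, positive definite.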
This argument rederives the covariance function we derived at equation (45.30). Generalizing slightly, a popular form for C with hyperparameters θ = (θ₁, θ₂, {rᵢ}) is

    C(x, x'; \theta) = \theta_1 \exp\Biggl[ -\frac{1}{2} \sum_{i=1}^{I} \frac{(x_i - x'_i)^2}{r_i^2} \Biggr] + \theta_2.                (45.47)

x is an I-dimensional vector and rᵢ is a lengthscale associated with input xᵢ, the lengthscale in direction i on which y is expected to vary significantly. A very large lengthscale means that y is expected to be essentially a constant function of that input. Such an input could be said to be irrelevant, as in the automatic relevance determination method for neural networks (MacKay, 1994a; Neal, 1996). The θ₁ hyperparameter defines the vertical scale of variations of a typical function. The θ₂ hyperparameter allows the whole function to be offset away from zero by some unknown constant – to understand this term, examine equation (45.25) and consider the basis function φ(x) = 1.

Another stationary covariance function is

    C(x, x') = \exp\bigl( -|x - x'|^{\nu} \bigr), \qquad 0 < \nu \le 2.                (45.48)
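A minimal sketch of equation (45.47), with the hyperparameter names taken from the text, also illustrates the automatic-relevance-determination point: making one lengthscale rᵢ very large makes the covariance almost insensitive to differences in that input, so that input is effectively irrelevant. The specific numerical values below are chosen for illustration only.

```python
import numpy as np

# Equation (45.47): C(x, x'; theta) =
#   theta1 * exp(-1/2 * sum_i (x_i - x'_i)^2 / r_i^2) + theta2.
# theta1 sets the vertical scale, theta2 the constant offset,
# and r_i the lengthscale for input dimension i.
def ard_cov(x, xp, theta1, theta2, r):
    return theta1 * np.exp(-0.5 * np.sum((x - xp) ** 2 / r ** 2)) + theta2

x  = np.array([0.0, 0.0])
xp = np.array([1.0, 1.0])

# With a huge lengthscale r_2, input 2 is effectively irrelevant: the
# covariance barely notices the separation in that direction.
k_small_r2 = ard_cov(x, xp, theta1=2.0, theta2=0.5, r=np.array([1.0, 1.0]))
k_large_r2 = ard_cov(x, xp, theta1=2.0, theta2=0.5, r=np.array([1.0, 1e6]))
```

At zero separation the covariance is θ₁ + θ₂, the prior variance of the function value plus the offset term; increasing a lengthscale can only move the covariance toward that maximum.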
