Information Theory, Inference, and Learning ... - Inference Group
Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981 You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links.

45 — Gaussian Processes

... to model speech waveforms, also correspond to Gaussian process models; the method of 'kriging' in geostatistics is a Gaussian process regression method.

Reservations about Gaussian processes

It might be thought that it is not possible to reproduce the interesting properties of neural network interpolation methods with something so simple as a Gaussian distribution, but as we shall now see, many popular nonlinear interpolation methods are equivalent to particular Gaussian processes. (I use the term 'interpolation' to cover both the problem of 'regression' – fitting a curve through noisy data – and the task of fitting an interpolant that passes exactly through the given data points.)

It might also be thought that the computational complexity of inference when we work with priors over infinite-dimensional function spaces might be infinitely large. But by concentrating on the joint probability distribution of the observed data and the quantities we wish to predict, it is possible to make predictions with resources that scale as polynomial functions of N, the number of data points.

45.1 Standard methods for nonlinear regression

The problem

We are given N data points $X_N, t_N = \{x^{(n)}, t_n\}_{n=1}^N$. The inputs $x$ are vectors of some fixed input dimension $I$. The targets $t$ are either real numbers, in which case the task will be a regression or interpolation task, or they are categorical variables, for example $t \in \{0, 1\}$, in which case the task is a classification task.
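The claim above that predictions need only polynomial resources in N can be made concrete: conditioning the joint Gaussian over the observed targets and the value to be predicted reduces to a single N×N linear solve, an O(N³) operation. A minimal sketch, assuming a squared-exponential kernel, a noise level, and toy data that are all illustrative choices, not taken from the text:

```python
import numpy as np

# Illustrative squared-exponential kernel (an assumed choice of covariance
# function; the text has not yet committed to one).
def kernel(a, b, length=0.3):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * length ** 2))

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 20)                  # N = 20 input points
t = np.sin(3 * x) + 0.1 * rng.standard_normal(20)   # noisy targets
sigma2 = 0.1 ** 2                           # assumed noise variance

x_star = np.array([0.0, 0.5])               # new points x^(N+1), ...
K = kernel(x, x) + sigma2 * np.eye(20)      # N x N covariance of observations
k_star = kernel(x, x_star)                  # covariances with the new points

# Predictive mean of the conditional Gaussian: k_*^T K^{-1} t.
# The only expensive step is this one N x N solve, O(N^3) in N.
mean = k_star.T @ np.linalg.solve(K, t)
```

Despite the prior being a distribution over an infinite-dimensional function space, nothing larger than an N×N matrix ever appears.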
We will concentrate on the case of regression for the time being. Assuming that a function $y(x)$ underlies the observed data, the task is to infer the function from the given data, and predict the function's value – or the value of the observation $t_{N+1}$ – at a new point $x^{(N+1)}$.

Parametric approaches to the problem

In a parametric approach to regression we express the unknown function $y(x)$ in terms of a nonlinear function $y(x; w)$ parameterized by parameters $w$.

Example 45.2. Fixed basis functions. Using a set of basis functions $\{\phi_h(x)\}_{h=1}^H$, we can write

    y(x; w) = \sum_{h=1}^H w_h \phi_h(x).    (45.2)

If the basis functions are nonlinear functions of $x$ such as radial basis functions centred at fixed points $\{c_h\}_{h=1}^H$,

    \phi_h(x) = \exp\!\left[ -\frac{(x - c_h)^2}{2r^2} \right],    (45.3)

then $y(x; w)$ is a nonlinear function of $x$; however, since the dependence of $y$ on the parameters $w$ is linear, we might sometimes refer to this as a 'linear' model. In neural network terms, this model is like a multilayer network whose connections from the input layer to the nonlinear hidden layer are fixed; only the output weights $w$ are adaptive.

Other possible sets of fixed basis functions include polynomials such as $\phi_h(x) = x_i^p x_j^q$, where $p$ and $q$ are integer powers that depend on $h$.
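Because $y$ is linear in $w$, the fixed-basis-function model of equations (45.2) and (45.3) can be fitted by ordinary least squares. A minimal sketch, in which the centres $c_h$, radius $r$, and toy data are illustrative assumptions, not values from the text:

```python
import numpy as np

def rbf_features(x, centres, r):
    """Design matrix Phi with Phi[n, h] = exp(-(x_n - c_h)^2 / (2 r^2)),
    i.e. the radial basis functions of equation (45.3)."""
    return np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2 * r ** 2))

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
t = np.sin(3 * x) + 0.05 * rng.standard_normal(30)   # noisy targets

centres = np.linspace(-1, 1, 8)                      # fixed {c_h}, H = 8
Phi = rbf_features(x, centres, r=0.3)

# Since y(x; w) = Phi w is linear in w (equation 45.2), least squares
# gives the weights in closed form.
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
y_fit = Phi @ w
```

Only the H output weights are adapted; the basis functions themselves stay fixed, exactly as in the multilayer-network analogy above.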
