In the curve fitting problem, we are given the training data \mathbf{x} and \mathbf{t}, along with a new test point x, and our goal is to predict the value of t. We therefore wish to evaluate the predictive distribution p(t|x, \mathbf{x}, \mathbf{t}). Here we shall assume that the parameters \alpha and \beta are fixed and known in advance (in later chapters we shall discuss how such parameters can be inferred from data in a Bayesian setting).

A Bayesian treatment simply corresponds to a consistent application of the sum and product rules of probability, which allow the predictive distribution to be written in the form

    p(t|x, \mathbf{x}, \mathbf{t}) = \int p(t|x, \mathbf{w}) \, p(\mathbf{w}|\mathbf{x}, \mathbf{t}) \, \mathrm{d}\mathbf{w}.    (1.68)

Here p(t|x, \mathbf{w}) is given by (1.60), and we have omitted the dependence on \alpha and \beta to simplify the notation. Here p(\mathbf{w}|\mathbf{x}, \mathbf{t}) is the posterior distribution over parameters, and can be found by normalizing the right-hand side of (1.66). We shall see in Section 3.3 that, for problems such as the curve-fitting example, this posterior distribution is a Gaussian and can be evaluated analytically. Similarly, the integration in (1.68) can also be performed analytically, with the result that the predictive distribution is given by a Gaussian of the form

    p(t|x, \mathbf{x}, \mathbf{t}) = \mathcal{N}\bigl(t \,|\, m(x), s^2(x)\bigr)    (1.69)

where the mean and variance are given by

    m(x) = \beta \, \boldsymbol{\phi}(x)^{\mathrm{T}} \mathbf{S} \sum_{n=1}^{N} \boldsymbol{\phi}(x_n) t_n    (1.70)

    s^2(x) = \beta^{-1} + \boldsymbol{\phi}(x)^{\mathrm{T}} \mathbf{S} \boldsymbol{\phi}(x).    (1.71)

Here the matrix \mathbf{S} is given by

    \mathbf{S}^{-1} = \alpha \mathbf{I} + \beta \sum_{n=1}^{N} \boldsymbol{\phi}(x_n) \boldsymbol{\phi}(x_n)^{\mathrm{T}}    (1.72)

where \mathbf{I} is the unit matrix, and we have defined the vector \boldsymbol{\phi}(x) with elements \phi_i(x) = x^i for i = 0, \ldots, M.

We see that the variance, as well as the mean, of the predictive distribution in (1.69) is dependent on x. The first term in (1.71) represents the uncertainty in the predicted value of t due to the noise on the target variables, and was expressed already in the maximum likelihood predictive distribution (1.64) through \beta_{\mathrm{ML}}^{-1}. However, the second term arises from the uncertainty in the parameters \mathbf{w} and is a consequence of the Bayesian treatment. The predictive distribution for the synthetic sinusoidal regression problem is illustrated in Figure 1.17.
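As a concrete illustration, the following is a minimal numerical sketch (not from the text) of equations (1.70)-(1.72) for the polynomial curve-fitting model, written in Python with NumPy. The function names, the synthetic sinusoidal data set, and the fixed parameter values for \alpha and \beta are assumptions chosen purely for illustration, not a prescription from this section.

import numpy as np

def polynomial_features(x, M):
    # Design vectors phi(x) with elements phi_i(x) = x**i for i = 0,...,M.
    return np.vander(np.atleast_1d(x), M + 1, increasing=True)   # shape (N, M+1)

def bayesian_predictive(x_train, t_train, x_test, M=9, alpha=5e-3, beta=11.1):
    # Predictive mean m(x) and variance s^2(x); alpha, beta assumed fixed and known.
    Phi = polynomial_features(x_train, M)                         # (N, M+1)
    # S^{-1} = alpha I + beta * sum_n phi(x_n) phi(x_n)^T  -- equation (1.72)
    S_inv = alpha * np.eye(M + 1) + beta * Phi.T @ Phi
    S = np.linalg.inv(S_inv)
    phi_test = polynomial_features(x_test, M)                     # (K, M+1)
    # m(x) = beta * phi(x)^T S sum_n phi(x_n) t_n          -- equation (1.70)
    mean = beta * phi_test @ S @ (Phi.T @ t_train)
    # s^2(x) = 1/beta + phi(x)^T S phi(x)                  -- equation (1.71)
    var = 1.0 / beta + np.sum((phi_test @ S) * phi_test, axis=1)
    return mean, var

# Illustrative synthetic sinusoidal data in the spirit of the running example.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=10)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.3, size=10)
x_test = np.linspace(0.0, 1.0, 5)
m, s2 = bayesian_predictive(x_train, t_train, x_test)

Note that the second term of the returned variance, \boldsymbol{\phi}(x)^{\mathrm{T}} \mathbf{S} \boldsymbol{\phi}(x), is what distinguishes this predictive distribution from the maximum likelihood one: it grows where the training data provide little constraint on \mathbf{w}.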
