1 Introduction
1.2. Probability Theory

In the curve fitting problem, we are given the training data $\mathbf{x}$ and $\mathbf{t}$, along with a new test point $x$, and our goal is to predict the value of $t$. We therefore wish to evaluate the predictive distribution $p(t \mid x, \mathbf{x}, \mathbf{t})$. Here we shall assume that the parameters $\alpha$ and $\beta$ are fixed and known in advance (in later chapters we shall discuss how such parameters can be inferred from data in a Bayesian setting).

A Bayesian treatment simply corresponds to a consistent application of the sum and product rules of probability, which allow the predictive distribution to be written in the form

\[
p(t \mid x, \mathbf{x}, \mathbf{t}) = \int p(t \mid x, \mathbf{w})\, p(\mathbf{w} \mid \mathbf{x}, \mathbf{t})\, \mathrm{d}\mathbf{w}. \tag{1.68}
\]

Here $p(t \mid x, \mathbf{w})$ is given by (1.60), and we have omitted the dependence on $\alpha$ and $\beta$ to simplify the notation. The factor $p(\mathbf{w} \mid \mathbf{x}, \mathbf{t})$ is the posterior distribution over parameters, and can be found by normalizing the right-hand side of (1.66). We shall see in Section 3.3 that, for problems such as the curve-fitting example, this posterior distribution is a Gaussian and can be evaluated analytically. Similarly, the integration in (1.68) can also be performed analytically, with the result that the predictive distribution is given by a Gaussian of the form

\[
p(t \mid x, \mathbf{x}, \mathbf{t}) = \mathcal{N}\left(t \mid m(x), s^2(x)\right) \tag{1.69}
\]

where the mean and variance are given by

\[
m(x) = \beta\, \boldsymbol{\phi}(x)^{\mathrm{T}} \mathbf{S} \sum_{n=1}^{N} \boldsymbol{\phi}(x_n)\, t_n \tag{1.70}
\]

\[
s^2(x) = \beta^{-1} + \boldsymbol{\phi}(x)^{\mathrm{T}} \mathbf{S}\, \boldsymbol{\phi}(x). \tag{1.71}
\]

Here the matrix $\mathbf{S}$ is given by

\[
\mathbf{S}^{-1} = \alpha \mathbf{I} + \beta \sum_{n=1}^{N} \boldsymbol{\phi}(x_n)\, \boldsymbol{\phi}(x_n)^{\mathrm{T}} \tag{1.72}
\]

where $\mathbf{I}$ is the unit matrix, and we have defined the vector $\boldsymbol{\phi}(x)$ with elements $\phi_i(x) = x^i$ for $i = 0, \ldots, M$.

We see that the variance, as well as the mean, of the predictive distribution in (1.69) is dependent on $x$. The first term in (1.71) represents the uncertainty in the predicted value of $t$ due to the noise on the target variables and was expressed already in the maximum likelihood predictive distribution (1.64) through $\beta_{\mathrm{ML}}^{-1}$. However, the second term arises from the uncertainty in the parameters $\mathbf{w}$ and is a consequence of the Bayesian treatment.
The predictive distribution for the synthetic sinusoidal regression problem is illustrated in Figure 1.17.
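
To make (1.69) to (1.72) concrete, the following is a minimal NumPy sketch of the predictive mean and variance for the polynomial basis $\phi_i(x) = x^i$. The function names `design_matrix` and `predictive`, the synthetic data set (ten points from a sinusoid with added Gaussian noise), and the values chosen for $M$, $\alpha$ and $\beta$ are illustrative assumptions rather than anything prescribed by the text. The sketch also solves linear systems in $\mathbf{S}^{-1}$ instead of forming $\mathbf{S}$ explicitly, which is a numerical convenience and not part of the derivation.

```python
import numpy as np


def design_matrix(x, M):
    """Return the matrix whose rows are phi(x)^T, with phi_i(x) = x**i, i = 0..M."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.vander(x, M + 1, increasing=True)


def predictive(x_new, x_train, t_train, M, alpha, beta):
    """Mean m(x) and variance s^2(x) of the Gaussian predictive distribution (1.69)."""
    Phi = design_matrix(x_train, M)      # shape (N, M+1), row n is phi(x_n)^T
    phi_new = design_matrix(x_new, M)    # shape (K, M+1), one row per test point

    # (1.72): S^{-1} = alpha * I + beta * sum_n phi(x_n) phi(x_n)^T
    S_inv = alpha * np.eye(M + 1) + beta * Phi.T @ Phi

    # (1.70): m(x) = beta * phi(x)^T S sum_n phi(x_n) t_n
    mean = beta * phi_new @ np.linalg.solve(S_inv, Phi.T @ t_train)

    # (1.71): s^2(x) = 1/beta + phi(x)^T S phi(x)
    S_phi = np.linalg.solve(S_inv, phi_new.T)            # shape (M+1, K)
    var = 1.0 / beta + np.sum(phi_new * S_phi.T, axis=1)
    return mean, var


# Illustrative synthetic data (an assumption, not taken from the text).
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.3, size=x_train.size)

M, alpha, beta = 9, 5e-3, 11.1   # assumed example values
x_new = np.linspace(0.0, 1.0, 5)
mean, var = predictive(x_new, x_train, t_train, M, alpha, beta)
print("predictive mean:    ", mean)
print("predictive variance:", var)
```

Because $\mathbf{S}$ is positive definite, every entry of `var` exceeds the noise floor `1 / beta`, which reflects the additional parameter uncertainty contributed by the second term of (1.71).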