12.07.2015 Views

1 Introduction


1.2. Probability Theory

Figure 1.16: Schematic illustration of a Gaussian conditional distribution for $t$ given $x$, given by (1.60), in which the mean is given by the polynomial function $y(x, \mathbf{w})$ and the precision is given by the parameter $\beta$, which is related to the variance by $\beta^{-1} = \sigma^2$.

We now use the training data $\{\mathbf{x}, \mathbf{t}\}$ to determine the values of the unknown parameters $\mathbf{w}$ and $\beta$ by maximum likelihood. If the data are assumed to be drawn independently from the distribution (1.60), then the likelihood function is given by

$$p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}\left(t_n \mid y(x_n, \mathbf{w}), \beta^{-1}\right). \tag{1.61}$$

As we did in the case of the simple Gaussian distribution earlier, it is convenient to maximize the logarithm of the likelihood function. Substituting for the form of the Gaussian distribution, given by (1.46), we obtain the log likelihood function in the form

$$\ln p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta) = -\frac{\beta}{2} \sum_{n=1}^{N} \{y(x_n, \mathbf{w}) - t_n\}^2 + \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi). \tag{1.62}$$

Consider first the determination of the maximum likelihood solution for the polynomial coefficients, which will be denoted by $\mathbf{w}_{\mathrm{ML}}$. These are determined by maximizing (1.62) with respect to $\mathbf{w}$. For this purpose, we can omit the last two terms on the right-hand side of (1.62) because they do not depend on $\mathbf{w}$. Also, we note that scaling the log likelihood by a positive constant coefficient does not alter the location of the maximum with respect to $\mathbf{w}$, and so we can replace the coefficient $\beta/2$ with $1/2$. Finally, instead of maximizing the log likelihood, we can equivalently minimize the negative log likelihood. We therefore see that maximizing likelihood is equivalent, so far as determining $\mathbf{w}$ is concerned, to minimizing the sum-of-squares error function defined by (1.2). Thus the sum-of-squares error function has arisen as a consequence of maximizing likelihood under the assumption of a Gaussian noise distribution.

We can also use maximum likelihood to determine the precision parameter $\beta$ of the Gaussian conditional distribution.
Maximizing (1.62) with respect to $\beta$ gives

$$\frac{1}{\beta_{\mathrm{ML}}} = \frac{1}{N} \sum_{n=1}^{N} \{y(x_n, \mathbf{w}_{\mathrm{ML}}) - t_n\}^2. \tag{1.63}$$
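The two-step procedure above can be sketched numerically. The following is a minimal illustration, not code from the text: it assumes NumPy, a polynomial of the form $y(x, \mathbf{w}) = \sum_j w_j x^j$, and synthetic data. The function names `fit_polynomial_ml` and `log_likelihood`, and the values of `true_w` and `sigma`, are hypothetical choices made only for this example. The coefficients are found by ordinary least squares, which is the sum-of-squares minimization described above, and the precision then follows from (1.63).

```python
import numpy as np


def fit_polynomial_ml(x, t, degree):
    """Return (w_ml, beta_ml) for a polynomial y(x, w) = sum_j w_j * x**j."""
    # Design matrix with columns x**0, x**1, ..., x**degree.
    Phi = np.vander(x, degree + 1, increasing=True)
    # Minimizing the sum-of-squares error gives w_ML.
    w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    residuals = Phi @ w_ml - t
    # Equation (1.63): 1 / beta_ML is the mean squared residual at w_ML.
    beta_ml = 1.0 / np.mean(residuals ** 2)
    return w_ml, beta_ml


def log_likelihood(x, t, w, beta):
    """Evaluate the log likelihood (1.62) for given w and beta."""
    Phi = np.vander(x, len(w), increasing=True)
    sum_sq = np.sum((Phi @ w - t) ** 2)
    n = len(t)
    return (-0.5 * beta * sum_sq
            + 0.5 * n * np.log(beta)
            - 0.5 * n * np.log(2.0 * np.pi))


# Synthetic data (assumed for illustration): a quadratic plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
true_w = np.array([0.5, -1.0, 2.0])
sigma = 0.1
t = np.vander(x, 3, increasing=True) @ true_w + rng.normal(0.0, sigma, size=x.shape)

w_ml, beta_ml = fit_polynomial_ml(x, t, degree=2)
print("w_ML:", w_ml)
print("1 / beta_ML:", 1.0 / beta_ml, "(noise variance used:", sigma ** 2, ")")
```

With enough data points, `1 / beta_ml` should land near the noise variance used to generate the targets, and `log_likelihood` evaluated at `(w_ml, beta_ml)` should be no smaller than at nearby parameter values.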
