1 Introduction

1.2. Probability Theory

on this estimate are obtained by considering the distribution of possible data sets D. By contrast, from the Bayesian viewpoint there is only a single data set D (namely the one that is actually observed), and the uncertainty in the parameters is expressed through a probability distribution over w.

A widely used frequentist estimator is maximum likelihood, in which w is set to the value that maximizes the likelihood function p(D|w). This corresponds to choosing the value of w for which the probability of the observed data set is maximized. In the machine learning literature, the negative log of the likelihood function is called an error function. Because the negative logarithm is a monotonically decreasing function, maximizing the likelihood is equivalent to minimizing the error.

One approach to determining frequentist error bars is the bootstrap (Efron, 1979; Hastie et al., 2001), in which multiple data sets are created as follows. Suppose our original data set consists of N data points X = {x_1, ..., x_N}. We can create a new data set X_B by drawing N points at random from X, with replacement, so that some points in X may be replicated in X_B, whereas other points in X may be absent from X_B. This process can be repeated L times to generate L data sets, each of size N and each obtained by sampling from the original data set X. The statistical accuracy of parameter estimates can then be evaluated by looking at the variability of predictions between the different bootstrap data sets.

One advantage of the Bayesian viewpoint is that the inclusion of prior knowledge arises naturally. Suppose, for instance, that a fair-looking coin is tossed three times and lands heads each time. A classical maximum likelihood estimate of the probability of landing heads would give 1, implying that all future tosses will land heads! By contrast, a Bayesian approach with any reasonable prior will lead to a much less extreme conclusion (Section 2.1).

There has been much controversy and debate associated with the relative merits of the frequentist and Bayesian paradigms, which have not been helped by the fact that there is no unique frequentist, or even Bayesian, viewpoint. For instance, one common criticism of the Bayesian approach is that the prior distribution is often selected on the basis of mathematical convenience rather than as a reflection of any prior beliefs. Even the subjective nature of the conclusions through their dependence on the choice of prior is seen by some as a source of difficulty. Reducing the dependence on the prior is one motivation for so-called noninformative priors (Section 2.4.3). However, these lead to difficulties when comparing different models, and indeed Bayesian methods based on poor choices of prior can give poor results with high confidence. Frequentist evaluation methods offer some protection from such problems, and techniques such as cross-validation (Section 1.3) remain useful in areas such as model comparison.

This book places a strong emphasis on the Bayesian viewpoint, reflecting the huge growth in the practical importance of Bayesian methods in the past few years, while also discussing useful frequentist concepts as required.

Although the Bayesian framework has its origins in the 18th century, the practical application of Bayesian methods was for a long time severely limited by the difficulties in carrying through the full Bayesian procedure, particularly the need to marginalize (sum or integrate) over the whole of parameter space, which, as we shall
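
The equivalence between maximizing the likelihood and minimizing the error function is easy to verify numerically. The sketch below is our own illustration, not from the text: it assumes a unit-variance Gaussian with unknown mean w as a toy model, generates synthetic data, and minimizes the negative log-likelihood with SciPy; the minimizer recovers the sample mean, which is the maximum likelihood solution for this model.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.0, size=100)

# Error function: negative log-likelihood of a unit-variance
# Gaussian with unknown mean w (a toy model chosen for illustration).
def error(w):
    return 0.5 * np.sum((data - w) ** 2) + 0.5 * len(data) * np.log(2 * np.pi)

w_ml = minimize_scalar(error).x
print(w_ml, data.mean())  # the two agree: minimizing the error
                          # function maximizes the likelihood
```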
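
The bootstrap procedure described above translates directly into code. The following is a minimal sketch, again not from the text: it assumes NumPy, uses the sample mean as the estimator, and the names bootstrap_estimates and L are our own choices.

```python
import numpy as np

def bootstrap_estimates(X, estimator, L=1000, rng=None):
    """Apply `estimator` to L bootstrap data sets drawn from X.

    Each bootstrap data set is built by drawing N = len(X) points
    from X uniformly at random *with replacement*, so some points
    may be replicated and others absent.
    """
    rng = np.random.default_rng() if rng is None else rng
    N = len(X)
    return np.array([estimator(X[rng.integers(0, N, size=N)])
                     for _ in range(L)])

# Example: N = 50 points; the spread of the estimator over the
# L resamples serves as a frequentist error bar on the estimate.
X = np.random.default_rng(0).normal(loc=1.0, scale=2.0, size=50)
means = bootstrap_estimates(X, np.mean)
print(X.mean(), means.std())
```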
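
The coin-tossing example can likewise be made concrete. In the sketch below, which is ours rather than the book's, we assume a Beta(a, b) prior on the heads probability mu (the conjugate choice developed in Section 2.1); with a uniform prior a = b = 1, the posterior mean after three heads is 0.8 rather than the maximum likelihood value of 1.

```python
# Three tosses, all heads.
heads, tails = 3, 0

# Maximum likelihood: mu_ML = m / N = 1.0, predicting that every
# future toss lands heads.
mu_ml = heads / (heads + tails)

# Bayesian: a Beta(a, b) prior on mu is conjugate to the Bernoulli
# likelihood, so the posterior is Beta(a + heads, b + tails) with
# mean (a + heads) / (a + b + heads + tails).
a, b = 1.0, 1.0
mu_post = (a + heads) / (a + b + heads + tails)

print(mu_ml)    # 1.0
print(mu_post)  # 0.8 -- a much less extreme conclusion
```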
