2.3. COMPONENTS OF THE MODEL

You'll see how these questions become extra parameters inside the likelihood.

Rethinking: Datum or parameter? It is typical to conceive of data and parameters as completely different kinds of entities. Data are measured and known; parameters are unknown and must be estimated from data. Usefully, in the Bayesian framework the distinction between a datum and a parameter is fuzzy: a datum can be recast as a very narrow probability density for a parameter, and a parameter as a datum with uncertainty. Much later in the book, you'll see how to use this continuity between certainty (data) and uncertainty (parameters) to incorporate measurement error and missing data into your modeling.

2.3.3. Prior. For every parameter you intend your Bayesian machine to estimate, you must provide the machine with a PRIOR. A Bayesian machine must have an initial plausibility assignment for each possible value of the parameter. The prior is this initial set of plausibilities.

When you have a previous estimate to provide to the machine, that can become the prior, as in the steps in FIGURE 2.5. Back in FIGURE 2.5, the machine did its learning one piece of data at a time. As a result, each estimate becomes the prior for the next step. But this doesn't resolve the problem of providing a prior, because at the dawn of time, when n = 0, the machine still had an initial estimate for the parameter p: a flat line specifying equal plausibility for every possible value. (A small code sketch of this step-by-step updating appears at the end of this section.)

Overthinking: Prior as probability distribution. You could write the prior in the example here as:

Pr(p) = 1/(1 − 0) = 1.

The prior is a probability distribution for the parameter. In general, for a uniform prior from a to b, the probability of any point in the interval is 1/(b − a). If you're bugged by the fact that the probability of every value of p is 1, remember that every probability distribution must sum (integrate) to 1. The expression 1/(b − a) ensures that the area under the flat line from a to b is equal to 1.

The prior is what the machine "believes" before it sees the data. It is part of the model, not necessarily a reflection of what you believe. Clearly the likelihood contains many assumptions that are unlikely to be exactly true: completely independent tosses of the globe, no events other than W and L, constant p regardless of how the globe is tossed. Using a likelihood function does not force you to believe in it, even though the machine acts as if it believes. The same goes for priors.

So where do priors come from? They are engineering assumptions, chosen to help the machine learn. The flat prior in FIGURE 2.5 is very common, but it is hardly ever the best prior. You'll see later in the book that priors that gently nudge the machine usually improve inference. Such priors are sometimes called REGULARIZING or WEAKLY INFORMATIVE priors. They are so useful that non-Bayesian statistical procedures have adopted a mathematically equivalent approach, PENALIZED LIKELIHOOD. You'll meet examples in later chapters.

More generally, priors are useful for constraining parameters to reasonable ranges, as well as for expressing any knowledge we have about the parameter before any data are observed. For example, in the globe tossing case, you know before the globe is tossed even once that the values p = 0 and p = 1 are completely implausible. You also know that any values of p very close to 0 and 1 are less plausible than values near p = 0.5. Even vague knowledge like this can be useful when evidence is scarce. (The second sketch at the end of this section contrasts the flat prior with one such gently informative prior.)
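The sketch below illustrates, in R, the step-by-step updating described in section 2.3.3: a grid of candidate values for p, a flat prior at n = 0, and a loop in which each posterior becomes the prior for the next toss. The toss sequence used here is the chapter's running globe-tossing example (W L W W W L W L W); if your data differ, substitute them. This is a minimal illustration, not a listing from the book.

    # Grid approximation of sequential updating (cf. FIGURE 2.5)
    p_grid <- seq(from = 0, to = 1, length.out = 100)  # candidate values of p
    prior  <- rep(1, 100)                              # flat prior: equal plausibility everywhere
    prior  <- prior / sum(prior)                       # normalize so plausibilities sum to 1

    tosses <- c("W", "L", "W", "W", "W", "L", "W", "L", "W")
    for (obs in tosses) {
        # likelihood of this single observation for every candidate value of p
        likelihood <- if (obs == "W") p_grid else 1 - p_grid
        posterior  <- likelihood * prior
        posterior  <- posterior / sum(posterior)
        prior <- posterior                             # today's posterior is tomorrow's prior
    }
    plot(p_grid, posterior, type = "l",
         xlab = "proportion water p", ylab = "plausibility")

Each pass through the loop produces one of the intermediate estimates; after all nine tosses, posterior holds the final estimate.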
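Two more small checks, again as a sketch rather than book code. The first verifies the Overthinking point that the flat prior on [0, 1] has density 1 everywhere and still integrates to 1. The second draws one example of a gently regularizing prior; the Beta(2, 2) shape is an illustrative choice, not a recommendation from the text, but it has the properties described above: it assigns zero density to p = 0 and p = 1 and gently favors values near 0.5.

    # Area under the flat prior from 0 to 1 is 1
    integrate(function(p) dunif(p, min = 0, max = 1), lower = 0, upper = 1)

    # A weakly informative alternative: Beta(2, 2) rules out p = 0 and p = 1
    # and gently favors values near 0.5 (illustrative choice, not from the text)
    p <- seq(0, 1, length.out = 200)
    plot(p, dbeta(p, shape1 = 2, shape2 = 2), type = "l",
         xlab = "proportion water p", ylab = "prior density")
    lines(p, dunif(p), lty = 2)                        # flat prior for comparison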
