The approach above surely isn't the only way to describe statistical modeling, but it is a widespread and productive language. Once a scientist learns this language, it becomes easier to communicate the assumptions of our models. We no longer have to remember seemingly arbitrary lists of bizarre conditions like homoscedasticity (constant variance), because we can just read these conditions from the model definitions. We will also be able to see natural ways to change these assumptions, instead of feeling trapped within some procrustean model type like regression or multiple regression or ANOVA or ANCOVA or such. These are all the same kind of model, and that fact becomes obvious once we know how to talk about models as mappings of one set of variables through a probability distribution onto another set of variables.

4.2.1. Re-describing the globe tossing model. It's good to work with examples. Recall the proportion of water problem from previous chapters. The model in that case was always:

    w ∼ Binomial(n, p)
    p ∼ Uniform(0, 1)

where w was the observed count of water samples, n was the total number of samples, and p was the proportion of water on the actual globe. Read the above statement as:

    The count w is distributed binomially with sample size n and probability p.
    The prior for p is assumed to be uniform between zero and one.

Once we know the model in this way, we automatically know all of its assumptions. We know the binomial distribution assumes that each sample (globe toss) is independent of the others, and so we also know that the model assumes that sample points are independent of one another.

For now, we'll focus on simple models like the above. In these models, the first line defines the likelihood function used in Bayes' theorem. The other lines define priors. Both of the lines in this model are STOCHASTIC, as indicated by the ∼ symbol. A stochastic relationship is just a mapping of a variable or parameter onto a distribution. It is stochastic because no single instance of the variable on the left is known with certainty. Instead, the mapping is probabilistic: some values are more plausible than others, but very many different values are plausible under any model. Later, we'll have models with deterministic definitions in them as well.

Overthinking: From model definition to Bayes' theorem. To relate the mathematical format above to Bayes' theorem, you could use the model definition to define the posterior distribution:

    Pr(p|w, n) = Binomial(w|n, p) Uniform(p|0, 1) / ∫ Binomial(w|n, p) Uniform(p|0, 1) dp

That monstrous denominator is just the average likelihood again. It standardizes the posterior to sum to 1. The action is in the numerator, where the posterior probability of any particular value of p is seen again to be proportional to the product of the likelihood and prior. In R code form, this is the same grid approximation calculation you've been using all along. In a form recognizable as the above expression:

R code 4.6
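A minimal sketch of that grid approximation, assuming the six water observations in nine tosses from the earlier globe tossing examples (the data values and the grid size are illustrative choices, not fixed by the model definition):

    # grid approximation of the posterior for p
    w <- 6                                              # observed water count (assumed, from earlier chapters)
    n <- 9                                              # total number of tosses (assumed, from earlier chapters)
    p_grid <- seq(from = 0, to = 1, length.out = 100)   # candidate values of p
    likelihood <- dbinom(w, size = n, prob = p_grid)    # Binomial(w | n, p)
    prior <- dunif(p_grid, min = 0, max = 1)            # Uniform(p | 0, 1)
    posterior <- likelihood * prior                     # numerator: likelihood times prior
    posterior <- posterior / sum(posterior)             # standardize so the posterior sums to 1

Each entry of posterior then approximates the posterior plausibility of the corresponding value in p_grid, mirroring the expression above with the integral replaced by a sum over the grid.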
