here? Because cumulative probability naturally constrains itself to never exceeding a total probability of one. And because this is an ordered density, we know that the cumulative log-odds of the largest observable value must be +∞, which is the same as a cumulative probability of one. This anchors the distribution and standardizes it at the same time. If you start instead with discrete individual probabilities of each outcome, then you'd have to later standardize these probabilities to ensure they sum to exactly one. It turns out to be easier to just start with the cumulative probability and then work backwards to the individual probabilities. I know, this seems weird. But I'll walk you through it.^108

What we want is for the cumulative log-odds of an observed value y_i being equal-to-or-less-than some possible value k to be:

\[
\log \frac{\Pr(y_i \le k)}{1 - \Pr(y_i \le k)} = \phi_k , \tag{12.1}
\]

where ϕ_k is a continuous value, different for each observable value k. We'll make this value into a linear model, in a bit. For now, it's just a placeholder. The above function is just a direct embodiment of the log-odds and cumulative density objectives we've stated so far. It actually says nothing else at all. Now we solve for the cumulative probability density itself. Do this by taking (12.1) and solving for Pr(y_i ≤ k). After a little algebra, you get:

\[
\Pr(y_i \le k) = \frac{\exp(\phi_k)}{1 + \exp(\phi_k)} .
\]

You might recognize this probability as the logistic, same as in the last chapter. It arose in the same way, establishing the logistic function as the inverse link for the binomial model. But now we have a CUMULATIVE LOGISTIC, since the probability Pr(y_i ≤ k) is cumulative.

But we still need likelihoods, which are not cumulative. So how do you use this thing? Well, it's a probability density, so you can use it to define the likelihood of any observation y_i. By definition, in a discrete probability density, the likelihood of any observation y_i = k must be:

\[
\Pr(y_i = k) = \Pr(y_i \le k) - \Pr(y_i \le k - 1) . \tag{12.2}
\]

This just says that since the logistic is cumulative, we can compute the discrete probability of exactly y_i = k by subtracting the cumulative probability of one observable value lower than k.

12.1.2. Putting the GLM in the ϕ. We're almost ready to walk through actually writing the code to fit models to this distribution. But before we get to coding, there is one more conceptual step. To build additive models of the log-odds values ϕ_k, we'll need to distinguish the intercept at each value k from the rest of the additive model.

In the simplest case, there are no predictor variables, and so each observable value k has a unique log-odds value ϕ_k that just translates between the cumulative probability scale and the log-odds scale of the model. For example, consider a case in which the observable values are the numbers 1, 2 and 3. Now the maximum likelihood estimate of the cumulative probability of observing a "1" is just going to be the observed proportion of 1's in the data. The estimate of the cumulative probability of a 2-or-less is going to be the proportion of the observed values that are 2 or 1. The cumulative probability of a 3 must be exactly 1, because it's the maximum value. So if there are, say, 100 observations, and 31 of them are "1" and 49 of them are "2", then the estimated cumulative probabilities of 1, 2 and 3 are: 31/100, (31+49)/100, 1. Crunched into proportions, those values are: 0.31, 0.8, and 1. Now the ϕ values are just the log-odds of each of these cumulative probabilities.
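To make the arithmetic concrete, here is a minimal sketch in base R (not code from the text; the object names counts, cum_pr, phi, and pr are only for illustration). It takes the counts from the example above, with the remaining 20 observations being "3", converts them to cumulative proportions, transforms those to the log-odds ϕ_k of equation (12.1), and then recovers the discrete probabilities by differencing, as in equation (12.2).

# counts of outcomes 1, 2, 3 from the example: 31 + 49 + 20 = 100 observations
counts <- c(31, 49, 20)

# cumulative proportions: 0.31, 0.80, 1.00
cum_pr <- cumsum(counts) / sum(counts)

# log-odds of each cumulative probability, as in equation (12.1);
# the last value is log(1/0) = +Inf, which anchors the distribution
phi <- log(cum_pr / (1 - cum_pr))
phi   # approximately -0.80, 1.39, Inf

# recover the discrete probabilities by differencing, as in equation (12.2)
pr <- cum_pr - c(0, cum_pr[-length(cum_pr)])
pr    # 0.31, 0.49, 0.20

The same log-odds transform is available in base R as qlogis(). And because the final cumulative log-odds is always +∞, only the first two of the three ϕ values actually need to be estimated: in general, K ordered outcomes require only K − 1 intercepts.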
