
14.1. EVERYTHING CAN VARY AND PROBABLY SHOULD

a_dept[4]   -0.06  0.64  -1.41   1.06
a_dept[5]   -0.50  0.64  -1.77   0.65
a_dept[6]   -2.05  0.66  -3.35  -0.88
sigma_dept   1.66  0.81   0.65   3.19

The estimated effect of male is very similar to what we got in Chapter 11. But now we also have better estimates of the individual department average acceptance rates. You’ll see that the departments are ordered from those with the highest proportions accepted to the lowest. Remember, the values above are the α_j estimates, and so they are deviations from the global mean α, which in this case has mean −0.56.

14.1.2. Varying effects of being male. Now, in order to teach you about varying slopes, let’s consider the variation in gender bias among departments. Sure, overall there isn’t much evidence of gender bias in the previous model. But what if we allow the effect of an applicant’s being male to vary in the same way we already allowed the overall rate of admission to vary? Such a model is a VARYING SLOPES (or RANDOM SLOPES) model. In this case, the model is specified by defining a series of individual department effects of the variable male. In effect, we assume that every department not only has its own intercept (its log-odds of admission), but it also has its own slope (the change in log-odds of admission arising from changing an applicant’s gender from female to male).

Specifying this model will require a little more work, although conceptually it is familiar territory. The trouble is that, in order to successfully pool information, we’ll need to define distributions of not only the varying intercepts and slopes, but also their correlation. Another way to think of this is that we are defining a prior probability density for the cluster intercepts and slopes. We’ll estimate this prior from the data, so it’s an empirical prior, but we still need to define its basic shape, the parameters that will describe it.
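One way to write down such a prior, offered only as a sketch: the symbols β, β_j, σ_α, σ_β, ρ and the male indicator m_i below are assumptions introduced for illustration and do not appear on this page. The department deviations α_j and β_j are given a joint Gaussian distribution centered at zero, and the "basic shape" is then described by two standard deviations and one correlation.

```latex
\begin{aligned}
\operatorname{logit}(p_i) &= (\alpha + \alpha_{j[i]}) + (\beta + \beta_{j[i]})\, m_i \\
\begin{bmatrix} \alpha_j \\ \beta_j \end{bmatrix}
  &\sim \operatorname{MVNormal}\!\left(
     \begin{bmatrix} 0 \\ 0 \end{bmatrix},\ \Sigma \right) \\
\Sigma &=
  \begin{bmatrix}
    \sigma_\alpha^2 & \rho\,\sigma_\alpha\sigma_\beta \\
    \rho\,\sigma_\alpha\sigma_\beta & \sigma_\beta^2
  \end{bmatrix}
\end{aligned}
```

Setting ρ = 0 makes Σ diagonal, so the intercepts and slopes get two independent Gaussian priors.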
If we don’t allow for the Gaussian distribution of intercepts to be correlated with the Gaussian distribution of slopes, then we are assuming they are uncorrelated. Now, maybe the overall rate of admissions in each department really is uncorrelated with any bias by gender in admissions. But what if they are correlated? What if departments that admit most applicants also favor female applicants? In that case, defining the distribution of varying intercepts and slopes as being correlated allows us to pool information not only across clusters (departments), but also across parameter types.

Suppose for example that you analyze the data for all departments except for the smallest one, finding a strong correlation between the overall probability of admission and the amount of gender bias. Now, before even seeing the data for the final department, you expect its intercept and slope to be correlated. Furthermore, if you learn its slope, it changes your guess about its intercept. Likewise, if you learn its intercept, it changes your guess about its slope. In reality, both estimates simultaneously inform one another, just as all departments simultaneously inform one another. This is how allowing the varying intercepts and slopes to be correlated allows us to better pool information across clusters in the data.

Another reason to allow for the correlation is that, if you want to generalize to new sampling clusters in the future, then you will need a complete definition of the joint distribution of the varying intercepts and slopes. If you assume they are uncorrelated, you might well make worse predictions than if you estimated their correlation. Now, if you have good reason, either theoretical or practical, to assume the correlation is zero, then certainly do so. But otherwise, assuming that the distribution of varying intercepts is uncorrelated with the distribution of varying slopes is a lot like assuming that the posterior distributions of any two parameters are perfectly uncorrelated.
As you know by now, having used so many
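
The claim above that learning a department's slope changes the guess about its intercept can be illustrated with the conditional distribution of a bivariate Gaussian. The sketch below is not the text's own code: the function name `conditional_intercept` and all numeric values are hypothetical, chosen only to show the direction of the effect.

```python
import math


def conditional_intercept(mu_a, mu_b, sigma_a, sigma_b, rho, b_obs):
    """Mean and sd of intercept a, given slope b = b_obs, when (a, b) is
    bivariate Gaussian with means (mu_a, mu_b), standard deviations
    (sigma_a, sigma_b) and correlation rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    if sigma_a <= 0 or sigma_b <= 0:
        raise ValueError("standard deviations must be positive")
    # Standard bivariate-Gaussian conditioning formulas.
    mean = mu_a + rho * (sigma_a / sigma_b) * (b_obs - mu_b)
    sd = sigma_a * math.sqrt(1.0 - rho ** 2)
    return mean, sd


# Hypothetical numbers (not estimates from the text): deviations centered
# at zero, with an observed slope deviation of +1.0 for a new department.
for rho in (0.0, -0.5):
    mean, sd = conditional_intercept(0.0, 0.0, 2.0, 1.0, rho, b_obs=1.0)
    print(f"rho={rho:+.1f}: intercept guess {mean:+.2f} (sd {sd:.2f})")
```

With ρ = 0 the observed slope leaves the guess about the intercept unchanged. With a nonzero ρ the guess shifts and its uncertainty shrinks, which is the sense in which information is pooled across parameter types.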
