316 12. MONSTERS AND MIXTURES

answering the same scenarios. So is it fair to reward m11.4 for predicting the average responses of these specific individuals? I think not. Now, the estimates for the slope parameters in m11.4 may indeed produce better out-of-sample predictions than those in m11.3. But we can't easily decide that from the AICc comparison above, because the comparison is dominated by the individual intercepts.

Instead, we'd rather have a way to focus on particular aspects of prediction, like the slope parameters, while ignoring others, like the individual intercepts. This concern is sometimes called the question of focus. We're not going to deal with this concern in any detail right now. But when you reach Chapter 13, the question of focus will appear again.

12.2. Ranked Outcomes

Problem with ranks: must predict the entire vector at once, because only one item can be #1. Use the Keener and Waldman approach with orthant probabilities. Code for this is ready. Need a good example data frame. Alex's village data? Ryan's field data?

12.3. Variable Probabilities: Beta-binomial

Throughout the book so far, we've been implicitly assuming that the "true" value of each parameter was the same for every unit in the data. Every individual person, chimpanzee, nation, or location had the same α and the same β and the same p. But suppose this isn't the case. Suppose instead that different units in the data actually experience different underlying probabilities or rates.

You've already seen an empirical case in which this possibility was important, when you analyzed the UCB admissions data in the previous chapter. In ignoring the fact that people applying to different departments had different overall probabilities of admission, we were led to the wrong conclusion. It was only by estimating a different baseline, or intercept, probability in each case that we could get the model to tell us what was obvious from plotting the data.

This kind of situation is quite common. Whenever there are clusters of some kind in the data, they might experience different probabilities of an event, because of unobserved factors linking the individual observations within each cluster. In the case of the academic departments, it is that selection criteria and funding amounts are clustered by department, and so exert a common causal influence on everyone who applies, even though deviations within a department may be due to differences among applicants.

In this section, I'm going to show you the simplest way to begin to model this kind of heterogeneity among units, without using the fixed effects approach we used for the UCB admissions data. There are a number of reasons to move beyond the fixed effects approach, the approach of assigning a dummy variable to each cluster. I'll speak to these reasons in much more depth in the next chapter. For now, it is sufficient to realize that the average probability in each cluster does inform us about the average probabilities in the other clusters. The fixed effects approach instead assumes that the per-cluster probabilities are completely independent of one another, and therefore the estimate of each takes no advantage of the data from the other clusters. So it will help us to pool the information and make better inferences, if we explicitly model the varying probabilities by cluster.

The second important reason, for now, is that often we'd like to actually estimate the distribution of the probabilities across units. We'll need such an estimate, if we want to generalize predictions to units we haven't yet sampled. For example, what about academic departments
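The varying-probabilities idea can be checked with a quick simulation: draw each cluster's probability from a Beta distribution, then draw counts from a binomial with that probability. Marginally, the counts follow a beta-binomial distribution, whose variance exceeds that of a plain binomial with the same average probability. A minimal sketch, in Python rather than the book's R, with arbitrary Beta parameters chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(2015)

a, b = 2.0, 4.0       # hypothetical Beta shape parameters for illustration
n = 20                # applicants per department
clusters = 100_000    # number of simulated departments

# Each "department" gets its own admission probability, then counts
# are binomial given that probability.
p = rng.beta(a, b, size=clusters)
admits = rng.binomial(n, p)

p_bar = a / (a + b)                    # average probability across clusters
binom_var = n * p_bar * (1 - p_bar)    # variance if p were constant
# Analytic beta-binomial variance: inflated by (a + b + n) / (a + b + 1).
bb_var = n * p_bar * (1 - p_bar) * (a + b + n) / (a + b + 1)

print(admits.mean())   # close to n * p_bar
print(admits.var())    # close to bb_var, well above binom_var
```

The inflation factor (a + b + n) / (a + b + 1) is the overdispersion that a model assuming one common p would miss entirely, which is exactly why we want to model the distribution of probabilities across clusters rather than a single shared value.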
