11.07.2015 Views

statisticalrethinkin..


5. MULTIVARIATE LINEAR MODELS

[Figure 5.1: scatterplot of divorce rate (y-axis, 6 to 14) against Waffle Houses per million (x-axis, 0 to 50). Labeled points: ME, NJ, OK, AR, AL, SC, GA.]

FIGURE 5.1. The number of Waffle House diners per million people is associated with divorce rate (in the year 2009) within the United States. Each point is a State. “Southern” (former Confederate) States shown in blue. Shaded region is 95% percentile interval of the mean. These data are in data(WaffleDivorce) in the rethinking package.

than one type of influence, we should. Furthermore, when causation is multiple, one cause can hide another. Multivariate models can help in such settings.

(3) Interactions. Even when variables are completely uncorrelated, the importance of each may still depend upon the other. For example, plants benefit from both light and water. But in the absence of either, the other is no benefit at all. Such INTERACTIONS occur in a very large number of systems. When this is the case, effective inference about one variable will usually depend upon consideration of other variables.

In this chapter, we begin to deal with the first two of these, using multivariate regression to deal with simple confounds and to take multiple measurements of influence. You’ll see how to include an arbitrary number of main effects in your linear model of the Gaussian mean. These main effects are additive combinations of variables, the simplest type of multivariate model.

We’ll focus on two valuable things multivariate models can help us with: (1) revealing spurious correlations like the Waffle House correlation with divorce and (2) revealing important correlations that may be masked by unrevealed correlations with other variables. But multiple predictor variables can hurt as much as they can help. So the chapter describes some dangers of multivariate models, notably multicollinearity. Along the way, you’ll meet CATEGORICAL VARIABLES, which usually must be broken down into multiple predictor variables.

Rethinking: Causal inference. Despite its central importance to science, there is no unified approach to causal inference yet in the sciences or in statistics. There are even people who argue that cause does not really exist; it’s just a psychological illusion.[63] About one thing, however, there is general agreement: causal inference always depends upon unverifiable assumptions. Another way to say this is that it’s always possible to imagine some way in which your inference about cause is mistaken, no matter how careful the research design or statistical analysis. A lot can be accomplished, despite this ultimate barrier on inference.[64]

5.1. Spurious association

Let’s leave waffles behind, at least for the moment. An example that is easier to understand is the correlation between divorce rate and marriage rate (FIGURE 5.2). The rate at which adults marry is a great predictor of divorce rate, as seen in the left-hand plot in the
