158 5. MULTIVARIATE LINEAR MODELS

I display this plot in FIGURE 5.10. Along the diagonal, the variables are labeled. In each scatterplot off the diagonal, the vertical axis variable is the variable labeled on the same row and the horizontal axis variable is the variable labeled in the same column. For example, the two scatterplots in the first row in FIGURE 5.10 are kcal.per.g (vertical) against perc.fat (horizontal) and then kcal.per.g (vertical) against perc.lactose (horizontal). Notice that percent fat is positively correlated with the outcome, while percent lactose is negatively correlated with it. Now look at the rightmost scatterplot in the middle row. This plot is the scatter of percent fat (vertical) against percent lactose (horizontal). Notice that the points line up almost entirely along a straight line. These two variables are negatively correlated, and so strongly so that they are nearly redundant. Either helps in predicting kcal.per.g, but neither helps much once you already know the other.

You can compute the correlation between the two variables with cor:

R code 5.38
```r
cor( d$perc.fat , d$perc.lactose )
```
```
[1] -0.9416373
```

That's a pretty strong correlation. How strong does a correlation have to get before you should start worrying about multicollinearity? There's no easy answer to that question. Correlations do have to get pretty high before this problem ruins your analysis. But what matters isn't just the correlation between a pair of variables. Rather, what matters is the correlation that remains after accounting for any other predictors. With only two predictors here, though, we can address the correlation question directly, with a little simulation experiment. Suppose we have only kcal.per.g and perc.fat. Now we construct a random predictor variable, call it x, that is correlated with perc.fat at some predetermined level. Then fit the regression model that tries to predict kcal.per.g using both perc.fat and our random fake variable x.
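One replicate of this experiment, plus the averaging over many replicates described next, can be sketched in R roughly as follows. This is a minimal sketch, not the code from the overthinking box. It assumes a data frame d with numeric columns kcal.per.g and perc.fat (for instance, the milk data used above). The function names sim_coll and mean_se_perc_fat are hypothetical. It also uses the ordinary least-squares standard error reported by lm() as a stand-in for the posterior standard deviation, which it approximates closely when priors are flat.

```r
# Sketch of the collinearity experiment. ASSUMES a data frame `d` with
# numeric columns kcal.per.g and perc.fat; function names are hypothetical.

# One replicate: build a fake predictor x whose correlation with perc.fat
# is approximately r, fit kcal.per.g ~ perc.fat + x, and return the
# standard error of the perc.fat coefficient.
sim_coll <- function(d, r = 0.9) {
  stopifnot(abs(r) < 1)  # r = 1 would make x an exact copy of perc.fat
  # x = r * perc.fat + noise. The noise variance (1 - r^2) * var(perc.fat)
  # makes var(x) match var(perc.fat), so cor(x, perc.fat) is close to r.
  d$x <- rnorm(nrow(d), mean = r * d$perc.fat,
               sd = sqrt((1 - r^2) * var(d$perc.fat)))
  m <- lm(kcal.per.g ~ perc.fat + x, data = d)
  coef(summary(m))["perc.fat", "Std. Error"]
}

# Average that standard error over n independent replicates at a fixed r
mean_se_perc_fat <- function(d, r = 0.9, n = 100) {
  mean(replicate(n, sim_coll(d, r)))
}
```

Calling mean_se_perc_fat(d, r) over a grid such as seq(0, 0.99, by = 0.01) and plotting the results against r should trace a curve of the same general shape as the one discussed below. Because x is pure noise beyond its correlation with perc.fat, any growth in the standard error comes from collinearity alone.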
Record the standard error of the estimated effect of perc.fat. Now repeat this procedure many times, at different levels of correlation between perc.fat and x.

If you do this, you'll get a plot like that in FIGURE 5.11. The vertical axis is the average standard deviation of the posterior for the perc.fat effect across 100 regressions, each using a simulated correlated predictor variable x. The horizontal axis shows the strength of the correlation between x and perc.fat. When the two variables are uncorrelated, on the left side of the plot, the standard deviation of the posterior is small. This means the posterior distribution is piled up in a narrow range of values. As the correlation increases (and keep in mind that we aren't adding any information here, just a correlated string of random numbers), the standard deviation inflates. But the effect is far from linear; it accelerates as the correlation increases. Above a correlation of 0.9, the standard deviation increases very rapidly, in fact approaching ∞ as the correlation approaches 1. The code for producing this plot contains some techniques not yet explained, so I include it in an optional overthinking box at the end of this section.

What can be done about multicollinearity? The best thing to do is be aware of it. You can anticipate this problem by checking the predictor variables against one another in a pairs plot. Any pair or cluster of variables with very large correlations, over about 0.9, may be problematic once included as main effects in the same model. However, it isn't always true that highly correlated variables are completely redundant: other predictors might be correlated with only one of the pair, and so help extract the unique information each predictor provides. So you can't know from a table of correlations, nor from a matrix of scatterplots, whether multicollinearity will prevent you from including sets of variables in the same model. Still,