5 Multivariate Linear Models

One of the most reliable sources of waffles in North America is a Waffle House diner. Waffle House is nearly always open, even just after a hurricane, as most of its diners invest in disaster preparedness. The United States’ disaster relief agency (FEMA) informally uses Waffle House as an index of disaster severity.⁶¹ If the Waffle House is closed, that’s a serious event.

That’s all nice, but when you inspect the correlation between Waffle House and the stability of marriages in a region, it looks like waffles are causing some disasters (FIGURE 5.1). States with many Waffle Houses per person, like Georgia and Alabama, also have some of the highest divorce rates in the United States. The lowest divorce rates are found where there are zero Waffle Houses.

Could always-available waffles and hash brown potatoes put marriage at risk? Probably not. This is an example of a misleading correlation. No one thinks there is any plausible mechanism by which Waffle House diners make divorce more likely. Instead, when we see a correlation of this kind, we immediately start asking about other variables that are “really” driving the relationship between waffles and divorce. In this case, Waffle House began in Georgia in the year 1955. Over time, the diners spread across the Southern United States, remaining largely bounded within it. So Waffle House is associated with the South. Divorce is not a uniquely Southern institution, but it is more common anyplace that people marry young, and many communities in the South still frown on young people “shacking up” and living together out of wedlock. So it’s probably just an accident of history that Waffle House and high divorce rates both occur in the South.

In other contexts, however, it’s easy to be fooled by such spurious correlations. This is one reason why so much statistical effort is devoted to MULTIVARIATE REGRESSION, using more than one predictor variable to model an outcome.
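The way a confound like Southernness can manufacture a waffles-divorce association can be sketched with a small simulation. This is a hypothetical illustration, not the data behind FIGURE 5.1: it assumes a single continuous confound S (“Southernness”) that raises both Waffle House density W and divorce rate D, while W has no effect at all on D. The function names (`simulate`, `slope`, `two_predictor_slopes`) and every parameter value are invented for the sketch, and the fits use plain least squares only to keep it dependency-free.

```python
import random
import statistics


def simulate(n=1000, seed=1):
    """Hypothetical data-generating process (an assumption, not from the text).

    S drives both W and D; D contains no W term, so W has zero causal effect.
    """
    rng = random.Random(seed)
    s = [rng.gauss(0, 1) for _ in range(n)]   # confound: "Southernness"
    w = [si + rng.gauss(0, 1) for si in s]    # Waffle House density, caused by S
    d = [si + rng.gauss(0, 1) for si in s]    # divorce rate, caused by S only
    return s, w, d


def slope(x, y):
    """Least-squares slope of y on a single predictor x (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_predictor_slopes(x1, x2, y):
    """Least-squares slopes for y ~ x1 + x2 (with intercept).

    Solves the 2x2 normal equations on mean-centered data.
    """
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


if __name__ == "__main__":
    s, w, d = simulate()
    # W alone: theory gives cov(W, D) / var(W) = 1 / 2 for this setup
    print("D ~ W:     slope on W =", round(slope(w, d), 2))
    # W and S together: theory gives about 0 for W and about 1 for S
    b_w, b_s = two_predictor_slopes(w, s, d)
    print("D ~ W + S: slope on W =", round(b_w, 2), " slope on S =", round(b_s, 2))
```

Because D was generated without any W term, the near-zero coefficient on W in the two-predictor fit is the correct answer; the positive slope in the one-predictor fit arrives entirely through S.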
Reasons often given for multivariate models include:

(1) Statistical “control” for confounds. A confound is a variable that may be correlated with another variable of interest. The spurious waffles and divorce correlation is one possible type of confound, where the confound (Southernness) makes a variable with no real importance (Waffle House density) appear to be important. But confounds can hide real, important variables just as easily as they can produce false ones. In a particularly important type of confound, known as SIMPSON’S PARADOX, the entire direction of an apparent association between a predictor and outcome can be reversed by considering a confound.⁶²
(2) Multiple causation. Even when confounds are absent, for example due to tight experimental control, a phenomenon may really arise from multiple causes. Measurement of each cause is useful, so when we can use the same data to estimate more