11.07.2015 Views

statisticalrethinkin..

Consider by analogy the Curse of Tippecanoe.[92] From the year 1840 until 1960, every United States president who was elected in a year ending in the digit 0 (which happens every 20 years, given 4-year terms) has died in office. William Henry Harrison was the first, being elected in 1840 and dying of pneumonia the next year. John F. Kennedy was the last, elected in 1960 and assassinated in 1963. Seven American presidents died in sequence in this pattern. Ronald Reagan was elected in 1980, but despite at least one attempt on his life, he managed to live long after his term was up, breaking the curse. Given enough time and data, a pattern like this can be found for almost any body of data. But without any compelling reason to believe this pattern is meaningful, it is hardly compelling that such patterns exist. Most large sets of data will contain patterns of correlation that are strong and surprising. If we search hard enough, we are bound to find a Curse of Tippecanoe. There are many other patterns in presidential names and dates, and no doubt new ones are being found and circulated all the time.

Fiddling with and constructing many predictor variables is a great way to find coincidences, but not necessarily a great way to evaluate hypotheses. However, fitting many possible models isn't always a dangerous idea, provided some judgment is exercised in weeding down the list of variables at the start. There are two scenarios in which this strategy appears defensible. First, sometimes all one wants to do is explore a set of data, because there are no clear hypotheses to evaluate. This is rightly labeled pejoratively as DATA DREDGING, when one does not admit to it. But when used together with model averaging, and freely admitted, it can be a way to stimulate future investigation. Second, sometimes we need to convince an audience that we have tried all of the combinations of predictors, because none of the variables seem to help much in prediction.

6.6. In praise of complexity

Need to add discussion of reasons for using complex models, even when AIC/DIC recommend against it.

6.7. Practice

All three problems to follow use the same data. Pull out the old Howell !Kung demography data and split it into two equally-sized data frames. Here's the code to do it:

library(rethinking)
data(Howell1)
d
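As a general illustration of splitting a data frame into two equally-sized halves, here is a minimal sketch in base R. It uses a made-up data frame rather than the Howell data, and the names `toy`, `shuffled`, `first`, `half_a`, and `half_b`, as well as the seed, are assumptions introduced only for this example. The book's own code may go about it differently.

```r
# Hypothetical stand-in data: 10 rows, two columns (not the Howell data).
set.seed(1)  # assumed seed, used only to make the shuffle reproducible
toy <- data.frame(id = 1:10, height = rnorm(10, mean = 150, sd = 10))

# Shuffle the row indices, then assign the first half of the shuffled
# indices to one data frame and the remaining rows to the other.
n <- nrow(toy)
shuffled <- sample(n)                  # random permutation of 1..n
first <- shuffled[seq_len(n %/% 2)]    # indices for the first half
half_a <- toy[first, ]
half_b <- toy[-first, ]

# Row counts of the two halves.
nrow(half_a)
nrow(half_b)
```

Shuffling before splitting keeps any ordering in the original rows from leaking into one half. With an odd number of rows, this sketch puts the extra row in `half_b`.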
