
Personal reflections on the COPSS Presidents’ Award

47.5 Serendipity with data

Just before David and I became colleagues, I had my first encounter with data. This will seem funny to new researchers, but this was in the era of no personal computers and IBM punch cards.

It was late 1976, I was the only Assistant Professor in the department, and I was sitting in my office happily minding my own business, when two very senior and forbidding faculty members came to my office and said “Carroll, come here, we want you to meet someone” (yes, in the 1970s, people really talked like that, especially at my institution). In the conference room was a very distinguished marine biologist, Dirk Frankenberg (now deceased), who had come over for a consult with senior people, and my colleagues said, in effect, “Talk to this guy” and left. He was too polite to say “Why do I want to talk to a 26-year old who knows nothing?” but I could tell that was what he was thinking.

Basically, Dirk had been asked by the North Carolina Department of Fisheries (NCDF) to build a model to predict the shrimp harvest in the Pamlico Sound for 1977 or 1978, I forget which. The data, much of it on envelopes from fishermen, consisted of approximately n = 12 years of monthly harvests, with roughly four time periods per year, and p = 3 covariates: water temperature in the crucial estuary, water salinity in that estuary, and the river discharge into the estuary, plus their lagged versions. I unfortunately (fortunately?) had never taken a linear model course, and so was too naive to say the obvious: “You cannot do that, n is too small!” So I did.

In current lingo, it is a “small p, small n,” the very antithesis of what is meant to be modern. I suspect 25% of the statistical community today would scoff at thinking about this problem because it was not “small n, large p,” but it actually was a problem that needed solving, as opposed to lots of what is going on.

I noticed a massive discharge that would now be called a high leverage point, and I simply censored it at a reasonable value. I built a model, and it predicted that 1978 (if memory serves) would be the worst year on record, ever (Hunt et al., 1980), and they should head to the hills. Dirk said “Are you sure?” and me in my naïveté said “yes,” and like a gambler, it hit: it was the terrible year. The NCDF then called it the NCDF model! At least in our report we said that the model should be updated yearly (my attempt at full employment and continuation of the research grant), but they then fired us. The model did great for two more years (blind luck), then completely missed the fourth year, wherein they changed the title of the model to reflect where I was employed at the time. You can find Hunt et al. (1980) at http://www.stat.tamu.edu/~carroll/2012.papers.directory/Shrimp_Report_1980.pdf.

This is a dull story, except for me, but it also had a moral: the data were clearly heteroscedastic. This led me to my fascination with heteroscedasticity, which later led to my saying that “variances are not nuisance parameters”
