11.07.2015

6.2. INFORMATION THEORY AND MODEL PERFORMANCE

6.2.1.1. Costs and benefits. But it’s not hard to find another criterion, other than rate of correct prediction, that makes the newcomer look foolish. Any consideration of costs and benefits will suffice. Suppose, for example, that you hate getting caught in the rain, but you also hate carrying an umbrella. Let’s define the cost of getting wet as −5 points of happiness and the cost of carrying an umbrella as −1 point of happiness. Suppose your chance of carrying an umbrella is equal to the forecast probability of rain. Your job is now to maximize your happiness by choosing a weatherperson. Here are your points, following either the current weatherperson or the newcomer:

Day            1     2     3     4     5     6     7     8     9     10
Observed       rain  rain  rain  sun   sun   sun   sun   sun   sun   sun
Points
  Current      −1    −1    −1    −0.6  −0.6  −0.6  −0.6  −0.6  −0.6  −0.6
  Newcomer     −5    −5    −5    0     0     0     0     0     0     0

So the current weatherperson nets you −7.2 happiness, while the newcomer nets you −15 happiness. The newcomer doesn’t look so clever now. You can play around with the costs and the decision rule, but since the newcomer always gets you caught unprepared in the rain, it’s not hard to beat his forecast.

6.2.1.2. Measuring accuracy. But even if we ignore the costs and benefits of any actual decision based upon the forecasts, there’s still ambiguity about which measure of “accuracy” to adopt. There’s nothing special about “hit rate.” Consider, for example, computing the probability of predicting the exact sequence of days. This means computing the probability of a correct prediction for each day, then multiplying all of these probabilities together to get the joint probability of correctly predicting the observed sequence. This is the same thing as the joint likelihood, which you’ve been using up to this point to fit models with Bayes’ theorem.

In this light, the newcomer looks even worse. The probability for the current weatherperson is $1^3 \times 0.4^7 \approx 0.0016$. For the newcomer, it’s $0^3 \times 1^7 = 0$. So the newcomer has zero probability of getting the sequence correct.
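The point totals, the hit rates, and the joint probabilities can all be checked with a short Python sketch. It assumes the observed sequence is rain on days 1 to 3 and sun on days 4 to 10 (which is what the point totals and the $1^3 \times 0.4^7$ product imply), and that each day's points are the expected cost when you carry an umbrella with probability equal to the forecast. The helper names `happiness`, `prob_correct`, `hit_rate`, and `joint_probability` are hypothetical, chosen only for illustration.

```python
import math

# Assumed observed sequence, read off the table: rain on days 1-3, sun on days 4-10.
observed = ["rain"] * 3 + ["sun"] * 7

# Forecast probability of rain issued by each weatherperson.
current = [1.0] * 3 + [0.6] * 7
newcomer = [0.0] * 10

COST_WET = -5.0       # points lost for getting caught in the rain
COST_UMBRELLA = -1.0  # points lost for carrying an umbrella


def happiness(forecast, observed):
    """Total expected points when you carry an umbrella with probability
    equal to the forecast probability of rain (hypothetical helper)."""
    total = 0.0
    for p_rain, day in zip(forecast, observed):
        total += p_rain * COST_UMBRELLA       # expected cost of carrying the umbrella
        if day == "rain":
            total += (1 - p_rain) * COST_WET  # expected cost of being caught without it
    return total


def prob_correct(p_rain, day):
    """Probability the forecast assigned to what actually happened that day."""
    return p_rain if day == "rain" else 1 - p_rain


def hit_rate(forecast, observed):
    """Average probability of being correct across all days."""
    probs = [prob_correct(p, d) for p, d in zip(forecast, observed)]
    return sum(probs) / len(probs)


def joint_probability(forecast, observed):
    """Probability of predicting the exact observed sequence (product over days)."""
    return math.prod(prob_correct(p, d) for p, d in zip(forecast, observed))


for name, forecast in [("current", current), ("newcomer", newcomer)]:
    print(f"{name:9s} points={happiness(forecast, observed):6.1f} "
          f"hit_rate={hit_rate(forecast, observed):.2f} "
          f"joint={joint_probability(forecast, observed):.4f}")
```

Under these assumptions the sketch reproduces the −7.2 and −15 point totals, gives the newcomer the higher hit rate (0.70 versus 0.58), and gives the current weatherperson a joint probability of $0.4^7 \approx 0.0016$ against the newcomer's 0. Changing `COST_WET` or `COST_UMBRELLA` is one simple way to play around with the costs.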
This is because the newcomer’s predictions never expect rain. So even though the newcomer has a high average probability of being correct (hit rate), he has a terrible joint probability of being correct.

So which definition of “accuracy” should we use: joint probability, average probability, or some other measure? We can answer this question by first answering two other questions.

(1) What is a perfect prediction? Well, suppose for the moment that we could know the true probabilities of rain or shine on each day. (If the word “true” bugs you, see the rethinking box further down.) That would make a valuable target, because in principle it would be unbeatable, if we knew it. So this answer provides a target.

(2) How should we measure distance from the target? A perfect prediction would just report the true probabilities of rain on each day. So when either weatherperson provides a prediction that differs from the target, we can measure the distance of the prediction from the target. But what distance should we adopt?

This second question is the harder one. It’s important to get the answer right, and it’s not obvious how to go about answering it. One reason is that some targets are just easier to hit than other targets. We need a measure of distance that accounts for this. For example, suppose we extend the weather forecast into the winter. Now there are three types of days: rain, sun, and snow. Now there are three ways to be wrong, instead of just two. This has to be