6.2. INFORMATION THEORY AND MODEL PERFORMANCE

the probabilities of shine and rain and produces a measure of uncertainty. What properties should such a measure of uncertainty possess? There are three intuitive desiderata.

(1) The measure of uncertainty should be continuous. If it were not, then an arbitrarily small change in any of the probabilities, for example the probability of rain, would result in a massive change in uncertainty.

(2) The measure of uncertainty should increase as the number of possible events increases. For example, suppose there are two cities that need weather forecasts. In the first city, it rains on half of the days in the year and is sunny on the others. In the second, it rains, shines, and hails, each on 1 out of every 3 days in the year. We'd like our measure of uncertainty to be larger in the second city, where there is one more kind of event to predict.

(3) The measure of uncertainty should be additive. What this means is that if we first measure the uncertainty about rain or shine (2 possible events) and then the uncertainty about hot or cold (2 different possible events), the uncertainty over the four combinations of these events—rain/hot, rain/cold, shine/hot, shine/cold—should be the sum of the separate uncertainties.

There is only one function that satisfies these desiderata. This function is usually known as INFORMATION ENTROPY, and has a surprisingly simple definition. If there are n different possible events and each event i has probability p_i, and we call the list of probabilities p, then the unique measure of uncertainty we seek is:

    H(p) = -\mathrm{E}\,\log(p_i) = -\sum_{i=1}^{n} p_i \log(p_i)    (6.1)

In plainer words:

    The uncertainty contained in a probability distribution is the average log-probability of an event.

"Event" here might refer to a type of weather, like rain or shine, or a particular species of bird or even a particular nucleotide in a DNA sequence. While it's not worth going into the details of the derivation of H, it is worth pointing out that nothing about this function is arbitrary. Every part of it derives from the three requirements above.

An example will help. To compute the information entropy for the weather, suppose the true probabilities of rain and shine are p_1 = 0.3 and p_2 = 0.7, respectively. Then:

    H(p) = -\left( p_1 \log(p_1) + p_2 \log(p_2) \right) \approx 0.61

Suppose instead we live in Abu Dhabi. Then the probabilities of rain and shine might be more like p_1 = 0.01 and p_2 = 0.99. Now the entropy would be approximately 0.06. Why has the uncertainty decreased? Because in Abu Dhabi it hardly ever rains. Therefore there's much less uncertainty about any given day, compared to a place in which it rains 30% of the time. It's in this way that information entropy measures the uncertainty inherent in a distribution of events. Similarly, if we add another kind of event to the distribution—forecasting into winter, so also predicting snow—entropy tends to increase, due to the added dimensionality of the prediction problem. For example, suppose the probabilities of sun, rain, and snow are p_1 = 0.7, p_2 = 0.15, and p_3 = 0.15, respectively. Then entropy is about 0.82.

These entropy values by themselves don't mean much to us, though. But we can use them to build a measure of accuracy. That comes next.
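The arithmetic above is easy to verify directly. Below is a minimal R sketch (not code from the text itself) that evaluates equation (6.1) for the three example distributions; the helper name info_entropy is illustrative only, and natural logarithms are used, matching the values quoted above.

```r
# Information entropy: H(p) = -sum(p * log(p)), using natural logs.
# info_entropy is a hypothetical helper name, not from the text.
info_entropy <- function(p) -sum(p * log(p))

info_entropy(c(0.3, 0.7))         # rain/shine example, about 0.61
info_entropy(c(0.01, 0.99))       # Abu Dhabi example, about 0.06
info_entropy(c(0.7, 0.15, 0.15))  # sun/rain/snow example, about 0.82
```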
