Contents - Max-Planck-Institut für Physik komplexer Systeme
2.15 Probabilistic Forecasting

JOCHEN BRÖCKER, STEFAN SIEGERT, HOLGER KANTZ
A ubiquitous problem in our life (both as individuals and as a society) is having to make decisions in the face of uncertainty. Forecasters are supposed to help in this process by making statements (i.e. forecasts) about the future course of events. In order to allow the forecast user to properly assess the potential risks associated with various decisions, forecasters should, in addition, provide some information about the uncertainty associated with their forecasts. Unequivocal or "deterministic" forecasts are often misleading, as they give the false impression of high accuracy. Probabilities, in contrast, make it possible to quantify uncertainty in a well-defined and consistent manner, if interpreted correctly.
Forecasts in terms of probabilities have a long and successful history in the atmospheric sciences. In the 1950s, several meteorological offices started issuing probability forecasts, then based on synoptic information as well as local station data. On a scientific (non-operational) level, probabilistic weather forecasts were discussed much earlier still.
Evaluation of probabilistic forecasts
Since the prediction in a probabilistic forecast is a probability (distribution) whereas the observation is a single value, quantifying the accuracy of such forecasts is a nontrivial issue. This is of interest not only for quality control of operational forecasting schemes, but also for a more fundamental understanding of the predictability of dynamical systems; see for example [5]. Nowadays, probabilistic weather forecasts are often issued over a long period of time under (more or less) stationary conditions, allowing archives of forecast–observation pairs to be collected. Such archives make it possible to calculate observed frequencies and to compare them with the forecast probabilities. The probability distribution of the observation $y$, conditioned on our probability forecast being $p$, ideally coincides with $p$; a forecast having this property is called reliable or consistent. If a large archive of forecast–observation pairs is available, reliability can be tested statistically. Producing reliable probabilistic forecasts is not difficult, however: the overall climatological frequency, for example, is always a reliable forecast, but this constant forecast is not very informative. A good forecast should, in addition, discriminate between different observations.
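The reliability check described above can be sketched in a few lines: group the archive by issued probability and compare observed frequencies with the forecast probabilities. The synthetic archive below (the probability values and sample size are illustrative assumptions, not data from the text) is reliable by construction, so the frequencies should match.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic archive of N forecast-observation pairs for a binary event;
# observations are drawn with the issued probability, so the forecast
# is reliable by construction.
N = 100_000
p = rng.choice([0.1, 0.3, 0.5, 0.7, 0.9], size=N)  # issued probabilities
y = (rng.random(N) < p).astype(int)                # observations (0 or 1)

# Within each group of identical forecasts, the observed frequency of
# y = 1 should (up to sampling noise) equal the forecast probability.
for prob in np.unique(p):
    freq = y[p == prob].mean()
    print(f"forecast {prob:.1f}  observed frequency {freq:.3f}")
```

For a real archive the same binning comparison underlies reliability diagrams; with an unreliable forecast the observed frequencies would deviate systematically from the diagonal.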
The question arises how these virtuous forecast attributes (reliability and information content) can be quantified and evaluated. One therefore seeks "scoring rules" which reward both reliable and informed forecasters.
A scoring rule is a function $S(p, y)$ where $p$ is a probability forecast and $y$ a possible observation. Here, $y$ is assumed to be one of a finite number of labels, say $\{1, \dots, K\}$. A probability forecast then consists of a vector $p = (p_1, \dots, p_K)$ with $p_k \geq 0$ for all $k$ and $\sum_k p_k = 1$. The idea is that $S(p, y)$ quantifies how well $p$ succeeded in forecasting $y$. The general quality of a forecasting system is ideally measured by the mathematical expectation $E[S(p, y)]$, which can be estimated by the empirical mean

$$E[S(p,y)] \cong \frac{1}{N} \sum_{n=1}^{N} S\!\left(p^{(n)}, y^{(n)}\right) \qquad (1)$$

over a sufficiently large data set $\{(p^{(n)}, y^{(n)});\ n = 1, \dots, N\}$ of forecast–observation pairs.
Two important examples of scoring rules are the logarithmic score [8] (also called Ignorance)

$$S(p, y) = -\log(p_y), \qquad (2)$$

and the Brier score [1]

$$S(p, y) = \sum_k \left(p_k - \delta_{y,k}\right)^2. \qquad (3)$$

The convention here is that a smaller score indicates a better forecast.
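Both scores translate directly into code. The following minimal sketch implements (2) and (3) for a categorical observation (with labels shifted to $0, \dots, K-1$, as is natural in code) and estimates the expected score via the empirical mean (1); the example forecast vectors are made up for illustration.

```python
import numpy as np

def log_score(p, y):
    """Logarithmic score / Ignorance, eq. (2): minus the log of the
    probability assigned to the outcome that occurred."""
    return -np.log(p[y])

def brier_score(p, y):
    """Brier score, eq. (3): squared distance between the forecast
    vector and the one-hot indicator of the observed category."""
    delta = np.zeros_like(p)
    delta[y] = 1.0
    return np.sum((p - delta) ** 2)

# A three-category forecast and an observation of category 0.
p = np.array([0.6, 0.3, 0.1])
print(log_score(p, 0))    # -log(0.6), about 0.511
print(brier_score(p, 0))  # 0.16 + 0.09 + 0.01 = 0.26

# Empirical mean (1) over a (tiny) archive of forecast-observation pairs.
forecasts = [p, np.array([0.2, 0.5, 0.3])]
observations = [0, 1]
mean_ign = np.mean([log_score(q, y) for q, y in zip(forecasts, observations)])
```

Smaller values indicate better forecasts under both rules, in line with the convention above.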
Although these two scoring rules seem ad hoc, they share an interesting property which arguably any scoring rule should possess in order to yield consistent results. Suppose that our forecast for some observation $y$ is $q$; then the score of that forecast will be $S(q, y)$. If the correct distribution of $y$ is $p$, then the expectation value of our score is

$$E[S(q,y)] = \sum_k S(q,k)\, p_k. \qquad (4)$$

The right-hand side is referred to as the scoring function $s(q, p)$. Arguably, since $p$ is the correct distribution of $y$, the expected score of $q$ should be worse (i.e. larger, in our convention) than the expected score of $p$. This is equivalent to requiring that the divergence function

$$d(q,p) := s(q,p) - s(p,p) \qquad (5)$$
be non-negative, and zero only if $p = q$. A scoring rule with this property (for all $p, q$) is called strictly proper [6, 7]. The divergence function of the Brier score, for example, is $d(q,p) = \sum_k (q_k - p_k)^2$, demonstrating that this score is strictly proper. The Ignorance is strictly proper as well, since (5) is then just the Kullback–Leibler divergence, which is well known to be positive definite.
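Strict propriety can be illustrated numerically: for any $q \neq p$ both divergences come out positive, and both vanish at $q = p$. A small sketch (the distributions $p$ and $q$ are made-up examples):

```python
import numpy as np

def brier_divergence(q, p):
    """Divergence (5) of the Brier score: sum_k (q_k - p_k)^2."""
    return np.sum((q - p) ** 2)

def kl_divergence(q, p):
    """Divergence (5) of the Ignorance: s(q,p) - s(p,p)
    = sum_k p_k log(p_k / q_k), the Kullback-Leibler divergence."""
    return np.sum(p * np.log(p / q))

p = np.array([0.5, 0.3, 0.2])  # "correct" distribution of y
q = np.array([0.4, 0.4, 0.2])  # an imperfect forecast

print(brier_divergence(q, p), kl_divergence(q, p))  # both positive
print(brier_divergence(p, p), kl_divergence(p, p))  # both zero
```

An improper scoring rule, by contrast, could reward a forecaster for issuing $q \neq p$ even when $p$ is correct, which is exactly what propriety rules out.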
The expectation value $E[S(p,y)]$ of a strictly proper scoring rule $S$ allows for the following decomposition [4]:

$$E[S(p,y)] = s(\bar{\pi}, \bar{\pi}) + E[d(p, \pi)] - E[d(\bar{\pi}, \pi)], \qquad (6)$$
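The decomposition (6) can be verified numerically for the Brier score. The sketch below assumes, as in the standard decomposition literature, that $\pi$ denotes the conditional distribution of $y$ given the forecast $p$ and $\bar{\pi}$ its climatological average; the two-category forecast system itself is a made-up example, deliberately not reliable so that all three terms are nonzero.

```python
import numpy as np

def brier(a, y, K):
    """Brier score S(a, y), eq. (3)."""
    delta = np.zeros(K)
    delta[y] = 1.0
    return np.sum((a - delta) ** 2)

def s(a, p, K):
    """Scoring function s(a, p) = sum_k S(a, k) p_k, eq. (4)."""
    return sum(p[k] * brier(a, k, K) for k in range(K))

def d(a, p, K):
    """Divergence d(a, p) = s(a, p) - s(p, p), eq. (5)."""
    return s(a, p, K) - s(p, p, K)

K = 2
# Made-up forecast system: two forecast vectors issued with weights w,
# and the conditional distribution pi of y given each forecast.
forecasts = [np.array([0.8, 0.2]), np.array([0.3, 0.7])]
w = [0.6, 0.4]
pi = [np.array([0.7, 0.3]), np.array([0.4, 0.6])]
pibar = sum(wi * pii for wi, pii in zip(w, pi))  # climatology

expected_score = sum(wi * s(f, pii, K)
                     for wi, f, pii in zip(w, forecasts, pi))
uncertainty = s(pibar, pibar, K)                 # s(pibar, pibar)
reliability = sum(wi * d(f, pii, K)
                  for wi, f, pii in zip(w, forecasts, pi))
resolution = sum(wi * d(pibar, pii, K) for wi, pii in zip(w, pi))

# Decomposition (6): E S(p,y) = s(pibar,pibar) + E d(p,pi) - E d(pibar,pi)
print(expected_score)                                  # 0.464
print(uncertainty + reliability - resolution)          # 0.464
```

The three terms are commonly read as uncertainty, reliability, and resolution: a good forecast keeps the reliability term small while making the resolution term large.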