
2.15 Probabilistic Forecasting

JOCHEN BRÖCKER, STEFAN SIEGERT, HOLGER KANTZ

A ubiquitous problem in our lives, both as individuals and as a society, is having to make decisions in the face of uncertainty. Forecasters are supposed to help in this process by making statements (i.e. forecasts) about the future course of events. In order to allow the forecast user to properly assess the potential risks associated with various decisions, forecasters should, in addition, provide some information about the uncertainty associated with their forecasts. Unequivocal or “deterministic” forecasts are often misleading, as they give the false impression of high accuracy. Probabilities, in contrast, make it possible to quantify uncertainty in a well defined and consistent manner, provided they are interpreted correctly.

Forecasts in terms of probabilities have a long and successful history in the atmospheric sciences. In the 1950s, several meteorological offices started issuing probability forecasts, then based on synoptic information as well as local station data. On a scientific (non-operational) level, probabilistic weather forecasts were discussed much earlier.

Evaluation of probabilistic forecasts

Since the prediction in a probabilistic forecast is a probability (distribution) whereas the observation is a single value, quantifying the accuracy of such forecasts is a nontrivial issue. This is of interest not only for the quality control of operational forecasting schemes, but also for a more fundamental understanding of the predictability of dynamical systems; see for example [5]. Nowadays, probabilistic weather forecasts are often issued over a long period of time under (more or less) stationary conditions, allowing archives of forecast–observation pairs to be collected. Such archives permit observed frequencies to be calculated and compared with the forecast probabilities. The probability distribution of the observation y, conditioned on our probability forecast being p, should ideally coincide with p; a forecast having this property is called reliable or consistent. If a large archive of forecast–observation pairs is available, reliability can be tested statistically. Producing reliable probabilistic forecasts is, however, not difficult in itself. The overall climatological frequency, for example, is always a reliable forecast, yet this constant forecast is not very informative. A good forecast should also discriminate between different observations.
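As an illustration, here is a minimal sketch of such a statistical reliability check for a binary event; the function name, the equal-width binning, and the array layout are assumptions of this sketch, not part of the original report. Forecast probabilities are grouped into bins, and within each bin the mean forecast probability is compared with the observed frequency of the event.

```python
import numpy as np

def reliability_table(p_event, y_event, n_bins=10):
    """Compare binned forecast probabilities with observed frequencies.

    p_event : array of forecast probabilities for a binary event
    y_event : array of 0/1 outcomes
    Returns one row (mean forecast, observed frequency, count) per non-empty bin.
    """
    p_event = np.asarray(p_event, dtype=float)
    y_event = np.asarray(y_event, dtype=float)
    # assign each forecast to one of n_bins equally wide bins on [0, 1]
    idx = np.minimum((p_event * n_bins).astype(int), n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((p_event[mask].mean(),  # mean forecast probability
                         y_event[mask].mean(),  # observed event frequency
                         int(mask.sum())))      # number of cases in the bin
    return rows
```

For a reliable forecast, the mean forecast probability and the observed frequency should agree in each bin, up to sampling fluctuations.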

The question arises how these virtuous forecast attributes (reliability and information content) can be quantified and evaluated. One therefore seeks “scoring rules” which reward forecasters that are both reliable and informative. A scoring rule is a function S(p, y), where p is a probability forecast and y a possible observation. Here, y is assumed to be one of a finite number of labels, say {1, …, K}. A probability forecast then consists of a vector $p = (p_1, \ldots, p_K)$ with $p_k \geq 0$ for all k and $\sum_k p_k = 1$. The idea is that S(p, y) quantifies how well p succeeded in forecasting y. The overall quality of a forecasting system is ideally measured by the mathematical expectation E[S(p, y)], which can be estimated by the empirical mean

$$E[S(p, y)] \approx \frac{1}{N} \sum_{n=1}^{N} S\bigl(p^{(n)}, y^{(n)}\bigr) \qquad (1)$$

over a sufficiently large data set $\{(p^{(n)}, y^{(n)});\, n = 1, \ldots, N\}$ of forecast–observation pairs.
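In code, the estimator (1) is simply an average of per-pair scores over the archive. A minimal sketch, assuming the forecasts are stored as an N×K array and the observations as integer labels (zero-based, an implementation choice of this sketch):

```python
import numpy as np

def mean_score(score, p, y):
    """Estimate E[S(p, y)] via the empirical mean of Eq. (1).

    score : function S(p, y) scoring a single forecast-observation pair
    p     : (N, K) array; p[n] is the n-th probability forecast
    y     : (N,) array of integer labels in {0, ..., K-1}
    """
    return float(np.mean([score(p[n], y[n]) for n in range(len(y))]))
```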

Two important examples of scoring rules are the logarithmic score [8] (also called Ignorance)

$$S(p, y) = -\log(p_y), \qquad (2)$$

and the Brier score [1]

$$S(p, y) = \sum_k (p_k - \delta_{y,k})^2. \qquad (3)$$

The convention here is that a smaller score indicates a better forecast.
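A minimal sketch of the two scoring rules (2) and (3), again with zero-based labels; these functions can be passed directly to the mean_score estimator sketched above:

```python
import numpy as np

def ignorance(p, y):
    """Logarithmic score, Eq. (2): S(p, y) = -log(p_y)."""
    return -np.log(p[y])

def brier(p, y):
    """Brier score, Eq. (3): S(p, y) = sum_k (p_k - delta_{y,k})^2."""
    delta = np.zeros(len(p))
    delta[y] = 1.0  # Kronecker delta: 1 at the observed label, 0 elsewhere
    return np.sum((p - delta) ** 2)
```

With both scores, smaller values indicate better forecasts, consistent with the convention above.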

Although these two scoring rules may seem ad hoc, they share an interesting property which arguably any scoring rule should possess in order to yield consistent results. Suppose that our forecast for some observation y is q; then the score of that forecast will be S(q, y). If the correct distribution of y is p, then the expectation value of our score is

$$E(S(q, y)) = \sum_k S(q, k)\, p_k. \qquad (4)$$

The right hand side is referred to as the scoring function s(q, p). Arguably, since p is the correct distribution of y, the expected score of q should be worse (i.e. larger, in our convention) than the expected score of p. This is equivalent to requiring that the divergence function

$$d(q, p) := s(q, p) - s(p, p) \qquad (5)$$

be non-negative and zero only if p = q. A scoring rule with this property (for all p, q) is called strictly proper [6, 7]. The divergence function of the Brier score, for example, is $d(q, p) = \sum_k (q_k - p_k)^2$, demonstrating that this score is strictly proper. The Ignorance is strictly proper as well, since (5) is then just the Kullback–Leibler divergence, which is well known to be positive definite.
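For completeness, the Brier divergence stated above follows from a short calculation, expanding the square in (3) and using $\sum_y p_y = 1$:

$$
\begin{aligned}
s(q, p) &= \sum_y p_y \sum_k (q_k - \delta_{y,k})^2 = \sum_k q_k^2 - 2 \sum_k p_k q_k + 1, \\
d(q, p) &= s(q, p) - s(p, p) = \sum_k q_k^2 - 2 \sum_k p_k q_k + \sum_k p_k^2 = \sum_k (q_k - p_k)^2.
\end{aligned}
$$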

The expectation value E[S(p, y)] of a strictly proper scoring rule S allows for the following decomposition [4]:

$$E\,S(p, y) = s(\bar{\pi}, \bar{\pi}) + E\,d(p, \pi) - E\,d(\bar{\pi}, \pi), \qquad (6)$$

where π denotes the distribution of the observation y conditioned on the forecast p, and π̄ = Eπ the climatological (marginal) distribution of y.
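The three terms in (6) are commonly interpreted as the uncertainty inherent in the observations, the reliability of the forecast, and its resolution, respectively. As a minimal sketch, the decomposition can be estimated for the Brier score under the assumption that the forecast system only issues a finite set of distinct probability vectors, so that π can be estimated by conditional frequencies; the function below is an illustration of this write-up, not code from the original report:

```python
import numpy as np

def brier_decomposition(p, y, K):
    """Empirical uncertainty, reliability and resolution terms of Eq. (6)
    for the Brier score.

    p : (N, K) array of forecasts, assumed to take only finitely many values
    y : (N,) array of integer labels in {0, ..., K-1}
    """
    onehot = np.eye(K)[y]
    pi_bar = onehot.mean(axis=0)               # climatological distribution
    uniq, inv = np.unique(p, axis=0, return_inverse=True)
    rel = res = 0.0
    for j, q in enumerate(uniq):
        mask = inv == j
        w = mask.mean()                        # relative frequency of forecast q
        pi = onehot[mask].mean(axis=0)         # distribution of y given forecast q
        rel += w * np.sum((q - pi) ** 2)       # contribution to E d(p, pi)
        res += w * np.sum((pi_bar - pi) ** 2)  # contribution to E d(pi_bar, pi)
    unc = 1.0 - np.sum(pi_bar ** 2)            # s(pi_bar, pi_bar) for the Brier score
    return unc, rel, res  # in-sample mean Brier score equals unc + rel - res
```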

