An Introduction to Statistical Extreme Value Theory - IMAGe

image.ucar.edu

An Introduction to Statistical Extreme Value Theory
Uli Schneider
Geophysical Statistics Project, NCAR
January 26, 2004


Outline
  • Part I - Two basic approaches to extreme value theory: block maxima, threshold models.
  • Part II - Uncertainty, dependence, seasonality, trends.

Tutorial in Extreme Value Theory


Fundamentals
In classical statistics: model the AVERAGE behavior of a process.




Fundamentals
In extreme value theory: model the EXTREME behavior (the tail of a distribution).
Usually deal with very small data sets!


Different Approaches
  • Block Maxima (GEV)
  • r-th order statistic
  • Threshold approach (GPD)
  • Point processes


Block Maxima Approach
Model extreme daily rainfall in Boulder.
Take the “block maximum”, the maximum daily precipitation for each year: $M_n = \max\{X_1, \ldots, X_{365}\}$.
This gives 54 annual records (data points for $M_n$).
[Figure: Annual maximum of daily rainfall for Boulder (1948−2001); x-axis: years; y-axis: max. daily precip in 1/100 in]
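The block maximum step can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the talk: the function name `block_maxima` is made up here, and the daily series is synthetic random data standing in for the Boulder record.

```python
import random

def block_maxima(values, block_size):
    """Return the maximum of each complete block of `block_size` consecutive values."""
    n_blocks = len(values) // block_size  # an incomplete trailing block is dropped
    return [max(values[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]

# Synthetic stand-in for daily rainfall (NOT the Boulder data): 54 "years" of 365 days.
rng = random.Random(0)
daily = [rng.expovariate(1 / 10.0) for _ in range(54 * 365)]

annual_max = block_maxima(daily, 365)
print(len(annual_max))  # 54 block maxima, one per year
```

Using fixed blocks of 365 values ignores leap days; with real data one would group by calendar year instead.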


Block Maxima Approach
The distribution of $M_n = \max\{X_1, \ldots, X_n\}$, suitably normalized, converges (as $n \to \infty$) to
$$G(x) = \exp\left\{-\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]^{-1/\xi}\right\}.$$
G(x) is called the “Generalized Extreme Value” (GEV) distribution and has 3 parameters:
  • shape parameter ξ
  • location parameter µ
  • scale parameter σ.
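To make the formula concrete, here is a small Python sketch of the GEV distribution function. The name `gev_cdf` and the handling of the ξ → 0 limit (the Gumbel case) and of points outside the support are choices made for this illustration.

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function G(x) with location mu, scale sigma > 0 and shape xi."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        # Limit xi -> 0 (Gumbel case): G(x) = exp(-exp(-z))
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower end point (xi > 0)
        # or above the upper end point (xi < 0)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))
```

At x = µ the value is exp(−1) for every shape parameter, which is a quick sanity check on any implementation.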


Fitting a GEV – Estimating Parameters
Use the 54 annual records to fit the GEV distribution.
Estimate the 3 parameters ξ, µ and σ by “maximum likelihood” (MLE) using statistical software (R).
Get a GEV distribution with ξ = 0.09, µ = 133.85, and σ = 50.16.
[Figure: fitted GEV density; x-axis: 100 to 500; y-axis: Density]
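The talk used R for the fit. As a rough Python illustration of what maximum likelihood minimizes, the sketch below writes out the GEV negative log-likelihood. The function name is hypothetical, and the numerical optimizer that would search over (µ, σ, ξ) is deliberately left out.

```python
import math

def gev_neg_log_likelihood(params, data):
    """Negative log-likelihood of the GEV for params = (mu, sigma, xi)."""
    mu, sigma, xi = params
    if sigma <= 0:
        return math.inf  # scale must be positive
    total = 0.0
    for x in data:
        z = (x - mu) / sigma
        if abs(xi) < 1e-12:
            # Gumbel limit of the GEV density
            total += math.log(sigma) + z + math.exp(-z)
        else:
            t = 1.0 + xi * z
            if t <= 0:
                return math.inf  # observation outside the support
            total += math.log(sigma) + (1.0 + 1.0 / xi) * math.log(t) + t ** (-1.0 / xi)
    return total
```

Returning infinity for parameter values that put an observation outside the support lets a generic minimizer reject them without special handling.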


Fitting a GEV – Return Levels
Often of interest: the return level $z_m$, defined by
$$P(M > z_m) = \frac{1}{m}.$$
Expect, on average, one in every m observations to exceed the level $z_m$.
Or: for any single observation, the probability of exceeding the level $z_m$ is 1/m.
Can be computed easily once the parameters are “known”.
E.g. m = 100, then $z_{100} = 420$, i.e. expect the annual maximum of daily rainfall in Boulder to exceed 4.2 inches once every 100 years on average.
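Setting G(z_m) = 1 − 1/m and solving for z_m gives a closed form, sketched below in Python. The function name is made up for this example. The parameter values are the Boulder estimates ξ = 0.09, σ = 50.16, µ = 133.85, in the assignment given with the confidence intervals in the uncertainty part of the talk.

```python
import math

def gev_return_level(m, mu, sigma, xi):
    """Level z_m that a single block maximum exceeds with probability 1/m."""
    y = -math.log(1.0 - 1.0 / m)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)  # Gumbel limit (xi -> 0)
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)

z100 = gev_return_level(100, mu=133.85, sigma=50.16, xi=0.09)
print(round(z100))  # 420, in units of 1/100 inch
```

The result agrees with the 100-year level of 420 quoted on the slide.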


Fitting a GEV – Return Levels
[Figure: Return Levels for Boulder; x-axis: m (years), 0 to 100; y-axis: m-year return level, 0 to 400]


Fitting a GEV – Assumptions
We did not need to know the underlying distribution of each $X_i$, i.e. of the daily total rainfall.
Underlying assumption: the observations are “iid” (independent and identically distributed).


Threshold Models
Model exceedances over a high threshold u: $X - u \mid X > u$.
Example: daily total rainfall for Boulder exceeding 80 (1/100 in).
This makes more efficient use of the data than block maxima.
[Figure: Daily total rainfall for Boulder (1948−2001); x-axis: years; y-axis: max. daily precip in 1/100 in]




Threshold Models
The distribution of $Y := X - u \mid X > u$ converges (as $u \to \infty$) to
$$H(y) = 1 - \left(1 + \xi\,\frac{y}{\tilde{\sigma}}\right)^{-1/\xi}.$$
H(y) is called the “Generalized Pareto” distribution (GPD) and has 2 parameters:
  • shape parameter ξ
  • scale parameter σ̃.
The shape parameter ξ is the “same” parameter as in the GEV distribution.
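A short Python sketch of the GPD distribution function, parallel to the formula above. The name `gpd_cdf` and the treatment of the exponential limit ξ → 0 and of the upper end point for ξ < 0 are choices made for this illustration.

```python
import math

def gpd_cdf(y, sigma, xi):
    """Generalized Pareto distribution function H(y) for an excess y over the threshold."""
    if y <= 0:
        return 0.0
    if abs(xi) < 1e-12:
        return 1.0 - math.exp(-y / sigma)  # exponential limit (xi -> 0)
    t = 1.0 + xi * y / sigma
    if t <= 0.0:
        return 1.0  # beyond the upper end point, which exists only for xi < 0
    return 1.0 - t ** (-1.0 / xi)
```

For ξ < 0 the excesses are bounded above by −σ̃/ξ, which is why the function returns 1 past that point.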


Fitting a GPD – Estimating Parameters
Use the 184 exceedances over the threshold u = 80 to fit the GPD.
Estimate the 2 parameters ξ and σ̃ by “maximum likelihood” using statistical software (R).
Get a GPD with ξ = 0.22 and σ̃ = 51.46.
[Figure: fitted GPD density; x-axis: 0 to 250; y-axis: Density]
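As with the GEV, the quantity that maximum likelihood minimizes can be written down directly. The Python sketch below is an illustration only: the function name is hypothetical, the input is the list of excesses x − u for the observations above the threshold, and the optimizer itself is omitted.

```python
import math

def gpd_neg_log_likelihood(params, excesses):
    """Negative log-likelihood of the GPD for params = (sigma, xi)."""
    sigma, xi = params
    if sigma <= 0:
        return math.inf  # scale must be positive
    total = 0.0
    for y in excesses:
        if abs(xi) < 1e-12:
            total += math.log(sigma) + y / sigma  # exponential limit
        else:
            t = 1.0 + xi * y / sigma
            if t <= 0:
                return math.inf  # excess outside the support
            total += math.log(sigma) + (1.0 + 1.0 / xi) * math.log(t)
    return total
```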


Fitting a GPD – Choosing a Threshold
Diagnostics: mean excess function. Is it approximately linear in u above the candidate threshold?
[Figure: mean excess plot; x-axis: u, 0 to 400; y-axis: Mean Excess, −50 to 200]
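The reason for looking at linearity: if the excesses over some threshold follow a GPD with ξ < 1, the mean excess over any higher threshold is a linear function of that threshold. The empirical version is just the average excess among the points above u, sketched here in Python with a made-up sample (not the Boulder data) and a hypothetical function name.

```python
def mean_excess(data, u):
    """Average of (x - u) over the observations with x > u, or None if there are none."""
    excesses = [x - u for x in data if x > u]
    if not excesses:
        return None
    return sum(excesses) / len(excesses)

# Made-up sample for illustration only.
sample = [10, 40, 85, 90, 120, 200]
for u in (0, 50, 80, 100):
    print(u, mean_excess(sample, u))
```

Plotting these values against u, and looking for the point beyond which the curve is roughly straight, gives the diagnostic shown on the slide.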




Fitting a GPD – Choosing a Threshold
Diagnostics: shape and modified scale estimates. Are they approximately constant above the candidate threshold?
[Figure, two panels: Modified Scale against Threshold and Shape against Threshold, for thresholds from 50 to 300]


Fitting a GPD – Choosing a Threshold
Alternatively: choose the threshold u so that a certain percentage of the data lies above it (robust and automatic, but is the approximation valid?).
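A percentage-based threshold amounts to taking an empirical quantile. The Python sketch below is one simple way to do it; the function name and the rounding rule are assumptions of this example, and ties in the data can leave fewer points above the threshold than requested.

```python
def quantile_threshold(data, fraction_above):
    """Pick u as an order statistic so that about `fraction_above` of the data exceed it."""
    ordered = sorted(data)
    n_above = round(len(ordered) * fraction_above)
    n_above = min(max(n_above, 0), len(ordered) - 1)  # keep the index in range
    return ordered[len(ordered) - n_above - 1]

values = list(range(1, 101))  # toy data: 1, 2, ..., 100
u = quantile_threshold(values, 0.05)
print(u, sum(x > u for x in values))  # 95 5
```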


Fitting a GPD – Return Levels
Compute the 100-year return level for daily rainfall totals using the threshold approach: $z_{36500} = 429$, i.e. expect the daily total to exceed 4.29 inches once every 100 years (36500 days) on average.
[Figure: Return Levels for Boulder; x-axis: m (years), 0 to 100; y-axis: m-year return level, 0 to 400]
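For a threshold model, a return level in terms of m observations also needs the probability that a single observation exceeds u. A standard closed form is sketched below in Python. The slides do not state which formula or exceedance rate produced the value 429, so the function name and all numbers used here are placeholders for illustration, not an attempt to reproduce that result.

```python
import math

def gpd_return_level(m, u, sigma, xi, exceed_rate):
    """Level exceeded on average once every m observations.

    exceed_rate is the (estimated) probability that one observation exceeds u.
    """
    if abs(xi) < 1e-12:
        return u + sigma * math.log(m * exceed_rate)  # exponential limit
    return u + (sigma / xi) * ((m * exceed_rate) ** xi - 1.0)

# Placeholder values, NOT the Boulder estimates.
z = gpd_return_level(m=36500, u=80.0, sigma=50.0, xi=0.1, exceed_rate=0.01)
```

The formula comes from solving exceed_rate · (1 − H(z − u)) = 1/m for z, so it is only meaningful when m · exceed_rate is at least 1, i.e. when z lies above the threshold.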


Uncertainty (GEV)
Essentially, the maximum likelihood approach yields standard errors for the estimates and therefore confidence bounds on the parameters.
From the GEV (block maxima) fit for the yearly maximum of daily precipitation for Boulder:
  • ξ = 0.09, 95% conf. interval is (−0.10, 0.28).
  • σ = 50.16, 95% conf. interval is (38.77, 61.54).
  • µ = 133.85, 95% conf. interval is (118.58, 149.12).
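The step from a standard error to a 95% interval is usually the normal approximation, estimate ± 1.96 × standard error. The Python sketch below assumes that is how the intervals were formed, which the slides do not say. They also do not list the standard errors, so the one used here is back-calculated from the reported interval for σ.

```python
def wald_interval(estimate, std_error, z=1.96):
    """Normal-approximation confidence interval: estimate +/- z * std_error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width

# Assumption: standard error recovered from the reported interval (38.77, 61.54).
se_sigma = (61.54 - 38.77) / (2 * 1.96)
low, high = wald_interval(50.16, se_sigma)
print(round(se_sigma, 2), round(low, 2), round(high, 2))
```

The reported intervals for all three parameters are symmetric about their estimates up to rounding, which is consistent with this construction.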


Uncertainty (GEV)
The standard errors of the estimates can be propagated to the return levels:
[Figure: return level plot; x-axis: Return Period, 0.1 to 1000; y-axis: Return Level, 100 to 600]
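One common way to propagate parameter uncertainty to a return level is the delta method: the variance of z_m is approximated by gᵀ V g, where g is the gradient of z_m with respect to (µ, σ, ξ) and V is the covariance matrix of the estimates. The slides do not say which method was used and do not give V, so the Python sketch below uses hypothetical function names and a purely hypothetical diagonal matrix built from rough standard errors.

```python
import math

def gev_return_level(m, mu, sigma, xi):
    """GEV return level z_m (assumes xi != 0)."""
    y = -math.log(1.0 - 1.0 / m)
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)

def delta_method_se(m, params, cov, eps=1e-5):
    """Approximate standard error of z_m from the 3x3 parameter covariance `cov`."""
    grad = []
    for i in range(3):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        # Central finite difference for the partial derivative of z_m
        grad.append((gev_return_level(m, *up) - gev_return_level(m, *down)) / (2 * eps))
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(3) for j in range(3))
    return math.sqrt(var)

# Hypothetical covariance: squared rough standard errors on the diagonal and
# zero correlations (an assumption made only to keep the example short).
cov = [[7.79 ** 2, 0.0, 0.0],
       [0.0, 5.81 ** 2, 0.0],
       [0.0, 0.0, 0.097 ** 2]]
se_z100 = delta_method_se(100, (133.85, 50.16, 0.09), cov)
```

Real fits have correlated parameter estimates, so the off-diagonal entries matter in practice. Intervals for long return periods are also often quite asymmetric, which a symmetric delta-method interval cannot capture.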


Uncertainty (GPD)
More data means less uncertainty.
From the GPD (threshold model) fit for daily precipitation in Boulder:
  • ξ = 0.22, 95% conf. interval is (−0.12, 0.16).
  • σ̃ = 51.46, 95% conf. interval is (40.70, 62.21).


Uncertainty (GPD)
Return levels from the GPD (threshold model) fit for daily precipitation in Boulder:
[Figure: return level plot; x-axis: Return period (years), 0.1 to 1000; y-axis: Return level, 100 to 400]


Dependence – Declustering
For the GEV and GPD approximations to be valid, we assume independence of the data.
If the data are dependent, we can use declustering to “make” them independent.
E.g. pick only one point (the maximum) in each cluster of values that exceed a threshold.
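How clusters are defined is a modelling choice that the slide leaves open. The Python sketch below uses runs declustering, one common rule: exceedances belong to the same cluster until at least `run_length` consecutive values fall at or below the threshold. The function name and the toy series are made up for this example.

```python
def decluster(series, u, run_length=1):
    """Return the maximum of each cluster of exceedances of u (runs declustering)."""
    maxima = []
    current_max = None  # maximum of the cluster currently being built, if any
    gap = 0             # consecutive non-exceedances seen since the last exceedance
    for x in series:
        if x > u:
            current_max = x if current_max is None else max(current_max, x)
            gap = 0
        elif current_max is not None:
            gap += 1
            if gap >= run_length:
                maxima.append(current_max)  # the cluster has ended
                current_max = None
                gap = 0
    if current_max is not None:
        maxima.append(current_max)  # the series ended inside a cluster
    return maxima

toy = [0, 90, 120, 0, 0, 85, 0, 95, 0, 0]
print(decluster(toy, u=80, run_length=2))  # [120, 95]
print(decluster(toy, u=80, run_length=1))  # [120, 85, 95]
```

A larger `run_length` merges nearby exceedances into fewer clusters, trading sample size for a better approximation to independence.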


Dependence – Declustering
Assume we want to make inference about hourly precipitation in Boulder.
To decluster (instead of using 24 values for each day), we select only the daily maximum of the hourly (1-h) records to fit the GPD model.
[Figure: daily 1h max. precip.; x-axis: time, 1950 to 2000; y-axis: 0 to 150]


Dependence – (fitting the GPD)
Choosing a threshold: mean excess function as a diagnostic.
[Figure: mean excess plot; x-axis: u, 0 to 400; y-axis: Mean Excess, −50 to 200]




Dependence – (fitting the GPD)
[Figure, two panels: Modified Scale against Threshold and Shape against Threshold, for thresholds from 50 to 200]


Dependence – (fitting the GPD)
u = 75 seems to be a good threshold using the diagnostics.
But u = 75 leaves only 28 data points above the threshold.
Use u = 35 instead (with 108 data points above the threshold) to get the following estimates:
  • ξ = −0.05, 95% conf. interval is (−0.27, 0.15).
  • σ̃ = 27.98, 95% conf. interval is (19.94, 36.02).
The 100-year return level is $z_m = 185$, i.e. expect the hourly rainfall to exceed 1.85 inches once every 100 years on average (the 10-year level is 1.36 inches).


Dependence – (fitting the GPD)
Use u = 35 (with 108 data points above the threshold) to fit a GPD model.
[Figure: Return Levels for Boulder; x-axis: m (years), 0 to 100; y-axis: m-year (hourly) return level, 0 to 150]


Seasonality
[Figure: daily 1h max. precip. (0 to 150) against fraction of the year (0.3 to 0.8)]


Seasonality
To incorporate seasonality, link the scale parameter to covariates that describe the seasonal cycle.
Use the covariates $X_1(t) = \sin(2\pi f(t))$ and $X_2(t) = \cos(2\pi f(t))$, where $f(t)$ = fraction of the year for each day t.
[Figure: covariates (−1.0 to 1.0) against fraction of the year (0.3 to 0.8)]


Seasonality
Use an exponential link function to link the covariates to the scale parameter:
$$\tilde{\sigma}(t) = \exp(\beta_0 + \beta_1 X_1(t) + \beta_2 X_2(t)).$$
Fit a GPD with density
$$\mathrm{GPD}\left(\xi,\ \tilde{\sigma}(t) = \exp(\beta_0 + \beta_1 X_1(t) + \beta_2 X_2(t))\right).$$
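The seasonal scale parameter is easy to evaluate once the coefficients are known. The Python sketch below is illustrative: the function name is made up, the coefficient values are placeholders rather than fitted values, and f(t) is taken as day of year divided by 365, which is one possible reading of “fraction of the year”.

```python
import math

def seasonal_scale(day_of_year, beta0, beta1, beta2, days_in_year=365.0):
    """Scale parameter exp(beta0 + beta1*sin(2*pi*f) + beta2*cos(2*pi*f)) for one day."""
    f = day_of_year / days_in_year  # fraction of the year (assumed definition)
    x1 = math.sin(2 * math.pi * f)
    x2 = math.cos(2 * math.pi * f)
    return math.exp(beta0 + beta1 * x1 + beta2 * x2)

# Placeholder coefficients, not estimates from the Boulder data.
for day in (120, 200, 280):
    print(day, seasonal_scale(day, beta0=3.0, beta1=0.3, beta2=-0.2))
```

The exponential link keeps the scale positive for any coefficient values, so the β's can be estimated without constraints.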
