A cookbook, not a panacea, for claims reserving

A reserving cookbook (GLM as an aperitif, the main course Bayesian-style, GEE as a digestif)

Michal Pešta
Charles University in Prague, Faculty of Mathematics and Physics
Actuarial Seminar, Prague, 12 October 2012

Overview
- Chain ladder models
- GLM in stochastic reserving
- Nonparametric smoothing models in reserving
- Bayesian reserving models
- GEE in stochastic reserving

Support: Czech Science Foundation project "DYME – Dynamic Models in Economics" No. P402/12/G097

Terminology
- $X_{i,j}$ … claim amounts in development year $j$ with accident year $i$
- $X_{i,j}$ stands for the incremental claims of accident year $i$ made in accounting year $i + j$
- $n$ … current year, corresponding to the most recent accident year and development period
- our data history consists of a right-angled isosceles triangle of $X_{i,j}$, where $i = 1, \dots, n$ and $j = 1, \dots, n + 1 - i$

Run-off (incremental) triangle

Accident    Development year j
year i      1            2           ⋯    n−1          n
1           X_{1,1}      X_{1,2}     ⋯    X_{1,n−1}    X_{1,n}
2           X_{2,1}      X_{2,2}     ⋯    X_{2,n−1}
⋮           ⋮                       ⋰
i           X_{i,1}      ⋯           X_{i,n+1−i}
⋮           ⋮
n−1         X_{n−1,1}    X_{n−1,2}
n           X_{n,1}

Notation
- $C_{i,j}$ … cumulative payments in origin year $i$ after $j$ development periods:
  $C_{i,j} = \sum_{k=1}^{j} X_{i,k}$
- $C_{i,j}$ … a random variable of which we have an observation if $i + j \le n + 1$
- the aim is to estimate the ultimate claims amount $C_{i,n}$ and the outstanding claims reserve
  $R_i = C_{i,n} - C_{i,n+1-i}, \quad i = 2, \dots, n,$
  by completing the triangle into a square

Run-off (cumulative) triangle

Accident    Development year j
year i      1            2           ⋯    n−1          n
1           C_{1,1}      C_{1,2}     ⋯    C_{1,n−1}    C_{1,n}
2           C_{2,1}      C_{2,2}     ⋯    C_{2,n−1}
⋮           ⋮                       ⋰
i           C_{i,1}      ⋯           C_{i,n+1−i}
⋮           ⋮
n−1         C_{n−1,1}    C_{n−1,2}
n           C_{n,1}

Chain ladder
[1] $E[C_{i,j+1} \mid C_{i,1}, \dots, C_{i,j}] = f_j C_{i,j}$
[2] $\mathrm{Var}[C_{i,j+1} \mid C_{i,1}, \dots, C_{i,j}] = \sigma_j^2 C_{i,j}^{\alpha}$, $\alpha \in \mathbb{R}$
[3] accident years $[C_{i,1}, \dots, C_{i,n}]$ are independent vectors

Development factors (link ratios) $f_j$

$$\hat{f}_j^{(n)} = \frac{\sum_{i=1}^{n-j} C_{i,j}^{1-\alpha}\, C_{i,j+1}}{\sum_{i=1}^{n-j} C_{i,j}^{2-\alpha}}, \quad 1 \le j \le n - 1$$

$\hat{f}_n^{(n)} \equiv 1$ (assuming no tail)

Mack or linear regression
- $\alpha = 0$ … linear regression (no intercept, homoscedastic) for $[C_{\bullet,j}, C_{\bullet,j+1}]$ satisfies CL
- $\alpha = 1$ … Mack (1993), but also the Aitken (no intercept, heteroscedastic) regression model with weights $C_{i,j}^{-1}$
- smoothing (and extrapolation) of the development factors is possible

Ultimates and reserves
- ultimate claims amounts $C_{i,n}$ are estimated by
  $\hat{C}_{i,n} = C_{i,n+1-i} \times \hat{f}^{(n)}_{n+1-i} \times \cdots \times \hat{f}^{(n)}_{n-1}$
- reserves $R_i$ are thus estimated by
  $\hat{R}_i = \hat{C}_{i,n} - C_{i,n+1-i} = C_{i,n+1-i}\left(\hat{f}^{(n)}_{n+1-i} \times \cdots \times \hat{f}^{(n)}_{n-1} - 1\right)$
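The chain-ladder recipe above — volume-weighted factors ($\alpha = 1$), then multiplying the latest diagonal out to ultimate — can be sketched in a few lines of Python. The triangle below is made-up illustrative data, not from the talk.

```python
def dev_factors(tri):
    """Volume-weighted development factors f_j = sum C_{i,j+1} / sum C_{i,j}
    (the alpha = 1 case). tri[i] holds the n - i observed cumulative payments
    of accident year i."""
    n = len(tri)
    fs = []
    for j in range(n - 1):
        num = sum(tri[i][j + 1] for i in range(n - j - 1))
        den = sum(tri[i][j] for i in range(n - j - 1))
        fs.append(num / den)
    return fs

def cl_reserves(tri, fs):
    """Project each row to ultimate and return reserves R_i = C_hat_{i,n} - latest."""
    n = len(tri)
    reserves = []
    for i in range(n):
        latest = tri[i][n - i - 1]          # C_{i,n+1-i}
        ult = latest
        for j in range(n - i - 1, n - 1):   # apply f_{n+1-i}, ..., f_{n-1}
            ult *= fs[j]
        reserves.append(ult - latest)
    return reserves

# Toy cumulative triangle: f = (1.5, 7/6), reserves ≈ [0.0, 27.5, 90.0]
tri = [[100.0, 150.0, 175.0], [110.0, 165.0], [120.0]]
```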

Generalized Linear Models
- a flexible generalization of ordinary linear regression
- formulated by John Nelder and Robert Wedderburn as a way of unifying various other statistical models, including linear regression, logistic regression and Poisson regression

GLM: 3 elements
1. random component: the outcome of the dependent variable $Y$ comes from the exponential family, i.e.,
   $f_Y(y; \theta, \phi) = \exp\{[y\theta - b(\theta)]/a(\phi) + c(y, \phi)\}$,
   where $\theta$ is the canonical parameter, $\phi$ is the dispersion parameter, and $EY_i = \mu_i$
2. systematic component: linear predictor (mean structure) $\eta = X\beta$
3. link: a function $g$ with $\eta_i = g(\mu_i)$

Exponential family
- includes many of the most common distributions: normal, exponential, gamma, chi-squared, beta, Dirichlet, Bernoulli, categorical, Poisson, Wishart, inverse Wishart and many others
- a number of common distributions are exponential families only when certain parameters are considered fixed and known, e.g., binomial (with fixed number of trials), multinomial (with fixed number of trials), and negative binomial (with fixed number of failures)
- common distributions that are not exponential families: Student's t, most mixture distributions, and even the family of uniform distributions with unknown bounds

Canonical link – sufficient statistic
- exponential family:
  $EY = \mu = b'(\theta)$, $\mathrm{Var}\,Y = b''(\theta)\,a(\phi) \equiv V(\mu)\,a(\phi)$
- distribution ↔ link function (sufficient statistic ↔ canonical link)
  ◮ normal … identity: $\mu_i = X_{i,\bullet}\beta$
  ◮ gamma (exponential) … inverse (reciprocal): $\mu_i^{-1} = X_{i,\bullet}\beta$
  ◮ Poisson … logarithm: $\log(\mu_i) = X_{i,\bullet}\beta$
  ◮ binomial (multinomial) … logit: $\log\frac{\mu_i}{1-\mu_i} = X_{i,\bullet}\beta$
  ◮ inverse Gaussian … reciprocal squared: $\mu_i^{-2} = X_{i,\bullet}\beta$

Link functions
- logit: $\eta = \log\{\mu/(1 - \mu)\}$
- probit: $\eta = \Phi^{-1}(\mu)$
- complementary log-log: $\eta = \log\{-\log(1 - \mu)\}$
- power family of links:
  $\eta = \begin{cases} (\mu^\lambda - 1)/\lambda, & \lambda \ne 0, \\ \log \mu, & \lambda = 0; \end{cases}$
  or
  $\eta = \begin{cases} \mu^\lambda, & \lambda \ne 0, \\ \log \mu, & \lambda = 0. \end{cases}$
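As a quick illustration, the four links above can be written directly in Python — a sketch using only the standard library, where `NormalDist().inv_cdf` plays the role of $\Phi^{-1}$:

```python
import math
from statistics import NormalDist  # Phi^{-1} for the probit link

def logit(mu):
    """logit: eta = log(mu / (1 - mu))"""
    return math.log(mu / (1 - mu))

def probit(mu):
    """probit: eta = Phi^{-1}(mu), the standard normal quantile function"""
    return NormalDist().inv_cdf(mu)

def cloglog(mu):
    """complementary log-log: eta = log(-log(1 - mu))"""
    return math.log(-math.log(1 - mu))

def power_link(mu, lam):
    """Box-Cox power family: (mu^lam - 1)/lam, continuously log(mu) at lam = 0."""
    return math.log(mu) if lam == 0 else (mu ** lam - 1) / lam
```

Note that logit and probit both map $\mu = 1/2$ to $\eta = 0$, and the first power family tends to $\log\mu$ as $\lambda \to 0$, which is why the two branches fit together.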

Estimation
- estimation of the parameters via maximum likelihood, quasi-likelihood or Bayesian techniques

$N(\mu, \sigma^2)$
- support $(-\infty, +\infty)$
- dispersion parameter $\phi = \sigma^2$
- cumulant function $b(\theta) = \theta^2/2$
- $c(y, \phi) = -\frac{1}{2}\left[\frac{y^2}{\phi} + \log(2\pi\phi)\right]$
- $\mu(\theta) = E_\theta Y = \theta$
- canonical link $\theta(\mu)$: identity
- variance function $V(\mu) = 1$

$Po(\mu)$
- support $\{0, 1, 2, \dots\}$
- dispersion parameter $\phi = 1$
- cumulant function $b(\theta) = \exp\{\theta\}$
- $c(y, \phi) = -\log y!$
- $\mu(\theta) = E_\theta Y = \exp\{\theta\}$
- canonical link $\theta(\mu)$: log
- variance function $V(\mu) = \mu$

$\Gamma(\mu, \nu)$
- support $(0, +\infty)$
- $\mathrm{Var}\,Y = \mu^2/\nu$
- dispersion parameter $\phi = \nu^{-1}$
- cumulant function $b(\theta) = -\log\{-\theta\}$
- $c(y, \phi) = \nu \log(\nu y) - \log y - \log \Gamma(\nu)$
- $\mu(\theta) = E_\theta Y = -1/\theta$
- canonical link $\theta(\mu)$: reciprocal
- variance function $V(\mu) = \mu^2$

Mack's model as GLM
- reformulate Mack's model as a model of ratios:
  $E\left[\frac{C_{i,j+1}}{C_{i,j}}\right] = f_j$ and $\mathrm{Var}\left[\frac{C_{i,j+1}}{C_{i,j}} \,\Big|\, C_{i,1}, \dots, C_{i,j}\right] = \frac{\sigma_j^2}{C_{i,j}}$
- conditional weighted normal GLM:
  $\frac{C_{i,j+1}}{C_{i,j}} \sim N\left(f_j, \frac{\sigma_j^2}{C_{i,j}}\right)$
- Mack's model was not derived/designed as a GLM, but a conditional weighted normal GLM gives the same estimates

GLM for triangles
- independent incremental claims $X_{ij}$, $i + j \le n + 1$
  ◮ overdispersed Poisson distributed $X_{ij}$:
    $E[X_{ij}] = \mu_{ij}$ and $\mathrm{Var}[X_{ij}] = \phi\mu_{ij}$
  ◮ Gamma distributed $X_{ij}$:
    $E[X_{ij}] = \mu_{ij}$ and $\mathrm{Var}[X_{ij}] = \phi\mu_{ij}^2$
- logarithmic link function:
  $\log(\mu_{ij}) = \gamma + \alpha_i + \beta_j$, with $\alpha_1 = \beta_1 = 0$

GLM for triangles II
- the overdispersed Poisson model with log link provides asymptotically the same parameter estimates, predicted values and prediction errors as the chain ladder
- possible extensions:
  ◮ Hoerl curve: $\log(\mu_{ij}) = \gamma + \alpha_i + \beta_j \log(j) + \delta_j j$
  ◮ smoother (semiparametric): $\log(\mu_{ij}) = \gamma + \alpha_i + s_1(\log(j)) + s_2(j)$

Estimation in triangles
- ML (maximum likelihood) … likelihood
  $L(\theta, \phi; X) = \prod_{i=1}^{n} \prod_{j=1}^{n+1-i} f(X_{ij}; \theta_{ij}, \phi)$
- maximize the log-likelihood w.r.t. the parameters of $\mu$, which is an argument of $\theta$, i.e., $\theta(\mu(\alpha, \beta))$
- ! there is no overdispersed Poisson distribution (only when thinking of the negative binomial)
- QML (quasi-maximum likelihood) … quasi-likelihood for ODP:
  $\log Q(\mu; X) = \sum_{i=1}^{n} \sum_{j=1}^{n+1-i} \phi^{-1}(X_{ij} \log \mu_{ij} - \mu_{ij}) + \mathrm{const}$
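For the cross-classified log-link model above, the ODP quasi-likelihood is maximized exactly when the fitted values reproduce the observed row and column totals, so the estimate can be found by simple iterative marginal-totals scaling. A minimal sketch on an illustrative toy triangle (not from the talk); by the well-known ODP/CL equivalence, the resulting reserves coincide with the chain-ladder reserves.

```python
def odp_fit(tri, iters=500):
    """Fit mu_ij = x_i * y_j to an incremental run-off triangle
    (tri[i] holds the n - i observed incrementals of accident year i)
    by iterative proportional fitting; the fixed point satisfies the
    ODP score equations: observed row and column totals are matched."""
    n = len(tri)
    x = [1.0] * n
    y = [1.0] * n
    for _ in range(iters):
        for i in range(n):                          # match row totals
            x[i] = sum(tri[i]) / sum(y[j] for j in range(n - i))
        for j in range(n):                          # match column totals
            col = sum(tri[i][j] for i in range(n - j))
            y[j] = col / sum(x[i] for i in range(n - j))
    return x, y

def odp_reserves(tri):
    """Reserve of year i = sum of fitted mu_ij over the unobserved cells."""
    n = len(tri)
    x, y = odp_fit(tri)
    return [sum(x[i] * y[j] for j in range(n - i, n)) for i in range(n)]

# Incremental version of a toy cumulative triangle (100,150,175 / 110,165 / 120):
tri = [[100.0, 50.0, 25.0], [110.0, 55.0], [120.0]]
# odp_reserves(tri) ≈ [0.0, 27.5, 90.0] -- identical to the CL reserves
```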

Generalized additive models
- GAM … extension of GLM, with the linear predictor replaced by a non-parametric smoother:
  $\eta_{ij} = \sum_{k=1}^{p} s_k(X_{ij})$
- $s(x)$ represents a non-parametric smoother on $x$, which may be chosen from several different types of smoother, such as locally weighted regression smoothers (loess), cubic smoothing splines and kernel smoothers

GAM
- Ex: smoothing (trade-off between smoothness and fit) for a univariate ($p = 1$) cubic spline with normal distribution:
  $\min_s \left\{ \sum_{i,j} [X_{ij} - s(X_{ij})]^2 + \lambda \int [s''(t)]^2\, dt \right\}$

Bayesian approach
- problem of instability in the proportion of ultimate claims paid in the early development years, causing a method such as the CL to produce unsatisfactory results when applied mechanically
- aim: to stabilize the results using an external initial estimate of ultimate claims

Bornhuetter–Ferguson
- reminder from CL: outstanding claims
  $R_i = C_{i,n+1-i}(f_{n+1-i} \times \dots \times f_{n-1} - 1)$

Assumptions:
(1) $E[C_{i,j+k} \mid C_{i,1}, \dots, C_{i,j}] = C_{i,j} + (\beta_{j+k} - \beta_j)\mu_i$ and $E[C_{i,1}] = \beta_1 \mu_i$,
    where $\beta_j > 0$, $\mu_i > 0$, $\beta_n = 1$, for $1 \le i \le n$, $1 \le j \le n$, $1 \le k \le n - j$
(2) accident years $[C_{i,1}, \dots, C_{i,n}]$, $1 \le i \le n$, are independent

- estimate $\hat{C}_{i,n}^{BF}$ for ultimates
- a Bayesian approach to CL

Bornhuetter–Ferguson Method
- a very robust method, since it does not consider outliers in the observations

Implied Assumptions:
(1) $E[C_{i,j}] = \beta_j \mu_i$, where $\beta_j > 0$, $\mu_i > 0$, $\beta_n = 1$, for $1 \le i \le n$, $1 \le j \le n$
(2) accident years $[C_{i,1}, \dots, C_{i,n}]$, $1 \le i \le n$, are independent

- the "implied" assumptions are weaker than the original BF assumptions (and, hence, not equivalent)

BF Estimator
- BF estimator of the ultimate from the latest observation:
  $\hat{C}_{i,n}^{BF} = C_{i,n-i+1} + (1 - \hat{\beta}_{n-i+1})\hat{\mu}_i$
- comparing the CL and BF models, $\prod_{k=j}^{n-1} f_k^{-1}$ plays the role of $\beta_j$ and, therefore,
  $\hat{\beta}_j = \prod_{k=j}^{n-1} \frac{1}{\hat{f}_k}$
- a prior estimate for $\mu_i$ is needed
- $\hat{\mu}_i$ is often a plan value from a strategic business plan or the value used for premium calculations
- $\hat{\mu}_i$ should be estimated before one has any observations (i.e., it should be a pure prior estimate based on expert opinion)!

Comparison of BF and CL Estimators
- if the prior estimate of ultimates $\mu_i$ equals the CL estimate of ultimates, then the BF and CL estimators coincide
- BF: $\hat{C}_{i,n}^{BF} = C_{i,n-i+1} + (1 - \hat{\beta}_{n-i+1})\hat{\mu}_i$
- CL: $\hat{C}_{i,n}^{CL} = C_{i,n-i+1} + (1 - \hat{\beta}_{n-i+1})\hat{C}_{i,n}^{CL}$
- BF differs from the CL in that the CL estimate of ultimate claims is replaced by an alternative estimate based on external information and expert judgement
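The BF recipe — $\hat{\beta}_j$ backed out from the CL factors, then the unpaid share of the prior ultimate added to the latest diagonal — can be sketched as follows (illustrative toy data; `mu_prior` stands for the external prior estimates $\hat{\mu}_i$):

```python
def bf_ultimates(tri, fs, mu_prior):
    """BF ultimates: latest diagonal plus the unpaid share (1 - beta) of the
    prior ultimate, with beta_j = prod_{k=j}^{n-1} 1/f_k and beta_n = 1."""
    n = len(tri)
    beta = [1.0] * n                    # beta_n = 1 (no tail)
    for j in range(n - 2, -1, -1):
        beta[j] = beta[j + 1] / fs[j]
    ults = []
    for i in range(n):
        latest = tri[i][n - i - 1]      # C_{i,n-i+1}
        ults.append(latest + (1.0 - beta[n - i - 1]) * mu_prior[i])
    return ults

# Toy cumulative triangle with CL factors f = (1.5, 7/6).  If the prior
# ultimate equals the CL ultimate (as for the last row, 210), BF and CL
# coincide, as stated on the comparison slide.
tri = [[100.0, 150.0, 175.0], [110.0, 165.0], [120.0]]
fs = [1.5, 7.0 / 6.0]
priors = [175.0, 200.0, 210.0]
```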

Bayesian Models
- a prior distribution for the row (ultimate) parameter
- Ex: ODP model, where an obvious candidate is
  $\mu_i \sim$ independent $\Gamma(\gamma_i, \delta_i)$
  such that $E\mu_i = \frac{\gamma_i}{\delta_i}$

Predictive distribution
- the posterior predictive distribution of incremental claims $X_{i,j}$ is an over-dispersed negative binomial distribution with mean
  $\left[ Z_{i,j}\, C_{i,j-1} + (1 - Z_{i,j})\, \frac{\gamma_i}{\delta_i}\, \frac{1}{f_{j-1} \times \dots \times f_{n-1}} \right] (f_{j-1} - 1),$
  where
  $Z_{i,j} = \frac{\frac{1}{f_{j-1} \times \dots \times f_{n-1}}}{\delta_i \phi + \frac{1}{f_{j-1} \times \dots \times f_{n-1}}}$

Credibility formula
- a natural trade-off between two competing estimates for $X_{i,j}$:
  $C_{i,j-1}$ and $\frac{\gamma_i}{\delta_i}\, \frac{1}{f_{j-1} \times \dots \times f_{n-1}} = E[\mu_i]\, \frac{1}{f_{j-1} \times \dots \times f_{n-1}}$
- the Bayesian model has the CL as one extreme (no prior information about the row parameters) and the BF as the other (perfect prior information about the row parameters)

Bayesian trade-off
- BF assumes that there is perfect prior information about the row parameters (it does not use the data at all for one part of the estimation) … a heroic assumption
- prefer to use something between BF and CL
- the credibility factor $Z_{i,j}$ governs the trade-off between the prior mean and the data
- the further through the development we are, the larger $\frac{1}{f_{j-1} \times \dots \times f_{n-1}}$ becomes, and the more weight is given to the CL estimate
- the choice of $\delta_i$ is governed by the prior precision of the initial estimate for ultimates … with regard given to the over-dispersion parameter (e.g., an initial estimate from the ODP)
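The trade-off can be made concrete in a few lines. A sketch with hypothetical inputs: `F` stands for the product $f_{j-1} \times \dots \times f_{n-1}$ and `prior_mean` for $E[\mu_i] = \gamma_i/\delta_i$.

```python
def credibility_mean(c_prev, prior_mean, delta, phi, F, f_prev):
    """Posterior predictive mean of X_{i,j}:
    [Z * C_{i,j-1} + (1 - Z) * prior_mean / F] * (f_{j-1} - 1),
    with credibility factor Z = (1/F) / (delta * phi + 1/F)."""
    z = (1.0 / F) / (delta * phi + 1.0 / F)
    return (z * c_prev + (1.0 - z) * prior_mean / F) * (f_prev - 1.0)

# delta * phi -> 0 gives Z -> 1: the CL one-step increment C_{i,j-1} (f_{j-1} - 1);
# delta * phi -> infinity gives Z -> 0: a BF-type increment based on the prior only.
```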

Cape Cod
- another Bayesian-like approach

Assumptions:
(1) $E[C_{i,j}] = \kappa \pi_i \beta_j$, where $\kappa > 0$, $\pi_i > 0$, $\beta_j > 0$, $\beta_n = 1$, for $1 \le i \le n$, $1 \le j \le n$
(2) accident years $[C_{i,1}, \dots, C_{i,n}]$, $1 \le i \le n$, are independent

- estimate $\hat{C}_{i,n}^{CC}$ for ultimates
- equivalent to the implied assumptions of BF with $\mu_i = \kappa \pi_i$

Cape Cod Estimator
- main deficiency of the DFM (CL) … ultimate claims completely depend on the latest diagonal claims → not robust (sensitive to outliers)
- moreover, in long-tailed LoBs (e.g., liability) the first observation is not representative
- one possibility is to smooth outliers from the latest diagonal → combine BF and CL into the Benktander–Hovinen method
- another way is to make the diagonal claims more robust → the Cape Cod method
- CC estimator of the ultimate from the latest diagonal:
  $\hat{C}_{i,n}^{CC} = C_{i,n-i+1} - \hat{C}_{i,n-i+1}^{CC} + \prod_{j=n-i+1}^{n-1} f_j\, \hat{C}_{i,n-i+1}^{CC}$

Generalised Cape Cod
- $\pi_i$ can be interpreted as the premium received for accident year $i$
- $\kappa$ reflects the average loss ratio
- loss ratio for an accident year using the CL estimate of the ultimate claim:
  $\hat{\kappa}_i = \frac{\hat{C}_{i,n}^{CL}}{\pi_i} = \frac{C_{i,n-i+1} \prod_{j=n-i+1}^{n-1} f_j}{\pi_i} = \frac{C_{i,n-i+1}}{\beta_{n-i+1} \pi_i}$
- the initial expected ratio may be set to the same value, derived from an overall weighted ("robustified") average ratio (simple CC method):
  $\hat{\kappa}^{CC} = \sum_{i=1}^{n} \frac{\beta_{n-i+1} \pi_i}{\sum_{k=1}^{n} \beta_{n-k+1} \pi_k}\, \hat{\kappa}_i = \frac{\sum_{i=1}^{n} C_{i,n-i+1}}{\sum_{i=1}^{n} \beta_{n-i+1} \pi_i}$

Generalised Cape Cod II
- robustified value for the latest diagonal:
  $\hat{C}_{i,n-i+1} = \hat{\kappa}^{CC} \pi_i \beta_{n-i+1}$
- in the CC method, the CL iteration is applied to the robustified diagonal:
  $\hat{C}_{i,n}^{CC} = C_{i,n-i+1} + (1 - \beta_{n-i+1})\, \hat{\kappa}^{CC} \pi_i$
- a modification of BF type with modified prior $\hat{\kappa}^{CC} \pi_i$
- generalised CC with decay factor $0 \le d \le 1$ → the constant $\hat{\kappa}^{CC}$ is replaced with $[\hat{\kappa}_1^{CC}, \dots, \hat{\kappa}_n^{CC}]$ for different accident years:
  $\hat{\kappa}_i^{CC} = d\, \hat{\kappa}^{CC} + (1 - d)\, \hat{\kappa}_i$
- factor of 0 … no smoothing (CL); factor of 1 … constant initial ratio (simple CC)
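The simple CC recipe — a robustified loss ratio $\hat{\kappa}^{CC}$, then a BF-type completion — in Python; illustrative toy data, with `premiums` playing the role of $\pi_i$:

```python
def cape_cod_ultimates(tri, fs, premiums):
    """Simple Cape Cod: kappa = (sum of latest diagonal) / (sum of 'used-up'
    premium beta_{n-i+1} * pi_i); ultimate = latest + (1 - beta) * kappa * pi_i."""
    n = len(tri)
    beta = [1.0] * n
    for j in range(n - 2, -1, -1):      # beta_j = prod_{k=j}^{n-1} 1/f_k
        beta[j] = beta[j + 1] / fs[j]
    latest = [tri[i][n - i - 1] for i in range(n)]
    kappa = sum(latest) / sum(beta[n - i - 1] * premiums[i] for i in range(n))
    return [latest[i] + (1.0 - beta[n - i - 1]) * kappa * premiums[i]
            for i in range(n)]

tri = [[100.0, 150.0, 175.0], [110.0, 165.0], [120.0]]
fs = [1.5, 7.0 / 6.0]
premiums = [250.0, 250.0, 250.0]
# The CC ultimate of the last row is pulled toward the portfolio loss
# ratio, compared with its CL ultimate of 210.
```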

Generalized estimating equations
- classical approaches to the claims reserving problem are based on the limiting assumption that the claims in different years are independent variables
- dependencies in the development years → classical techniques provide incorrect predictions
- no distributional assumptions (avoiding distribution misspecification)

GEE
- run-off triangles, as one of the most typical types of actuarial data, comprise correlated longitudinal data (or, generally, clustered data), where an accident year corresponds to a subject
- claims within a "subject" should be considered as correlated by nature
- incremental claims for accident year $i \in \{1, \dots, n\}$ form an $(n - i + 1) \times 1$ vector $X_i = [X_{i,1}, \dots, X_{i,n-i+1}]^\top$; define their expectations
  $EX_i = \mu_i = [\mu_{i,1}, \dots, \mu_{i,n+1-i}]^\top$

Link function and linear predictor
- accident year $i$ and development year $j$ influence the expectation of the claim amount via a so-called link function $g$ in the following manner:
  $\mu_{i,j} = g^{-1}(z_{i,j}^\top \theta),$
  where $g^{-1}$ is the inverse of the scalar link function $g$ and $z_{i,j}$ is a $p \times 1$ vector of (fictional) covariates that arranges the impact of the accident and development year on the claim amount through the model parameters $\theta \in \mathbb{R}^{p \times 1}$

Example of link
- e.g., the Hoerl curve with the logarithmic link function can be "coded" by the design vector
  $z_{i,j} = [1, \delta_{1,i}, \dots, \delta_{n,i},\, 1 \times \delta_{1,j}, \dots, n \times \delta_{n,j},\, \delta_{1,j} \times \log 1, \dots, \delta_{n,j} \times \log n]^\top$
  and parameters of interest
  $\theta = [\gamma, \alpha_1, \dots, \alpha_n, \beta_1, \dots, \beta_n, \lambda_1, \dots, \lambda_n]^\top,$
  where $\delta_{i,j}$ corresponds to the Kronecker delta
- afterwards,
  $\log(\mu_{i,j}) = \gamma + \alpha_i + j\beta_j + \lambda_j \log j$

Variance
- besides the link function $g$ and linear predictor $z_{i,j}^\top \theta$ (i.e., the mean structure $\mu_i$), one needs to specify the variance of the claim amounts
- suppose that the variance of incremental claims can be expressed as a known function of their expectations:
  $\mathrm{Var}\,X_{i,j} = \phi\, h(\mu_{i,j}),$
  where $\phi > 0$ is a scale (dispersion) parameter

Working correlation matrix
- in the GEE framework, it is not necessary to know the whole distribution of the response (e.g., a distribution of the incremental claims), as in the GLM setup
- it is sufficient to specify the variance of $X_{i,j}$ and the working correlation matrix
  $R_i(\vartheta) \in \mathbb{R}^{(n-i+1) \times (n-i+1)}$
  for the incremental claims in each accident year

Working correlation matrix II
- the correlation matrix differs from accident year to accident year
- however, each correlation matrix depends only on the $s \times 1$ vector of unknown parameters $\vartheta$, which is the same for all accident years
- consequently, the working covariance matrix of the incremental claims is
  $\mathrm{Cov}\,X_i = \phi A_i^{1/2} R_i(\vartheta) A_i^{1/2},$
  where $A_i$ is an $(n - i + 1) \times (n - i + 1)$ diagonal matrix with $h(\mu_{i,j})$ as the $j$th diagonal element
- the name "working" comes from the fact that it is not expected to be correctly specified

Choice of working correlation matrix
- the simplest case is to assume uncorrelated incremental claims, i.e.,
  $R_i(\vartheta) = I_{n-i+1} = \{\delta_{j,k}\}_{j,k=1}^{n-i+1,\,n-i+1}$
- the opposite extreme case is an unstructured correlation matrix
  $R_i(\vartheta) = \{\vartheta_{j,k}\}_{j,k=1}^{n-i+1,\,n-i+1}$
  such that $\vartheta_{j,j} = 1$ for $j = 1, \dots, n - i + 1$ and $R_i(\vartheta)$ is positive definite

Choice of working correlation matrix II
- somewhere in between lies an exchangeable correlation structure:
  $R_i(\vartheta) = \{\delta_{j,k} + (1 - \delta_{j,k})\vartheta\}_{j,k=1}^{n-i+1,\,n-i+1}$, $\vartheta = [\vartheta, \dots, \vartheta]^\top$
- an $m$-dependent structure:
  $R_i(\vartheta) = \{r_{j,k}\}_{j,k=1}^{n-i+1,\,n-i+1}$, where
  $r_{j,k} = \begin{cases} 1, & j = k, \\ \vartheta_{|j-k|}, & 0 < |j - k| \le m, \\ 0, & |j - k| > m \end{cases}$
- an autoregressive AR(1) correlation structure:
  $R_i(\vartheta) = \{\vartheta^{|j-k|}\}_{j,k=1}^{n-i+1,\,n-i+1}$, $\vartheta = [\vartheta, \dots, \vartheta]^\top$
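The structures above are easy to materialize. A sketch building each working correlation matrix as a plain list of lists, where `m` is the accident-year block size $n - i + 1$:

```python
def independence(m):
    """Identity matrix: uncorrelated incremental claims."""
    return [[1.0 if j == k else 0.0 for k in range(m)] for j in range(m)]

def exchangeable(m, rho):
    """1 on the diagonal, a common rho everywhere off it."""
    return [[1.0 if j == k else rho for k in range(m)] for j in range(m)]

def m_dependent(m, rhos):
    """rhos[d-1] at lag d = |j - k| <= len(rhos); zero beyond that lag."""
    def r(j, k):
        d = abs(j - k)
        if d == 0:
            return 1.0
        return rhos[d - 1] if d <= len(rhos) else 0.0
    return [[r(j, k) for k in range(m)] for j in range(m)]

def ar1(m, rho):
    """rho^{|j-k|}: geometric decay of correlation along development."""
    return [[rho ** abs(j - k) for k in range(m)] for j in range(m)]
```

The unstructured case would simply fill the off-diagonal entries with free parameters, subject to positive definiteness.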

Quasi-likelihood
- parameter estimation in the GEE framework is performed in such a way that the theoretical quasi-likelihood
  $Q(x; \mu) = \int \frac{x - \mu}{h(\mu)}\, d\mu$
  is used instead of the true log-likelihood function
- the quasi-likelihood estimate in the GEE setup is the solution of the score-like equation system
  $\sum_{i=1}^{n} \left[\frac{\partial \mu_i}{\partial \theta}\right]^\top \phi^{-1} A_i^{-1/2} R_i^{-1}(\vartheta) A_i^{-1/2} (X_i - \mu_i) = 0 \in \mathbb{R}^p,$
  where $[\partial \mu_i / \partial \theta]$ is an $(n - i + 1) \times p$ matrix of partial derivatives of $\mu_i$ with respect to the unknown parameters $\theta$

Further work
- claims generating process: incremental paid claims $X_{i,j}$ as the sum of $N_{i,j}$ (independent) claims of amounts $Y_{i,j}^k$, $k = 1, \dots, N_{i,j}$
- Wright's model
- Tweedie compound distribution

Conclusions
- CL
- GLM
- GAM
- BF
- Bayesian framework
- CC
- GEE

References
- England, P. D. and Verrall, R. J. (2002). Stochastic claims reserving in general insurance (with discussion). British Actuarial Journal, 8(3).
- Hudecova, S. and Pesta, M. (2012). Generalized estimating equations in claims reserving. In Komarek, A. and Nagy, S., editors, Proceedings of the 27th International Workshop on Statistical Modelling, Vol. 2, pp. 555–560, Prague.
- Wüthrich, M. and Merz, M. (2008). Stochastic claims reserving methods in insurance. Wiley Finance Series. John Wiley & Sons.

Thank you!
pesta@karlin.mff.cuni.cz