Actuarial Modelling of Claim Counts: Risk Classification, Credibility ...

is called convergence in probability and is defined more formally as follows: a consistent estimator $T_j$ for some parameter $\theta_j$ computed from a sample of size $n$ is one for which

$$
\lim_{n\to+\infty}\Pr\big[|T_j-\theta_j|\ge c\big]=0
$$

for all positive $c$. This will henceforth be denoted as $T_j\to_{\text{proba}}\theta_j$ as $n\to+\infty$. A consistent estimator is thus an estimator that converges to the population parameter as the sample size goes to infinity. Consistency is an asymptotic property.
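To make this concrete, the following minimal simulation sketch (an illustration added here, assuming NumPy and a hypothetical Poisson claim frequency) estimates $\Pr[|T_j-\theta_j|\ge c]$ for the sample mean of simulated claim counts and shows it shrinking as the sample size grows.

```python
import numpy as np

# Illustrative sketch (hypothetical data): the sample mean is a consistent
# estimator of a Poisson mean, so Pr[|T - theta| >= c] shrinks with n.
rng = np.random.default_rng(seed=1)
theta, c = 0.15, 0.02                # hypothetical claim frequency and tolerance
for n in (100, 1_000, 10_000):
    samples = rng.poisson(theta, size=(2_000, n))   # 2000 replicated samples of size n
    T = samples.mean(axis=1)                        # estimator computed on each sample
    print(n, np.mean(np.abs(T - theta) >= c))       # estimated exceedance probability
```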

Asymptotic Normality<br />

Any estimator will vary across repeated samples. We must be able to calculate this variability in order to express our uncertainty about a parameter value and to make statistical inferences about the parameters. This variability is measured by the variance-covariance matrix of the estimators. This matrix provides the variances for each parameter on the main diagonal, while the off-diagonal elements estimate the covariances between all pairs of parameters.

The asymptotic variance-covariance matrix $\widehat{\boldsymbol{\Sigma}}$ for the maximum likelihood estimator $\widehat{\boldsymbol{\theta}}$ is the inverse of what is called the Fisher information matrix $\mathcal{I}(\boldsymbol{\theta})$. Element $(i,j)$ of $\mathcal{I}(\boldsymbol{\theta})$ is given by

$$
-E\left[\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\ln L(\boldsymbol{\theta})\right]
=-nE\left[\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\ln p_{N_1}(\boldsymbol{\theta})\right]
$$

$$
=nE\left[\frac{\partial}{\partial\theta_i}\ln p_{N_1}(\boldsymbol{\theta})\,\frac{\partial}{\partial\theta_j}\ln p_{N_1}(\boldsymbol{\theta})\right]
$$

$$
=n\sum_{k=0}^{+\infty}p_k(\boldsymbol{\theta})\,\frac{\partial}{\partial\theta_i}\ln p_k(\boldsymbol{\theta})\,\frac{\partial}{\partial\theta_j}\ln p_k(\boldsymbol{\theta})
$$

Thus, $\widehat{\boldsymbol{\Sigma}}=\big(\mathcal{I}(\boldsymbol{\theta})\big)^{-1}$.
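For instance (a worked check added here for illustration), take the Poisson case with a single parameter $\theta$ and $p_k(\theta)=e^{-\theta}\theta^k/k!$. Then $\frac{\partial}{\partial\theta}\ln p_k(\theta)=\frac{k}{\theta}-1$, so that

$$
\mathcal{I}(\theta)=nE\left[\left(\frac{N_1}{\theta}-1\right)^2\right]
=\frac{n}{\theta^2}\,\text{Var}[N_1]=\frac{n}{\theta},
$$

and the asymptotic variance of the maximum likelihood estimator is $\theta/n$.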

An insight into why this makes sense is that the second derivatives measure the rate of change in the first derivatives, which in turn determines the value of the maximum likelihood estimate. If the first derivatives are changing rapidly near the maximum, then the peak of the likelihood is sharply defined and the maximum is easy to see. In this case, the second derivatives will be large and their inverse small, indicating a small variance of the estimated parameters. If on the other hand the second derivatives are small, then the likelihood function is relatively flat near the maximum and so the parameters are less precisely estimated. The inverse of the second derivatives will produce a large value for the variance of the estimates, indicating low precision of the estimates.
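The same point can be checked numerically. The sketch below (an illustration added here, not taken from the text; it assumes NumPy and simulated Poisson claim counts) evaluates the second derivative of a Poisson log-likelihood at its maximum for two sample sizes: the larger sample gives a larger curvature, and hence a smaller inverse, i.e. a smaller variance for the estimate.

```python
import numpy as np

# Illustrative sketch (hypothetical data): curvature of the Poisson
# log-likelihood at its maximum versus sample size.
rng = np.random.default_rng(seed=2)
lam = 0.15                                  # hypothetical claim frequency
for n in (100, 10_000):
    counts = rng.poisson(lam, size=n)
    lam_hat = counts.mean()                 # maximum likelihood estimate
    # d^2/d lam^2 of the log-likelihood is -sum(N_i)/lam^2; evaluate at lam_hat
    d2 = -counts.sum() / lam_hat ** 2
    print(n, lam_hat, -1.0 / d2)            # inverse curvature ~ variance of lam_hat
```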

The distribution of $\widehat{\boldsymbol{\theta}}$ is usually difficult to obtain. Therefore we resort to the following asymptotic theory: under mild regularity conditions (including that the true value of the parameter must be interior to the parameter space, that the log-likelihood function must be thrice differentiable, and that the third derivatives must be bounded) that are usually fulfilled, the maximum likelihood estimator $\widehat{\boldsymbol{\theta}}$ has, in large samples, approximately a multivariate Normal distribution with mean equal to the true parameter and variance-covariance matrix given by the inverse of the information matrix.
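In practice, this approximation is what justifies Wald-type standard errors and confidence intervals. The sketch below (an illustration added here, assuming NumPy and the Poisson asymptotic variance $\theta/n$ derived above) builds an approximate 95% interval for a simulated claim frequency.

```python
import numpy as np

# Illustrative sketch (hypothetical data): large-sample Normal approximation
# for the Poisson maximum likelihood estimator, with variance lam_hat / n.
rng = np.random.default_rng(seed=3)
counts = rng.poisson(0.15, size=20_000)       # hypothetical portfolio of claim counts
lam_hat = counts.mean()                       # maximum likelihood estimate
se = np.sqrt(lam_hat / counts.size)           # asymptotic standard error
print(lam_hat, (lam_hat - 1.96 * se, lam_hat + 1.96 * se))   # approximate 95% CI
```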
