01.06.2015 Views

Actuarial Modelling of Claim Counts Risk Classification, Credibility ...




starts as soon as some policy characteristics are modified (think for instance of a policyholder moving house for a company using postcode as a rating factor, a policyholder’s wedding for a company using marital status, or simply the policyholder buying a new car). Moreover, in the year the policy is issued and in the year it is possibly cancelled, the length of the observation period is generally less than unity.
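
The length of the observation period within a calendar year can be sketched as the fraction of that year during which the policy was in force. The helper name `exposure_in_year`, the day-count convention (actual days over days in the year, end date exclusive) and the dates are assumptions made for illustration; they are not taken from the text.

```python
from datetime import date


def exposure_in_year(start: date, end: date, year: int) -> float:
    """Fraction of calendar `year` during which a policy in force on
    [start, end) was observed. Hypothetical helper; `end` is exclusive."""
    year_start = date(year, 1, 1)
    year_end = date(year + 1, 1, 1)
    # Overlap between the in-force period and the calendar year.
    overlap_start = max(start, year_start)
    overlap_end = min(end, year_end)
    days_in_force = max((overlap_end - overlap_start).days, 0)
    return days_in_force / (year_end - year_start).days


# Made-up policy: issued on 1 July 1997, cancelled on 1 April 1999.
issued, cancelled = date(1997, 7, 1), date(1999, 4, 1)
exposures = {y: exposure_in_year(issued, cancelled, y) for y in (1997, 1998, 1999)}
```

Under these assumptions only the middle year has unit exposure; the first and last years are observed for part of the year only.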

We face a nested structure: each policyholder generates a sequence $N_i = (N_{i1}, N_{i2}, \ldots, N_{iT_i})^T$ of claim numbers. It is reasonable to assume independence between the series $N_1, N_2, \ldots, N_n$, but this assumption is very questionable within each $N_i$. Regarding a priori ratemaking, the dependence between the components of each $N_i$ is a nuisance (in the statistical sense). This means that, at this stage, we are not interested in accurately modelling this dependence, but we must take it into account when estimating the regression coefficients. The idea now is to incorporate into the $N_{it}$'s exogenous information (like age, gender, power of the car, and so on) summarized in the vectors $x_{it}$; to this end, we resort to a regression model for longitudinal data.

The distributional assumption for the random component of the regression model has to account for the non-negativity of the data, as well as their integer values. We begin with Poisson regression and assume that the $N_{it}$'s conform to the Poisson distribution with a mean that can be written as an exponential function of a linear combination $\beta_0 + \sum_{j=1}^{p} \beta_j x_{itj}$ of the explanatory variables $x_{it}$, with unknown regression coefficients $\beta_j$ to be estimated from the data. Despite its prevalence as a starting point in the analysis of count data, the Poisson specification is often inappropriate because of unobserved heterogeneity and failure of the independence assumption when the data consist of repeated observations on the same policyholders. A convenient way to take this phenomenon into account is to introduce a random effect into the model.
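
To illustrate why a random effect captures both unobserved heterogeneity and dependence between years, the sketch below simulates claim counts that share a policyholder-specific multiplicative effect. Everything in it is an illustrative assumption rather than the book's specification: the Gamma distribution with unit mean for the random effect, the values `lam = 0.5` and `shape = 1.0`, and all helper names.

```python
import math
import random


def poisson_draw(rng: random.Random, mean: float) -> int:
    """Draw one Poisson variate with Knuth's multiplication method
    (adequate for the small means used here)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate(n_policies: int, n_years: int, lam: float, shape: float, seed: int = 1):
    """Return per-policy claim sequences (N_i1, ..., N_iT).
    Each policy gets a Gamma random effect theta_i with mean 1 and variance
    1/shape; given theta_i, the counts are independent Poisson(lam * theta_i)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_policies):
        theta = rng.gammavariate(shape, 1.0 / shape)  # E[theta] = 1
        panel.append([poisson_draw(rng, lam * theta) for _ in range(n_years)])
    return panel


def mean_var(values):
    """Sample mean and unbiased sample variance."""
    m = sum(values) / len(values)
    return m, sum((x - m) ** 2 for x in values) / (len(values) - 1)


def covariance(xs, ys):
    """Unbiased sample covariance of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


panel = simulate(n_policies=20000, n_years=2, lam=0.5, shape=1.0)
year1 = [seq[0] for seq in panel]
year2 = [seq[1] for seq in panel]
mean1, var1 = mean_var(year1)      # theory: mean lam, variance lam + lam**2 / shape
cov12 = covariance(year1, year2)   # theory: lam**2 / shape, zero without theta
```

In this toy setting the variance exceeds the mean (overdispersion) and counts from the same policyholder are positively correlated across years, which a plain Poisson model with independent observations cannot reproduce.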

Remark 2.5 Before embarking on a panel analysis that pools the observations relating to several years, it is worthwhile to first work year by year to assess the stability of the effect of each rating variable on the annual expected claim frequency. Specifically, the vector of regression coefficients is estimated separately for each calendar year and its components are checked for stability over time. Only stable coefficients are of interest for ratemaking. Rating factors with unstable regression coefficients should be excluded from the risk classification scheme. In some cases, a time trend is visible for some estimated regression coefficients (this is typically true for the intercept $\beta_0$). A time effect can then be incorporated into the model to account for coefficients with trends.
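
The year-by-year check can be sketched for a single two-level rating factor. For a Poisson regression with log link, exposure as offset and one binary factor, the maximum likelihood estimates have a closed form: the intercept is the log of the observed claim frequency in the reference level, and the factor coefficient is the log of the frequency ratio. The totals below are made-up toy numbers, not Portfolio B data, and the names `yearly_totals` and `fit_year` are hypothetical.

```python
import math

# Toy aggregated data: year -> level -> (total claims, total exposure in policy-years).
# Made-up numbers for illustration only.
yearly_totals = {
    1997: {"male": (900, 6000.0), "female": (560, 4000.0)},
    1998: {"male": (930, 6200.0), "female": (590, 4100.0)},
    1999: {"male": (880, 6100.0), "female": (600, 4200.0)},
}


def fit_year(totals, reference="male", other="female"):
    """Closed-form Poisson MLE for the saturated one-factor model
    log E[N] = log(exposure) + beta0 + beta1 * 1{level == other}."""
    claims_ref, expo_ref = totals[reference]
    claims_oth, expo_oth = totals[other]
    beta0 = math.log(claims_ref / expo_ref)
    beta1 = math.log(claims_oth / expo_oth) - beta0
    return beta0, beta1


# One (beta0, beta1) pair per calendar year.
estimates = {year: fit_year(totals) for year, totals in yearly_totals.items()}

# Crude stability summary: spread of the yearly estimates of beta1.
beta1_values = [b1 for _, b1 in estimates.values()]
beta1_range = max(beta1_values) - min(beta1_values)
```

A small `beta1_range` relative to the sampling uncertainty of the yearly estimates would suggest a stable effect; with several factors the same comparison would be made on coefficients from a fitted regression rather than on closed-form ratios.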

2.9.2 Descriptive Statistics for Portfolio B

The analysis in this section is based on an insurance portfolio containing 20 354 policies observed over 3 years (from 1997 to 1999). We have 45 350 observations available, as not all the policies were in force for all 3 years. For each policy and for each year, we know the exposure-to-risk, the number of claims filed and some other explanatory variables:

Gender: Policyholder’s gender (male–female)

Age: Policyholder’s age (18–22, 23–30 and over 30)

Power: The power of the vehicle (less than 66 kW, 66–110 kW and more than 110 kW)

Size: The size of the city where the policyholder lives (large, middle or small), and

Colour: The colour of the vehicle (red or other).
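
One way to hold a policy-year observation with these variables is sketched below, together with the observed claim frequency computed as total claims over total exposure. The class name `PolicyYear`, the field names, the string labels for the categories and the sample rows are assumptions for illustration; only the variables and their levels come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyYear:
    policy_id: int
    year: int        # 1997, 1998 or 1999
    exposure: float  # exposure-to-risk, a fraction of the year
    claims: int      # number of claims filed during the exposure period
    gender: str      # "male" or "female"
    age: str         # "18-22", "23-30" or ">30"
    power: str       # "<66kW", "66-110kW" or ">110kW"
    size: str        # "large", "middle" or "small"
    colour: str      # "red" or "other"


def claim_frequency(rows):
    """Observed annual claim frequency: total claims / total exposure."""
    total_exposure = sum(r.exposure for r in rows)
    return sum(r.claims for r in rows) / total_exposure


# Made-up rows, not taken from Portfolio B.
rows = [
    PolicyYear(1, 1997, 0.5, 0, "male", "23-30", "<66kW", "large", "other"),
    PolicyYear(1, 1998, 1.0, 1, "male", "23-30", "<66kW", "large", "other"),
    PolicyYear(2, 1998, 1.0, 0, "female", ">30", "66-110kW", "small", "red"),
]
frequency = claim_frequency(rows)
```

Storing one row per policy and year keeps the panel structure explicit: the rows sharing a `policy_id` form that policyholder's sequence of claim numbers.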
