24.01.2013 Views

Rob van Hest Capture-recapture Methods in Surveillance - RePub ...


Methodology of capture-recapture analysis

developed for capture-recapture analysis by Fienberg.³ With three registers there are eight possible combinations of these registers in which cases do or do not appear. The general model uses eight parameters: the common parameter (the logarithm of the number expected to be in all lists), three 'main effects' parameters (the log odds against appearing in each list for cases who appear in the other two), three 'two-way interactions' or second-order effect parameters (the log odds ratios between pairs of lists for cases who appear in the remaining list), and a 'three-way' interaction parameter. For three registers, A with levels indexed by i, B with levels indexed by j and C with levels indexed by k, the natural logarithm (ln or log_e) of the expected frequency $F_{ijk}$ for cell ijk, $\ln F_{ijk}$, can be written as

$$\ln F_{ijk} = \theta + \lambda_i^{A} + \lambda_j^{B} + \lambda_k^{C} + \lambda_{ij}^{AB} + \lambda_{ik}^{AC} + \lambda_{jk}^{BC} + \lambda_{ijk}^{ABC} \tag{2.4}$$

where $\theta$ is the common parameter, $\lambda^{A}$, $\lambda^{B}$ and $\lambda^{C}$ are the main-effect parameters, $\lambda^{AB}$, $\lambda^{AC}$ and $\lambda^{BC}$ are the second-order effect (two-way interaction) parameters and $\lambda^{ABC}$ is the highest-order effect (three-way interaction) parameter. The value of this last three-way interaction parameter cannot be tested from the study data, because the cell of cases missed by all three registers is never observed (eight parameters but only seven observable cells), and it is therefore assumed to be zero. Assumptions about the other parameters can be tested, although these tests may have little power in small samples.
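As an illustration of the log-linear equation and of why the three-way term has to be fixed, here is a minimal Python sketch under stated assumptions. Registers are coded 0 = present and 1 = absent, a reference-cell coding chosen here so that θ is the log of the number expected on all three lists, as described above; the counts are invented, and the helpers `fit_no_three_way` and `ln_expected` are hypothetical names. With $\lambda^{ABC}$ set to zero, the remaining seven parameters match the seven observable cells exactly, so they can be read off the observed log counts and then used to predict the cell of cases found on none of the registers.

```python
import math
from itertools import product

# Capture histories (a, b, c) for registers A, B and C.
# Assumed coding: 0 = on the register, 1 = not on it, so that theta is
# the log of the number expected on all three lists.
CELLS = list(product((0, 1), repeat=3))  # 2**3 = 8 cells
UNOBSERVED = (1, 1, 1)                   # on none of the registers


def fit_no_three_way(counts):
    """Hypothetical helper: parameters of the model with all two-way
    interactions and the three-way term fixed at zero. Seven parameters
    for seven observed cells, so they follow directly from the log counts."""
    ln = {cell: math.log(n) for cell, n in counts.items()}
    theta = ln[(0, 0, 0)]
    lam_a = ln[(1, 0, 0)] - theta
    lam_b = ln[(0, 1, 0)] - theta
    lam_c = ln[(0, 0, 1)] - theta
    lam_ab = ln[(1, 1, 0)] - theta - lam_a - lam_b
    lam_ac = ln[(1, 0, 1)] - theta - lam_a - lam_c
    lam_bc = ln[(0, 1, 1)] - theta - lam_b - lam_c
    return theta, lam_a, lam_b, lam_c, lam_ab, lam_ac, lam_bc


def ln_expected(params, cell):
    """ln F for one cell under the model; the three-way term is zero."""
    theta, lam_a, lam_b, lam_c, lam_ab, lam_ac, lam_bc = params
    a, b, c = cell
    return (theta + a * lam_a + b * lam_b + c * lam_c
            + a * b * lam_ab + a * c * lam_ac + b * c * lam_bc)


# Hypothetical counts for the seven observable cells (not study data).
counts = {(0, 0, 0): 10, (1, 0, 0): 20, (0, 1, 0): 15, (0, 0, 1): 12,
          (1, 1, 0): 40, (1, 0, 1): 30, (0, 1, 1): 25}
params = fit_no_three_way(counts)
missing = math.exp(ln_expected(params, UNOBSERVED))
total = sum(counts.values()) + missing
print(f"{len(CELLS)} cells, {len(counts)} observed, {len(params)} parameters")
print(f"estimated unobserved: {missing:.2f}, estimated total: {total:.2f}")
```

With these made-up counts the predicted number of cases on none of the registers is 10 × 40 × 30 × 25 / (20 × 15 × 12) ≈ 83.3, the closed-form estimate for this model. A different coding of the λ terms (for example sum-to-zero constraints) would change the parameter values but not the fitted counts.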

Three types of log-linear models can be recognised. Firstly, the 'independent model', which assumes that all registers are independent. Secondly, models that are equivalent to two independent registers or two independent subsets of registers. Finally, a 'saturated' model that incorporates all possible interactions, including a possible three-way interaction. To assess how well the various log-linear models fit the data (model fitting), the likelihood-ratio statistic, also known as $G^2$ or the deviance, is used, defined as

$$G^2 = 2\sum_j \mathrm{Obs}_j \ln\left(\frac{\mathrm{Obs}_j}{\mathrm{Exp}_{ji}}\right) \tag{2.5}$$

where $\mathrm{Obs}_j$ is the observed number of individuals in cell j and $\mathrm{Exp}_{ji}$ is the expected number of individuals in cell j under model i. The lower the value of $G^2$, the better the model fits. In the log-linear estimation procedure, model fitting is followed by model selection: identifying the models that are clearly wrong and selecting the most appropriate one from a number of acceptable models. For model selection, apart from previous knowledge and expectations about dependencies between registers and heterogeneity of the population, formal procedures based on the likelihood-ratio statistic, known as information criteria, can be used. One of these is Akaike's Information Criterion (AIC),²⁴ which can be expressed as

$$\mathrm{AIC} = G^2 - 2\,\mathrm{df} \tag{2.6}$$
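To make equations (2.5) and (2.6) concrete, the following is a minimal Python sketch, not the software or procedure used in this study. It fits three log-linear models to invented counts for the seven observable cells with a generic Poisson Newton-Raphson routine (a swapped-in technique; the helper names `solve`, `design`, `fit_poisson`, `g_squared` and `aic` are hypothetical), then computes $G^2$ and the AIC. It assumes that df is the residual degrees of freedom, i.e. the number of observed cells minus the number of fitted parameters, and that every observed count is positive.

```python
import math
from itertools import product

# Hypothetical data. Cell (a, b, c): 0 = on the register, 1 = not on it;
# the all-absent cell (1, 1, 1) is never observed, so it is excluded.
CELLS = [cell for cell in product((0, 1), repeat=3) if cell != (1, 1, 1)]
COUNTS = {(0, 0, 0): 10, (1, 0, 0): 20, (0, 1, 0): 15, (0, 0, 1): 12,
          (1, 1, 0): 40, (1, 0, 1): 30, (0, 1, 1): 25}


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def design(terms):
    """Intercept plus one 0/1 column per term; a term is a tuple of register
    indices, e.g. (0,) for the A main effect or (0, 1) for A x B."""
    return [[1.0] + [float(all(cell[r] for r in t)) for t in terms] for cell in CELLS]


def fit_poisson(X, y, max_iter=50, tol=1e-10):
    """Poisson log-linear fit by Newton-Raphson; returns fitted counts."""
    p, n = len(X[0]), len(y)
    # Start from a least-squares fit to ln(y) (requires y > 0).
    xtx = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    xtz = [sum(X[i][j] * math.log(y[i]) for i in range(n)) for j in range(p)]
    beta = solve(xtx, xtz)
    for _ in range(max_iter):
        mu = [math.exp(sum(xj * bj for xj, bj in zip(row, beta))) for row in X]
        score = [sum(X[i][j] * (y[i] - mu[i]) for i in range(n)) for j in range(p)]
        info = [[sum(X[i][j] * mu[i] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return [math.exp(sum(xj * bj for xj, bj in zip(row, beta))) for row in X]


def g_squared(obs, exp):
    """Equation (2.5): 2 * sum of Obs * ln(Obs / Exp) over the cells."""
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(obs, exp) if o > 0)


def aic(obs, exp, n_params):
    """Equation (2.6), with df = observed cells minus fitted parameters."""
    return g_squared(obs, exp) - 2 * (len(obs) - n_params)


y = [COUNTS[cell] for cell in CELLS]
MODELS = {
    "independent": [(0,), (1,), (2,)],
    "A-B dependent": [(0,), (1,), (2,), (0, 1)],
    "all two-way": [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)],
}
for name, terms in MODELS.items():
    X = design(terms)
    fitted = fit_poisson(X, y)
    print(f"{name:14s} G2 = {g_squared(y, fitted):7.3f}  AIC = {aic(y, fitted, len(X[0])):7.3f}")
```

The model with all two-way interactions has as many parameters as observed cells, so it reproduces the data exactly ($G^2 = 0$, df = 0, AIC = 0); simpler models accept a larger $G^2$ in exchange for more residual degrees of freedom.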

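The first term, $G^2$, measures how well the model fits the data; the second term, $2\,\mathrm{df}$, where df is the model's residual degrees of freedom (the number of observed cells minus the number of fitted parameters), is a penalty for the addition of parameters (and hence model complexity): for a given $G^2$, each extra parameter lowers df by one and therefore raises the AIC by two. Another information criterion is the Bayesian Information Criterion (BIC),²⁵ which can be expressed as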

$$\mathrm{BIC} = G^2 - \ln(N_{\mathrm{obs}})\,\mathrm{df} \tag{2.7}$$

where $N_{\mathrm{obs}}$ is the total number of observed individuals. Relative to the AIC, the BIC penalises complex models more heavily whenever $\ln N_{\mathrm{obs}} > 2$, that is, for $N_{\mathrm{obs}} \geq 8$. In general, in the log-linear capture-recapture estimation procedure the least complex, i.e. the least saturated (in other words the most

