Rob van Hest Capture-recapture Methods in Surveillance - RePub ...
Methodology of capture-recapture analysis
developed for capture-recapture analysis by Fienberg [3]. With three registers there are eight
possible combinations of these registers in which cases do or do not appear. The general
model uses eight parameters: the common parameter (the logarithm of the number
expected to be in all lists), three 'main effects' parameters (the log odds ratios against
appearing in each list for cases who appear in the others), three 'two-way interactions' or
second order effect parameters (the log odds ratios between pairs of lists for cases who
appear in the other), and a 'three-way' interaction parameter. For three registers, A with i
levels, B with j levels and C with k levels, the natural logarithm (ln or log_e) of the expected
frequency $F_{ijk}$ for cell ijk, $\ln F_{ijk}$, can be denoted as
$$\ln F_{ijk} = \theta + \lambda_i^{A} + \lambda_j^{B} + \lambda_k^{C} + \lambda_{ij}^{AB} + \lambda_{ik}^{AC} + \lambda_{jk}^{BC} + \lambda_{ijk}^{ABC} \tag{2.4}$$
where $\theta$ is the common parameter, $\lambda^{A}$, $\lambda^{B}$ and $\lambda^{C}$ are the main effect parameters, $\lambda^{AB}$,
$\lambda^{AC}$ and $\lambda^{BC}$ are the second order effect (two-way interaction) parameters and $\lambda^{ABC}$ is the
highest order effect (three-way interaction) parameter. The value of this three-way
interaction parameter cannot be tested from the study data and is assumed to be zero.
Assumptions about the other parameters can be tested, although these tests may not be
very powerful for small samples.
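The parameterisation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the counts are hypothetical, the function names `fit_two_way_model` and `log_expected` are invented here, and the coding (an indicator that equals 1 when a case is absent from a register) is one reading of the parameter descriptions above, chosen so that $\theta$ is the log count of cases found in all three lists. With the three-way term fixed at zero, the seven observable cells determine the seven remaining parameters, and the same parameters then give an expected count for the cell that no register observes.

```python
import math

# Hypothetical counts keyed by (in_A, in_B, in_C); 1 = case appears in that register.
# The cell (0, 0, 0), cases missed by all three registers, cannot be observed.
observed = {
    (1, 1, 1): 20,
    (1, 1, 0): 30,
    (1, 0, 1): 25,
    (0, 1, 1): 15,
    (1, 0, 0): 60,
    (0, 1, 0): 50,
    (0, 0, 1): 40,
}


def fit_two_way_model(counts):
    """Solve for theta and the six lambdas when the three-way term is zero.

    Assumes absence-indicator coding, so theta is the log count of cases
    that appear in all three registers.
    """
    ln = {cell: math.log(n) for cell, n in counts.items()}
    theta = ln[(1, 1, 1)]
    main = {
        "A": ln[(0, 1, 1)] - theta,  # absent from A only
        "B": ln[(1, 0, 1)] - theta,  # absent from B only
        "C": ln[(1, 1, 0)] - theta,  # absent from C only
    }
    two_way = {
        "AB": ln[(0, 0, 1)] - theta - main["A"] - main["B"],
        "AC": ln[(0, 1, 0)] - theta - main["A"] - main["C"],
        "BC": ln[(1, 0, 0)] - theta - main["B"] - main["C"],
    }
    return theta, main, two_way


def log_expected(cell, theta, main, two_way):
    """ln F for a cell (in_A, in_B, in_C), with the three-way term set to 0."""
    absent = {name for name, present in zip("ABC", cell) if not present}
    value = theta
    value += sum(lam for name, lam in main.items() if name in absent)
    value += sum(lam for pair, lam in two_way.items() if set(pair) <= absent)
    return value


theta, main, two_way = fit_two_way_model(observed)
# Expected number of cases missed by every register, and the implied total.
missed = math.exp(log_expected((0, 0, 0), theta, main, two_way))
total = sum(observed.values()) + missed
```

Because this model has as many free parameters as there are observable cells, it reproduces the observed counts exactly; simpler models drop one or more of the two-way terms and must then be fitted iteratively or, in some cases, in closed form.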
Three types of log-linear models can be recognised. The first is the 'independent
model', which assumes that all registers are independent. The second comprises models that are
equivalent to two independent registers or two independent subsets of registers. The third is a
'saturated' model that incorporates all possible interactions, including possible three-way
interaction. To assess how well the various log-linear models fit the data (model fitting), the log
likelihood-ratio test, also known as $G^2$ or deviance, is used, denoted as
$$G^2 = -2 \sum_j \mathrm{Obs}_j \ln\left[\mathrm{Exp}_{ji} / \mathrm{Obs}_j\right] \tag{2.5}$$
where $\mathrm{Obs}_j$ is the observed number of individuals in each cell j, and $\mathrm{Exp}_{ji}$ is the expected
number of individuals in each cell j under model i. The lower the value of $G^2$, the better
the fit of the model. In the log-linear estimation procedure, model fitting is followed by
model selection, i.e. identifying the models that are clearly wrong and selecting the most
appropriate from a number of acceptable models. For model selection, apart from
previous knowledge and expectations about dependencies between registers and
heterogeneity of the population, formal procedures based upon likelihood-ratio tests,
known as information criteria, can be used. One of these procedures is Akaike's
Information Criterion (AIC) [24], which can be expressed as
$$\mathrm{AIC} = G^2 - 2\,[\mathrm{df}] \tag{2.6}$$
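As a rough numerical illustration of the deviance and of the AIC expression above, the sketch below fits one simple candidate model to hypothetical counts. Several things here are assumptions rather than part of the text: the counts, the function names, the choice of candidate model (the B-C interaction dropped, so that B and C are treated as independent among cases recorded in A), and the reading of df as the residual degrees of freedom, i.e. observable cells minus fitted parameters. The deviance is computed in its standard non-negative form, $-2\sum \mathrm{Obs} \ln(\mathrm{Exp}/\mathrm{Obs})$.

```python
import math

# Hypothetical counts keyed by (in_A, in_B, in_C); (0, 0, 0) is unobserved.
observed = {
    (1, 1, 1): 20,
    (1, 1, 0): 30,
    (1, 0, 1): 25,
    (0, 1, 1): 15,
    (1, 0, 0): 60,
    (0, 1, 0): 50,
    (0, 0, 1): 40,
}


def fit_without_bc(counts):
    """Fitted values when the BC interaction is dropped (assumed candidate model).

    Among cases in register A the four cells follow a 2x2 independence table;
    the three observed cells outside A are reproduced exactly.
    """
    in_a = {cell: n for cell, n in counts.items() if cell[0] == 1}
    total = sum(in_a.values())
    fitted = dict(counts)
    for cell in in_a:
        row = sum(n for c, n in in_a.items() if c[1] == cell[1])
        col = sum(n for c, n in in_a.items() if c[2] == cell[2])
        fitted[cell] = row * col / total
    return fitted


def g_squared(counts, fitted):
    """Deviance: -2 * sum of Obs * ln(Exp / Obs) over the observed cells."""
    return -2.0 * sum(n * math.log(fitted[cell] / n) for cell, n in counts.items())


def aic(g2, df):
    """AIC in the form given above: G^2 minus twice the degrees of freedom."""
    return g2 - 2 * df


fitted = fit_without_bc(observed)
g2 = g_squared(observed, fitted)
# Seven observable cells and six fitted parameters leave df = 1 for this model.
print(f"G2 = {g2:.2f}, AIC = {aic(g2, df=1):.2f}")
```

A model that reproduces the data exactly has a deviance of zero and no degrees of freedom, so its AIC in this form is zero; a simpler model is preferred on this criterion only when its deviance stays below twice its degrees of freedom.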
The first term, $G^2$, is a measure of how well the model fits the data and the second term,
2 [df], is a penalty for the addition of parameters (and hence model complexity). Another
information criterion is the Bayesian Information Criterion (BIC) [25], which can be
expressed as
$$\mathrm{BIC} = G^2 - [\ln N_{\mathrm{obs}}]\,[\mathrm{df}] \tag{2.7}$$
where $N_{\mathrm{obs}}$ is the total number of observed individuals. Relative to the AIC, the BIC
penalises complex models more heavily. In general, in the log-linear capture-recapture
estimation procedure the least complex, i.e. the least saturated (in other words the most