24.01.2013 Views

Rob van Hest Capture-recapture Methods in Surveillance - RePub ...


Underreporting of tuberculosis in England

Notification cases found was used to correct all years under study, assuming the annual proportion is similar.

Previous capture-recapture studies on tuberculosis identified a considerable proportion of remaining false-positives among unlinked Hospital cases after examining individual patients’ medical files [17,19]. That was not feasible due to the scale of this study. We estimated the proportion of these remaining false-positive cases through a population mixture model. Briefly, we used 40 covariates (number of admission days, number of admissions during the tuberculosis episode, rank number of tuberculosis diagnosis (14 possible positions) and 37 different ICD-10 tuberculosis diagnosis codes) and the incidence of Hospital records linked with Notification and/or Laboratory to estimate the number of true tuberculosis cases among unlinked records, under the assumption that all linked Hospital cases are true tuberculosis cases and unlinked Hospital cases are a mixture of true and false-positive tuberculosis cases. The best-fitting logistic regression model calculates for every Hospital case the predicted Bernoulli parameter p (reflecting the probability of being a true tuberculosis patient) from the covariates. Linked and unlinked Hospital cases have characteristic frequency distributions of values p as “signatures”. After standardisation we used these signature curves to separate the mixture of unlinked Hospital cases, assuming the subpopulation of true tuberculosis cases has a similar signature curve to linked Hospital cases and the false-positive tuberculosis cases have a different signature curve (population mixture model available from the authors). The corrected annual number of true tuberculosis cases only known to Hospital was calculated using the formula:

$$N_{\text{final}} = (\text{Prop}_{\text{true}} \times N_{\text{original}}) \times (1 - \text{Prop}_{\text{MOTT}}),$$

where $N_{\text{original}}$ and $N_{\text{final}}$ denote the number of unlinked Hospital cases before and after, respectively, deducting the estimated annual proportion of remaining false-positive tuberculosis cases and the projected annual proportion of MOTT infection cases; $\text{Prop}_{\text{true}}$ is the annual proportion of true tuberculosis cases estimated by logistic regression, and $\text{Prop}_{\text{MOTT}}$ is the projected annual proportion of MOTT infection cases.
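As a minimal sketch of the correction formula above, the calculation can be written as a small Python function. The function name, argument names and the input values are hypothetical and chosen only for illustration; they are not figures from this study.

```python
def corrected_hospital_only_cases(n_original, prop_true, prop_mott):
    """Apply N_final = (Prop_true x N_original) x (1 - Prop_MOTT).

    n_original: unlinked Hospital cases before correction
    prop_true:  estimated annual proportion of true tuberculosis cases
    prop_mott:  projected annual proportion of MOTT infection cases
    """
    for name, value in (("prop_true", prop_true), ("prop_mott", prop_mott)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a proportion between 0 and 1")
    if n_original < 0:
        raise ValueError("n_original must be non-negative")
    # First keep only the estimated true cases, then remove the MOTT share.
    return (prop_true * n_original) * (1.0 - prop_mott)


# Made-up inputs, for illustration only (not data from the study).
n_final = corrected_hospital_only_cases(n_original=1000, prop_true=0.6, prop_mott=0.1)
print(round(n_final, 1))
```

Because the two corrections are multiplicative, the order in which they are applied does not change the result.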

Observed source-specific coverage rates were defined as the number of tuberculosis cases in each data source divided by the case-ascertainment, expressed as a percentage.
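The definition above amounts to a simple ratio. The sketch below assumes that "case-ascertainment" means the total number of distinct cases observed across the linked data sources; that reading, the function name and the counts are assumptions made for illustration, not values from the study.

```python
def coverage_rate(cases_in_source, denominator):
    """Return the coverage of one data source as a percentage.

    denominator is the case-ascertainment for an observed coverage rate
    (assumed here to be the number of distinct observed cases).
    """
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * cases_in_source / denominator


# Made-up counts: 50 cases in one source, 200 distinct cases ascertained overall.
observed_coverage = coverage_rate(cases_in_source=50, denominator=200)
print(observed_coverage)
```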

Capture-recapture analysis

The annual and total numbers of unobserved tuberculosis cases were estimated on the basis of the final distribution of observed cases over the three data sources. The independence of data sources and other assumptions underlying capture-recapture analysis have been described previously [21]. Interdependencies between the three tuberculosis data sources are probable, causing possible bias in two-source capture-recapture estimates. Three-source log-linear capture-recapture analysis was employed to take possible interdependencies into account [17,19]. Estimated source-specific coverage rates were defined as the number of tuberculosis cases in each data source divided by the estimated number of tuberculosis cases by capture-recapture analysis, expressed as a percentage.
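To illustrate how a three-source log-linear model yields the number of unobserved cases, the sketch below uses the saturated model (all two-way interactions, no three-way interaction), for which the missing-cell estimate has a closed form. The text does not state which log-linear model was selected in this study, so this is one possible instance, not the authors' analysis. The function names, the source order (Notification, Laboratory, Hospital) and all counts are hypothetical.

```python
def estimate_unobserved_saturated(counts):
    """Estimate the cases missed by all three sources under the saturated model.

    counts maps (notification, laboratory, hospital) tuples of 0/1 to the
    number of observed cases with that capture history.
    Closed form: n000 = (n111 * n100 * n010 * n001) / (n110 * n101 * n011).
    """
    numerator = (counts[(1, 1, 1)] * counts[(1, 0, 0)]
                 * counts[(0, 1, 0)] * counts[(0, 0, 1)])
    denominator = counts[(1, 1, 0)] * counts[(1, 0, 1)] * counts[(0, 1, 1)]
    if denominator == 0:
        raise ValueError("estimate undefined when a two-source-only cell is zero")
    return numerator / denominator


def estimated_coverage(counts, source_index):
    """Cases in one source divided by the estimated total, as a percentage."""
    observed = sum(counts.values())
    estimated_total = observed + estimate_unobserved_saturated(counts)
    in_source = sum(n for history, n in counts.items() if history[source_index] == 1)
    return 100.0 * in_source / estimated_total


# Made-up capture histories, for illustration only (not data from the study).
example_counts = {
    (1, 1, 1): 40, (1, 1, 0): 20, (1, 0, 1): 10, (0, 1, 1): 10,
    (1, 0, 0): 50, (0, 1, 0): 20, (0, 0, 1): 10,
}
unobserved = estimate_unobserved_saturated(example_counts)
notification_coverage = estimated_coverage(example_counts, source_index=0)
print(unobserved, round(notification_coverage, 1))
```

More parsimonious models that drop some interaction terms generally have no closed form and are fitted iteratively, for example as a Poisson regression on the seven observed cells.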
