Rob van Hest Capture-recapture Methods in Surveillance - RePub ...

More documents

Recommendations

Info

Estimating infectious disease incidence for probable imperfect record-linkage and possible remaining false-positive records in the hospital register. The parsimonious log-linear model of dataset 13b fitted the adjusted data well and gave an estimate of 1547 (95%CI 1513-1600) tuberculosis patients and corresponding truncated model estimates. The initial truncated model estimates came relatively close to the final log-linear model estimate. The equiprobability and number of data sources assumptions The truncated binomial model assumes that all sources have the same probability of capturing a case. In addition the truncated Poisson model assumes an infinite number of sources, although in our data the number of sources was limited to three. On this argument the truncated binomial model for three data sources is a more realistic alternative estimator. However, any departure from equiprobability results in an estimation error, which analytically is overestimation (see Appendix). Realistic estimates of this error can be obtained from the data. In Table 3 the last column shows the coefficients of variation, a measure of variability in the coverages of the three data sources for each study. This is calculated as the standard deviation divided by the mean from the three quantities N1 (number of cases known on source 1), N2 (number of cases known on source 2) and N3 (number of cases known on source 3). We demonstrate the possible effect of violation of the equiprobability assumption by studies 4 and 11. For study 4, which has a high coefficient of variation (0.86), if the sources were truly independent, the number of unobserved cases would be 702, calculated by fitting the log-linear model with main effects only. Our truncated binomial estimator gives 1325 cases, nearly twice as large. For study 11, with a low coefficient of variation (0.06), independence implies that there are 155 unobserved cases, while the truncated binomial estimate is 212, an overestimation by about 30%. Studies 3 and 4 indicate that the high f1/f2 ratios result from violation of the equiprobability assumption, producing overestimates by the truncated models. Two-source validation Any three-source study can be used to test two-source estimation by treating one source as though it were a complete list of cases and extract a complete 2 x 2 table. We demonstrate this for two studies, numbers 4 and 11, which we chose above for their coefficients of variation and took register 3 as the complete set. Validation was by comparing the Petersen estimator (N10 N01/N11) [1] and the truncated binomial estimator, which for two lists is (f1) 2 /4f2, on the 2 x 2 table with the known “unlisted” number. For study 4 there were 451 “unlisted” cases, i.e. on neither of registers 1 and 2. The Petersen estimator is 37 and the truncated binomial estimator 42. The two estimators are similar because registers 1 and 2 have approximately equal coverage but both are far short of the true figure (Zelterman and Chao models estimates are 79 and 84 respectively). For study 11 there were 161 “unlisted” cases and the two estimators were 57 and 64. Again the estimators agree but are short of the true figure. Now the Zelterman and Chao model estimates are 107 and 130, respectively, and perform slightly better. However, we had some hesitation in extracting 2 x 2 tables from three-source capture-recapture data, more specifically from capture-recapture studies on infectious disease incidence. As explained earlier, (positive) interdependencies between the three conventional registers used for 151
Chapter 10 such studies should be expected. Extracting 2 x 2 tables ignores possible conditional dependence confounding the results thus obtained. For studies 4 and 11 the log-linear models included one respectively two interaction terms for pair-wise dependencies, which may explain the underestimation in the Petersen and truncated estimators. We therefore also validated the two studies with independent log-linear models (studies 1 and 2). We took register 2 as the complete set for study 1 and register 3 as the complete set for study 2. For study 1 there were 73 “unlisted” cases. The Petersen estimator, 43, is a little low, but the truncated binomial estimator, at 201, is too high (Zelterman and Chao models estimates are 397 and 401, respectively). The discrepant (over)estimate by the truncated models can be explained by the different coverages of registers 1 and 3, i.e. violation of the equiprobability assumption. In study 2 the coefficient of variation was low and the coverage of registers 1 and 2 similar. For study 2 there were 22 “unlisted” cases. The Petersen estimator and the truncated binomial estimator are both 25 and similar to the known “unlisted” number, explained by almost absent violation of both the independent sources and equiprobability assumptions. The Zelterman and Chao models estimates are 43 and 51 respectively and the discrepancy with the truncated binomial model estimate can be explained by violation of the “infinite number of sources” assumption. Alternative models As an alternative to log-linear capture-recapture models a structural source model has been proposed. 36 Whereas log-linear models only partly identify and incorporate dependencies between registers, the structural source model models potential interdependencies of the registers and heterogeneity of the population, partly based on prior knowledge, and estimates the probabilities of conditions that produce these interactions between the registers. However, the published data of the capture-recapture studies were insufficient to re-examine these studies with a structural source model. Conclusion We have indicated conditions where estimates of infectious disease incidence from loglinear models are similar or dissimilar to alternative truncated models for incomplete count data. Our results suggest that for estimating infectious disease incidence and completeness of notification independent and parsimonious three-source log-linear capture–recapture models are preferable. When saturated models are selected as best- fitting model and the estimates are unexpectedly high and seem implausible, first, the data should be re-examined with truncated models as a heuristic tool, in the absence of a gold standard, to identify possible failure in the saturated log-linear model when the truncated models produce a lower estimated number of infectious disease patients. Second, in case of such discrepancy between the log-linear and the truncated model estimates, the data should be re-examined for possible violation of the underlying capture-recapture assumptions, such as imperfect record-linkage, false-positive records or heterogeneity, corrected and the capture-recapture analysis repeated on the corrected data. When after repeated capture-recapture analysis the discrepancy between the log-linear and the truncated model estimates remains or no violation of the underlying assumptions can be identified, the investigator should be cautious about using the associated outcome. 10 Using truncated model estimates as an early alert could prevent flawed capture-recapture 152
Page 1 and 2:
Capture-recapture Methods in Survei
Page 3 and 4:
Colofon Capture-recapture methods i
Page 5 and 6:
Promotiecommissie Promotor: Prof.dr
Page 8:
Contents Page 1 Introduction 9 2 Me
Page 11 and 12:
Chapter 1 1.1 Assessing completenes
Page 13 and 14:
Chapter 1 94% among AIDS patients w
Page 15 and 16:
Table 1.1 Objectives, methods, data
Page 17 and 18:
Researchers Objective Method Data-s
Page 19 and 20:
Chapter 1 source capture-recapture
Page 21 and 22:
Chapter 1 32. Bradley BL, Kerr KM,
Page 24 and 25:
2 Methodology of capture-recapture
Page 26 and 27:
Methodology of capture-recapture an
Page 28 and 29:
Page 30 and 31:
Page 32 and 33:
Page 34 and 35:
Page 36:
Page 39 and 40:
Chapter 3 3.1 Application of captur
Page 41 and 42:
Table 3.1 Published capture-recaptu
Page 43 and 44:
Disease Authors Objective Method Da
Page 45 and 46:
Page 47 and 48:
Page 49 and 50:
Chapter 3 3.3 References 1. Hook EB
Page 51 and 52:
Chapter 3 52. Reintjes R, Termorshu
Page 53 and 54:
Chapter 4 Abstract The aim of this
Page 55 and 56:
Chapter 4 Methods Nearly all Dutch
Page 57 and 58:
Chapter 4 Table 4.2 The number of m
Page 59 and 60:
Chapter 4 diagnosis. Other patients
Page 61 and 62:
Chapter 4 stratify the population i
Page 63 and 64:
Chapter 4 25. Lambeth, Southwark an
Page 65 and 66:
Chapter 5 Abstract To estimate inci
Page 67 and 68:
Chapter 5 patients. Capture-recaptu
Page 69 and 70:
Chapter 5 Figure 5.1 Four regions o
Page 71 and 72:
Chapter 5 Hospital records From 385
Page 73 and 74:
Chapter 5 Table 5.1 Epidemiological
Page 75 and 76:
Table 5.2 Number and proportion of
Page 77 and 78:
Chapter 5 cautious about the associ
Page 79 and 80:
Chapter 5 Incidence of community-ac
Page 81 and 82:
Chapter 6 Abstract The aim of this
Page 83 and 84:
Chapter 6 3. Hospitalised patients
Page 85 and 86:
Chapter 6 Record-linkage of all 537
Page 87 and 88:
Chapter 6 Figure 6.1 Schematic view
Page 89 and 90:
Chapter 6 Table 6.2 Total and strat
Page 91 and 92:
Chapter 6 administrative discrepanc
Page 93 and 94:
Chapter 6 the patients, and complet
Page 96 and 97:
7 Undetected burden of tuberculosis
Page 98 and 99:
Introduction Underreporting of tube
Page 100 and 101:
Underreporting of tuberculosis in t
Page 102 and 103: Table 7.1 Total and subset numbers
Page 104 and 105: Table 7.2 Capture-recapture estimat
Page 106 and 107: Underreporting of tuberculosis in t
Page 108: Underreporting of tuberculosis in t
Page 111 and 112: Chapter 8 Abstract In 1999 the Enha
Page 113 and 114: Chapter 8 bacteriological confirmat
Page 115 and 116: Chapter 8 Results Table 8.1 shows t
Page 117 and 118: Table 8.3 Annual and overall observ
Page 119 and 120: Table 8.4 Annual and overall estima
Page 121 and 122: Chapter 8 interaction however, i.e.
Page 123 and 124: Chapter 8 shows that observed under
Page 126 and 127: 9 Estimating the coverage of tuberc
Page 128 and 129: Introduction Estimating targeted mo
Page 130 and 131: Estimating targeted mobile tubercul
Page 136 and 137: References Estimating targeted mobi
Page 138 and 139: 10 Estimating infectious diseases i
Page 140 and 141: Introduction Estimating infectious
Page 142 and 143: Estimating infectious disease incid
Page 144 and 145: Study Disease and number of patient
Page 146 and 147: 4. Gallay A, Vaillant V, Bouvet P,
Page 148 and 149: Table 10.3 Comparison of the variou
Page 150 and 151: Discussion Estimating infectious di
Page 154 and 155: Estimating infectious disease incid
Page 156: Estimating infectious disease incid
Page 159 and 160: Chapter 11 The aim of this thesis i
Page 161 and 162: Chapter 11 overestimation of the nu
Page 163 and 164: Chapter 11 clinicians to the Public
Page 165 and 166: Chapter 11 operation with the chest
Page 167 and 168: Chapter 11 of the proportion of fal
Page 169 and 170: Chapter 11 Question 3: What is the
Page 171 and 172: Chapter 11 11.2 Some findings of th
Page 173 and 174: Chapter 11 • To improve timelines
Page 176 and 177: Summary Summary Surveillance is an
Page 178 and 179: Summary linkage of notification and
Page 180 and 181: Samenvatting Samenvatting Surveilla
Page 182 and 183: Samenvatting subgroepen binnen de b
Page 184 and 185: Acknowledgements Acknowledgements W
Page 186 and 187: Curriculum vitae Rob van Hest was b
Page 188 and 189: Publications This thesis Van Hest N
show all

Rob van Hest Capture-recapture Methods in Surveillance - RePub ...

Create successful ePaper yourself

Delete template?

Save as template?