Empirical evaluation of the ability of case-mix adjustment methodologies to control for selection bias cannot reveal directly the likely degree of residual confounding that may be present, and therefore we cannot gauge how biased adjusted results may still be. By comparing with results based on randomisation, our investigations suggest that the degree of underadjustment may be large. Indeed, our results may in fact be overoptimistic, as the covariate data used were recorded in a standard way according to trial protocols, and were complete for all participants. In many non-randomised studies measurement methods are not standardised. Also, covariate data are incomplete (especially in retrospective studies), leading to bias if the observations are not missing at random.

Our two greatest concerns are the potential increase in bias that could occur as a result of correlated misclassification of covariates, and the differences between conditional and unconditional estimates. Correlated misclassification is a problem inherent to the data, and cannot be adjusted for. It is very difficult to know the degree of misclassification and error in a variable, and impossible to know whether the variable being used is the ‘true’ confounder or just a proxy.
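The proxy-confounder concern can be illustrated with a toy simulation (not taken from the report; all numbers are hypothetical). The true confounder drives both treatment and outcome, the true treatment effect is zero, and the analyst only observes a noisy proxy of the confounder. Adjusting for the proxy removes only part of the confounding:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

u = rng.normal(size=n)                                    # true confounder
t = (rng.random(n) < 1 / (1 + np.exp(-u))).astype(float)  # treatment depends on u
y = u + rng.normal(size=n)                                # outcome depends on u; true treatment effect is 0
x = u + rng.normal(size=n)                                # noisy proxy of u (what is actually recorded)

def adjusted_effect(covariates):
    """OLS coefficient on t when regressing y on t plus the given covariates."""
    design = np.column_stack([np.ones(n), t] + covariates)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]

crude = adjusted_effect([])    # no adjustment: fully confounded
proxy = adjusted_effect([x])   # adjusting for the mismeasured proxy: partially confounded
full = adjusted_effect([u])    # adjusting for the true confounder: approximately unbiased

print(f"crude={crude:.3f}  proxy-adjusted={proxy:.3f}  fully adjusted={full:.3f}")
```

The proxy-adjusted estimate falls between the crude and fully adjusted estimates, but remains well away from the true null effect; no amount of modelling of the recorded variable can recover the adjustment the true confounder would have provided.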
These findings question the appropriateness of the strategy of including data on all available potential confounders when adjusting for case-mix, which has been the starting point of many risk adjustment methods used throughout healthcare. However, the same findings could be explained by the peculiar differences between unconditional and conditional estimates of treatment effects observed when results are expressed as ORs, although this mechanism only applied to estimates obtained from logistic regression and stratification methods.

The finding of high levels of residual confounding and the detrimental effect of adjustment were seen both in historically controlled studies, known to be prone to systematic bias, and in concurrently controlled studies, more prone to unpredictability in bias. The relationships were also noted in studies mimicking allocation by indication.

It is important to find out whether such destructive relationships between covariates are common. We have examined data from only two clinical situations, but in both we observed results that undermine the use of case-mix adjustment. Also, in the IST, case-mix adjustment was found to be detrimental in eight of the 14 regions.

There appears to be a small potential benefit of using propensity score methods over logistic regression for case-mix adjustment in terms of the consistency of estimates of treatment effects. While logistic regression always increased the range of observed treatment effects, propensity score methods did not.
This finding may indicate a greater role for propensity score methods in healthcare research, although in the particular applications investigated neither approach performed adequately.

For those critically appraising non-randomised studies, the recommendation to assess whether “investigators demonstrate similarity in all known determinants of outcome”138,139 has not been universally supported by our empirical investigations. The second recommendation, to assess whether they “adjust for these differences in analysis”, is also not supported empirically. Our analyses suggest that there are considerable complexities in assessing whether a case-mix adjustment analysis will increase or decrease bias.

These findings may have a major impact on the certainty which we assign to many effects in healthcare which have been made on the basis of using risk adjustment methods.
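The divergence between conditional and unconditional estimates noted in this chapter is partly an arithmetic property of the odds ratio itself (often called non-collapsibility), and arises even without confounding. A minimal worked example with illustrative numbers, not drawn from the report: two equal-sized strata, 1:1 allocation within each, identical stratum-specific ORs, yet a smaller marginal OR.

```python
# Risks by stratum: equal-sized strata, 1:1 allocation within each stratum,
# so the stratum variable is balanced between arms and is NOT a confounder.
risk = {  # stratum -> (risk if treated, risk if control); illustrative numbers
    "low":  (0.5, 0.2),
    "high": (0.8, 0.5),
}

def odds_ratio(p1, p0):
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

# Conditional (stratum-specific) ORs: identical in both strata.
conditional = {s: odds_ratio(p1, p0) for s, (p1, p0) in risk.items()}

# Marginal risks: average over the two equal-sized strata.
p1 = sum(p for p, _ in risk.values()) / 2
p0 = sum(p for _, p in risk.values()) / 2
marginal = odds_ratio(p1, p0)

print(conditional)         # both strata: OR = 4.0
print(round(marginal, 2))  # marginal OR = 3.45 -- smaller, despite no confounding
```

An adjusted (conditional) OR of 4.0 and an unadjusted (marginal) OR of 3.45 therefore need not signal removal of bias; for ORs the two quantities answer subtly different questions, which complicates comparisons between adjusted and unadjusted results.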
Health Technology Assessment 2003; Vol. 7: No. 27

Chapter 8

Discussion and conclusions

Chapters 3–7 have reported results from five separate evaluations concerning non-randomised studies. The results have been discussed in detail in each chapter. We summarise their main findings below.

Summary of key findings

Our review of previous empirical investigations of the importance of randomisation (Chapter 3) identified eight studies that fulfilled our inclusion criteria. Each investigation reported multiple comparisons of results of randomised and non-randomised studies. Although there was overlap in the comparisons included in these reviews, they reached different conclusions concerning the likely validity of non-randomised data, mainly reflecting weaknesses in the meta-epidemiological methodology that they all used, most notably that it was not able to account for confounding factors in the comparisons between randomised and non-randomised studies, nor to detect anything other than systematic bias.

We identified 194 tools that could be used to assess the quality of non-randomised studies (Chapter 4). Overall the tools were poorly developed: the majority did not provide a means of assessing the internal validity of non-randomised studies, and almost no attention was paid to the principles of scale development and evaluation. However, 14 tools were identified that included items related to each of our pre-specified core internal validity criteria, which related to assessment of allocation method, attempts to achieve comparability by design, identification of important prognostic factors and adjustment for differences in case-mix.
Six of the 14 tools were considered potentially suitable for use as quality assessment tools in systematic reviews, but all require some modification to meet all of our pre-specified criteria.

Of the 511 systematic reviews we identified that included non-randomised studies, only 169 (33%) assessed study quality, and only 46% of these reported the results of the quality assessment for each study (Chapter 5). This is lower than the rate of quality assessment in systematic reviews of randomised controlled trials.131 Among those that did assess study quality, a wide variety of quality assessment tools were used, some of which were designed only for use in evaluating RCTs, and many were designed by the review authors themselves. Most reviews (88%) did not assess key quality criteria of particular importance for the assessment of non-randomised studies. Sixty-nine reviews (41%) investigated the impact of quality on study results in a quantitative manner. The results of these analyses showed no consistent pattern in the way that study quality relates to treatment effects, and were confounded by the inclusion of a variety of study designs and studies of variable quality.

A unique ‘resampling’ method was used to generate multiple unconfounded comparisons between RCTs and historically controlled and concurrently controlled studies (Chapter 6). These empirical investigations identified two characteristics of the bias introduced by using non-random allocation. First, the use of historical controls can lead to systematic over- or underestimation of treatment effects, the direction of the bias depending on time trends in the case-mix of participants recruited to the study. In the studies used for the analyses, these time trends varied between study regions, and were therefore difficult to predict.
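The first mechanism can be sketched schematically (a toy illustration of the idea, not the report's actual resampling algorithm, and with invented numbers): within a simulated trial whose underlying event risk drifts over time, a randomised comparison recovers the true effect, while a ‘historically controlled’ comparison assembled from the same data, taking treated patients from later periods and controls from earlier ones, is systematically biased by the time trend.

```python
import numpy as np

rng = np.random.default_rng(2)
n_per_period, periods = 5_000, 4

# Simulate one large trial: within each period allocation is randomised,
# but the underlying event risk drifts over time (a case-mix time trend).
rows = []
for p in range(periods):
    base = 0.30 - 0.04 * p  # control-group event risk falls over time
    t = rng.integers(0, 2, n_per_period)
    y = (rng.random(n_per_period) < np.where(t == 1, base - 0.05, base)).astype(int)
    rows.append((np.full(n_per_period, p), t, y))
period, treat, outcome = (np.concatenate(c) for c in zip(*rows))

def risk_diff(treated_mask, control_mask):
    return outcome[treated_mask].mean() - outcome[control_mask].mean()

# Randomised comparison: treated vs control drawn from the same periods.
rct = risk_diff(treat == 1, treat == 0)

# 'Historically controlled' comparison assembled from the same trial:
# treated patients from the later periods, controls from the earlier ones.
hcs = risk_diff((treat == 1) & (period >= 2), (treat == 0) & (period < 2))

print(f"true effect = -0.050, RCT estimate = {rct:.3f}, historical-control estimate = {hcs:.3f}")
```

With a falling background risk the historical comparison exaggerates the benefit; had the trend run the other way it would have understated or reversed it, which is why the direction of this bias is hard to predict without knowing the local time trend.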
Second, the results of both study designs varied beyond what was expected from chance. In a very large sample of studies the biases causing the increased unpredictability on average cancelled each other out, but in individual studies the bias could be fairly large, and could act in either direction. These biases again relate to differences in case-mix, but the differences are neither systematic nor predictable.

Four commonly used methods of dealing with variations in case-mix were identified: (i) discarding comparisons between groups which differ in their baseline characteristics, (ii) regression modelling, (iii) propensity score methods and (iv) stratified analyses (Chapter 7). The methods were applied to the historically and concurrently controlled studies generated in Chapter 6, and also to studies designed to mimic ‘allocation by indication’. None of the methods successfully removed bias in

© Queen’s Printer and Controller of HMSO 2003. All rights reserved.