…explanation of responses to aid completion. The concept behind the tool seems a good one, but it is difficult to complete in isolation from the rest of the data extraction form. Despite the instructions for completion, some of the items may be too generic and difficult to interpret. For example, allocation of participants to groups is considered in the 'interpretation of results' section under the item 'considering the study design, were appropriate methods for controlling confounding variables and limiting potential biases used'. This item aimed to cover randomisation, restriction, matching, etc. The complexity of the tool meant that it took up to 30 minutes to complete, although ease of use may well increase with practice. The Zaza tool may be suitable for use in systematic reviews but does require a good understanding of validity issues.

Discussion

A total of 213 potential quality assessment tools were identified for inclusion in this review. Overall, the tools were poorly developed, with almost no attention paid to standard principles of scale development.57 Almost without exception, the items included in the tools were based on so-called 'standard criteria' gleaned from the methodological literature, from clinical trial or epidemiology textbooks, or from a review of other quality assessment tools. Most tools did not provide a means of assessing the internal validity of non-randomised studies, and several were aimed specifically at randomised trials only. Only 60 (30%) included items relating to five out of six internal validity domains. Of these, 14 were sufficiently comprehensive in their content coverage to be considered in detail. To be selected, these tools had to include at least three of the following four pre-specified core items:

● how allocation occurred
● any attempt to balance groups by design
● identification of prognostic factors
● case-mix adjustment.

These were selected as core items because, for non-randomised studies, it is important to know, first, how study participants got into the intervention groups (that is, whether by clinician or patient preference, or according to spatial or temporal factors) and, second, what specific factors influenced the selection of patients into each group. Given that study investigators rarely report the latter information, the identification of prognostic factors that influence outcome (as opposed to allocation) was included as a core item.

However, covering these items does not necessarily make a tool useful. For example, asking for a description of the method of allocation109,111 does not force a judgement about whether that method was appropriate or unlikely to introduce bias. Beyond their content coverage, the top 14 tools did not stand out from the remaining tools in terms of their development, which was often vaguely reported, or the investigation of their validity or reliability.

A relatively informal assessment of the usefulness of the top 14 tools identified six that were potentially useful for systematic reviews.65,66,85,86,109,111 The main advantage of these tools over the remaining eight was the phrasing of their items: on the whole, they force the reviewer to be systematic in their study assessments and attempt to ensure that quality judgements are made in the most objective manner possible.
Four of the tools attempted to cover most study designs using the same questions, which could be the most useful approach for a systematic review that incorporates several different study designs. This approach did appear reasonably successful, although it may often be more appropriate to think of quality issues on top of a study design 'hierarchy'. For example, there seems to be little point in being overly concerned with levels of blinding when comparing an RCT with a before-and-after study. Furthermore, in many cases a full assessment of study quality may require additional context-specific questions to cover aspects of external validity that would not be included in a generic quality assessment tool, for example, items relating to the quality of delivery of an intervention or the quality of outcome measurements.

Many of the tools, including some in the top 14,85,111 contained several items unrelated to methodological quality. Although it is important to distinguish the quality of a study from the quality of its report, one of the identified problems with scales for assessing RCTs is the inclusion of items relating to issues such as reporting quality.59

Those tools that followed the lines of Cooper's 'mixed-criteria' approach,45 requiring objective facts regarding a study's design followed by a quality judgement (e.g. Thomas65), may prove to be the most useful for systematic reviews. Such tools make as explicit as possible the facts regarding a study's methods that should underlie a reviewer's judgement. Some tools were found that ignore judgements entirely, and others …
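As a purely illustrative aside, the mixed-criteria structure can be made concrete as a small data structure: an objective fact about the study's design is recorded first, and the reviewer's judgement is recorded separately, so every judgement is anchored to an explicit fact. The Python sketch below is a hypothetical encoding under that assumption; all names, fields and the example item are invented for illustration and are not drawn from the report or from any of the tools reviewed.

# Hypothetical sketch of a 'mixed-criteria' quality item: record an
# objective design fact first, then a separate reviewer judgement.
# All names and fields are illustrative assumptions.

from dataclasses import dataclass
from enum import Enum

class Judgement(Enum):
    LOW_RISK = "unlikely to introduce bias"
    UNCLEAR = "insufficient information to judge"
    HIGH_RISK = "likely to introduce bias"

@dataclass
class MixedCriteriaItem:
    question: str         # what the reviewer looks for in the study report
    design_fact: str      # objective description extracted from the study
    judgement: Judgement  # explicit judgement made after recording the fact

# Example using the allocation core item discussed above.
allocation = MixedCriteriaItem(
    question="How were participants allocated to intervention groups?",
    design_fact="Allocation followed the treating clinician's preference",
    judgement=Judgement.HIGH_RISK,
)

print(allocation.question)
print("Fact:", allocation.design_fact)
print("Judgement:", allocation.judgement.value)

Separating the fact from the judgement in this way mirrors the advantage described above: the reviewer cannot record a verdict without first stating the design feature on which it rests.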
