Evaluating non-randomised intervention studies - NIHR Health ...


Evaluation of checklists and scales for assessing quality of non-randomised studies

In terms of the four pre-specified core items, 15 of the 60 tools included none of the core items despite covering at least five domains, 16 covered one core item and 15 covered two items. The remaining 14 tools covered at least three core items and were considered to be the 'best' tools in our sample, two of which covered all four items. 104,105 It is interesting that of the six tools that included items in all six internal validity domains, 72,77,83,85,97,106 only one 85 included three of our four core items. None of the tools designed only for RCTs included three of the four core items.

'Best' tools

Fourteen tools were identified which covered at least five of the six internal validity domains and three of the four core items. Tables 9 and 10 itemise the pre-specified items covered by each tool.

Amongst the top 14 tools, the internal validity domain with the poorest coverage was analysis (four tools with zero items – CASP, 64 Fowkes, 107 Newcastle–Ottawa 66 and Weintraub 108), followed by blinding (missed by Bracken 104 and Zaza 86) and ascertainment (missed by Cowley 109 and Hadorn 102). The item most commonly missed was equal follow-up between groups (included by only two tools – Bracken 104 and Downs 85). Only three tools asked about use of intention-to-treat analysis (Cowley, 109 Thomas 65 and Vickers 110).

The two core domains were reasonably well covered. For the creation of groups domain, all of the tools except those specifically designed only for observational studies (Bracken, 104 CASP 64 and Newcastle–Ottawa 66) included an item on randomisation, but only two tools specifically considered the use of allocation concealment (Downs 85 and DuRant 99). Of the four core items, the most commonly missed was that relating to how allocation occurred; only eight tools included it. Ideally, we were looking for an item that asked how participants got into their respective groups, for example whether by clinician or patient preference, or by spatial or temporal assignment. All of the tools except Downs, 85 DuRant 99 and Hadorn 102 included the second pre-specified item – balancing of groups by design.

For the comparability of groups domain, the two pre-specified items – identification of prognostic factors and use of case-mix adjustment – were missed by only two tools 109,111 and one tool, 108 respectively. All of the tools except the CASP tool 64 and the Newcastle–Ottawa tool 66 asked whether baseline comparability had been assessed.

Our pre-specified items in the remaining six domains (Table 10) were, on the whole, not well covered, except perhaps for that relating to the selection of the study sample. Every tool included an item about the representativeness of the sample, and only four did not ask about the study inclusion/exclusion criteria (CASP, 64 Cowley, 109 Newcastle–Ottawa 66 and Thomas 65). One item in this domain that relates to internal validity – retrospective or prospective selection of the sample – was included by only two tools, Cowley 109 and Reisch. 111

The remaining five domains, concerning the quality of study reporting, were not well covered by the tools. The most commonly included item was one that considered clear specification of the interventions (nine tools). On the other hand, clear specification of the outcomes was included in only five tools.

Qualitative assessment of the 'best' tools

Of the best 14 tools, eight were judged to be unsuitable for use in a systematic review. 64,99,102,104,105,107,108,110 A description of which of the core criteria they covered, and our assessment of them, is provided in Appendix 4.

In summary, their unsuitability was largely related to the fact that they were not designed for use in a systematic review of effectiveness: one was published to guide the reporting of observational studies; 104 five were intended to help in the critical appraisal of research articles; 64,99,107,108,110 and one was developed for an epidemiological review. 105 Overall, these tools generally prompted some thinking about quality issues, but were not formatted in such a way as to allow an overall assessment of study quality or the comparison of quality across studies. Some 64,108 did conclude with a more general item requiring a judgement on the overall quality of the study, but little guidance was provided on how this judgement should be made. The Hadorn tool 102 was intended for use in systematic reviews, but the assessors queried the inclusion or phrasing of several of the items; for example, the emphasis on drug trials and the use of placebos was felt to be overly specific.

Six quality assessment tools were judged to be potentially useful for systematic reviews, 65,66,85,86,109,111 although in several cases some modifications would be useful. All but one of
