Evaluating non-randomised intervention studies - NIHR Health ...



Health Technology Assessment 2003; Vol. 7: No. 27

as the exponential of the mean of the log OR] and spread [the standard deviation of the observed log OR]. As in Chapter 6, ratios of the average ORs of the RCTs and the average ORs of the non-randomised studies (adjusted or unadjusted) quantify systematic bias, while ratios of their SDs indicate unpredictability in the bias. The likely conclusions of the analyses were investigated by considering the percentage of studies for each analysis which reported statistically significant (p < 0.05) results, separately in the direction of harm and of benefit.

Results

Exclusion of non-matching groups

Table 22 presents the results for analyses of the IST according to the number of covariates that differed significantly between treatment and control groups. Table 23 presents comparable analyses for the ECST.

For the IST, 23% of the historically controlled studies appeared to have comparable groups in that they had no statistically significant differences in baseline covariates (Table 22). There did appear to be a reduction in both systematic and unpredictable dimensions of bias as the number of differing covariates reduced. However, among studies with no significantly unbalanced covariates the results were still more variable than those from the corresponding RCTs, and there was still an excess of statistically significant results.

Table 23 shows that 18% of the historically controlled non-randomised studies generated from the ECST had no statistically significant differences in baseline covariates.
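The balance screen described above amounts to a per-covariate significance test between treatment and control groups. A minimal sketch for a single binary covariate, using a two-proportion z-test (normal approximation) on hypothetical counts — the report's own tests and data are not reproduced here:

```python
import math

def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value for a difference in proportions
    (normal approximation), as used to flag baseline imbalance."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical counts: 60/200 treated vs 40/200 controls have the covariate.
p = two_proportion_p(60, 200, 40, 200)
print(p < 0.05)  # True: this covariate would count as significantly unbalanced
```

A study with no covariate yielding p < 0.05 would fall into the "comparable groups" category discussed above.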
A reverse trend was noted, with study results becoming more biased with fewer significant differences at baseline, contrary to the hypothesis that comparability at baseline predicts reliable results.

Only 1% of concurrently controlled studies from the IST had no significant differences at baseline (Table 22), and reductions in excess variability were less marked than for historically controlled studies. Further, the absence of statistically significant differences did not guarantee comparability, with spuriously statistically significant treatment effects still being common when treated and control groups appeared to match.

As shown in Table 23, 12% of the concurrently controlled non-randomised studies generated from the ECST had no statistically significant differences in baseline covariates, but again systematic bias was actually higher in this subgroup of apparently comparable studies than in the studies which had significant differences in at least one baseline covariate.

Adjustment for ‘naturally’ occurring biases

Systematic bias originating from historically controlled studies

The clearest example of systematic bias was observed in historically controlled studies in the ECST data, where the average OR estimate was 1.06 compared with 1.23 in the RCTs, grossly underestimating the harmfulness of treatment (see Chapter 6). The seven case-mix adjustment methods were applied to each of the historically controlled studies to investigate the degree to which the case-mix methods could adjust for this bias. The results are presented in Table 24 and Figure 15. The adjusted results from six of the seven methods were on average more biased than the unadjusted results: only the full logistic regression model appeared to reduce bias.
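The bias summaries used in these comparisons — the average OR as the exponential of the mean log OR, and the spread as the SD of the log ORs — can be sketched as follows. The odds ratios below are hypothetical, purely to show the calculation, not the report's data:

```python
import math
import statistics

def summarise(odds_ratios):
    """Geometric-mean OR (exp of the mean log OR) and SD of the log ORs."""
    logs = [math.log(value) for value in odds_ratios]
    return math.exp(statistics.mean(logs)), statistics.stdev(logs)

# Hypothetical study ORs for illustration only.
rct_ors = [0.95, 1.00, 1.05, 0.98, 1.02]
nrs_ors = [1.10, 1.40, 0.80, 1.60, 0.95]

rct_mean, rct_sd = summarise(rct_ors)
nrs_mean, nrs_sd = summarise(nrs_ors)

print(nrs_mean / rct_mean)  # ratio > 1 suggests systematic bias
print(nrs_sd / rct_sd)      # ratio > 1 suggests unpredictability in the bias
```

The two printed ratios correspond to the systematic-bias and unpredictability summaries compared between the non-randomised studies and the RCTs above.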
All methods also increased variability (unpredictability in bias), the greatest increase being with the full logistic regression. This increase in the width of the distributions of results is discernible in Figure 15.

The use of adjustment methods inflates the standard error of the estimate of a treatment effect. The use of propensity score matching led in addition to a reduction in sample size (as 45% of participants were discarded), further reducing power. Thus, although many of the adjusted estimates were on average more biased than the unadjusted estimates, only logistic regression methods had markedly increased spurious significance rates.

Systematic bias was also observed in the historically controlled studies generated from the IST, although it differed between regions. Overall a small systematic bias was noted in the aggregated results (Table 25). Logistic regression modelling showed a similar pattern of behaviour as for the ECST comparison, increasing both average bias and the variability of results. Propensity score methods slightly overadjusted for the bias, but gave results closest to the RCT results. Spurious statistical significance rates decreased slightly with logistic regression and were completely removed by propensity score methods. The case-mix adjusted results for the historically controlled studies from the IST analysis are

© Queen’s Printer and Controller of HMSO 2003. All rights reserved.
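The sample-size loss under propensity score matching noted above arises because treated participants with no acceptable control match are discarded. A minimal sketch of greedy 1:1 nearest-neighbour matching within a caliper, on hypothetical scores (the matching algorithm actually used in the report may differ):

```python
def greedy_match(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on propensity scores.
    Treated units with no control within the caliper are discarded,
    which is how matching shrinks the analysed sample."""
    available = sorted(controls)
    pairs, discarded = [], 0
    for t in sorted(treated):
        best = min(available, key=lambda c: abs(c - t), default=None)
        if best is not None and abs(best - t) <= caliper:
            pairs.append((t, best))
            available.remove(best)
        else:
            discarded += 1
    return pairs, discarded

# Hypothetical propensity scores for illustration only.
pairs, discarded = greedy_match([0.20, 0.50, 0.90], [0.22, 0.48])
print(len(pairs), discarded)  # 2 matched pairs, 1 treated unit discarded
```

Every discarded unit shrinks the matched sample, which is why matching can reduce power even when it improves comparability.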
