Empirical evaluation of the ability of case-mix adjustment methodologies to control for selection bias

TABLE 32 Case-mix adjustment with bias caused by multiple covariates, some measured, some unmeasured, with unknown mechanism (based on observed outcomes in the IST)

                                    Average   Variability of results        Percentage of studies with
                                                                            statistically significant
                                                                            results (p < 0.05)
                                    OR        SD of log OR  Ratio with RCT  Benefit  Harm  Total
RCT                                 0.91      0.33                           7       2      9
Studies where treatment is related to condition
  Unadjusted                        0.51      0.34          1.03            60       0     60
  Stratification                    0.51      0.38          1.15            54       0     54
  Logistic regression
    Full model (a)                  0.45      0.50          1.52            50       0     50
    Stepwise p_r = 0.05 (b)         0.47      0.44          1.33            51       0     51
    Stepwise p_r = 0.15 (c)         0.47      0.46          1.39            51       0     51
  Propensity score
    Matched (d)                     0.58      0.37          1.12            31       0     31
    Stratified                      0.57      0.34          1.03            39       0     39
    Regression                      0.57      0.33          1.00            39       0     39

(a) Full model includes 10 covariates.
(b) Mean number of covariates included: 4.5.
(c) Mean number of covariates included: 5.8.
(d) Mean number of patients matched: 137 out of 200.

All methods adjusted the crude estimate of the treatment effect (OR = 0.64) in the direction of the result of the RCTs (OR = 0.91), thus removing some of the selection bias. However, stratification and logistic regression (LR) removed only a small fraction of the bias (OR for LR full model = 0.71, OR for stratification = 0.70). Propensity score (PS) methods did somewhat better (OR for matched PS = 0.79), but remained substantially biased and hence gave far too many statistically significant results. While the selection mechanism was not designed to introduce an unpredictable bias, the results from the logistic regression model were much more variable than the unadjusted results.

Adjusting for bias due to unknown multiple covariates

Selection according to outcome, as anticipated, introduced strong biases into the data.
For the IST, the non-randomised unadjusted results (OR = 0.51) significantly overestimated treatment efficacy compared with the RCT results (OR = 0.91) (Table 32).

Stratification failed to adjust for the bias at all, with the results being identical to the unadjusted results. Significance rates decreased slightly.

Adjustment using logistic regression increased bias. For the IST the unadjusted average OR of 0.51 decreased to 0.45 (full model). Variability of results also increased, the adjusted results being 1.48 times as variable as the unadjusted results for the IST. Significance rates decreased slightly.

PS methods slightly reduced bias in the IST. The variability of results increased, but not as much as for logistic regression results, whilst significance rates decreased.

Discussion

“The first experience with multivariate analysis is apt to leave the impression that a miracle in the technology of data analysis has been revealed; the method permits control for confounding and evaluation of interactions for a host of variables with great statistical efficiency. Even better, a computer does all the arithmetic and neatly prints out the results. The heady experience of commanding a computer to accomplish all these analytic goals and then simply gathering and publishing the sophisticated ‘output’ with barely a pause for retyping is undeniably alluring. However useful it may be, multivariate analysis is not a panacea. The extent to which this process represents improved efficiency rather than just bias depends on the adequacy of the assumptions built into the mathematical model.”

From Rothman 149

Health Technology Assessment 2003; Vol. 7: No. 27

The results of our investigations can be summarised by the following four key results, all of which raise concerns about the performance of case-mix adjustment methods:

1. Comparisons between non-randomised groups that appear comparable in terms of case-mix are often biased, sometimes more than for non-randomised groups that do not appear comparable.
2. Case-mix adjustment methods rarely adequately adjust for differences in case-mix.
3. Logistic regression always increases variability in study results.
4. All adjustment methods can on occasion increase systematic bias.

The first and second observations are not surprising, and have been discussed before. 150 The third observation has been demonstrated theoretically always to be the case, 151 although we suspect that this result is not well known. The fourth observation is contrary to most beliefs about case-mix adjustment methods, and has only become detectable through the unique resampling design of our investigation.

As Rothman described above, statistical risk adjustment methods are alluring, appearing to provide a simple solution to many of the inadequacies of the design and execution of non-randomised studies. This message has been widely disseminated throughout the medical research community, leading to their routine use in epidemiology and health services research. 136 However, the validity of a risk-adjustment model depends on fulfilling a demanding set of assumptions. Below we consider the assumptions that may be most critical.

Why adjustment methods might not work?

Omitted covariates

Risk adjustment models can only adjust for differences in variables that have been observed and measured. As seen with the ‘allocation by indication’ mechanisms based on neurological deficits (IST) (Table 30), if all variables linked to the allocation mechanism have been measured and observed, then adequate adjustment may be made.
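The effect of adjusting for a fully measured allocation variable can be illustrated with a small stratified analysis. The sketch below is not drawn from the IST data: the counts, the stratum labels and the function names are all hypothetical, chosen so that treatment has no effect within either stratum (OR = 1) while patients with a severe deficit are mostly allocated to control. It compares the crude odds ratio, which ignores the covariate, with a Mantel-Haenszel pooled odds ratio, which is one common way of stratifying and not necessarily the method used in this report.

```python
# Hypothetical counts, chosen only for illustration (not taken from the IST).
# Each stratum: (treated events, treated non-events,
#                control events, control non-events).
STRATA = {
    "mild deficit":   (30, 270, 10, 90),    # 10% event risk in both arms
    "severe deficit": (40, 60, 120, 180),   # 40% event risk in both arms
}


def odds_ratio(a, b, c, d):
    """Odds ratio for one 2x2 table: (a/b) / (c/d)."""
    return (a * d) / (b * c)


def crude_odds_ratio(strata):
    """Odds ratio after collapsing over the covariate, i.e. ignoring it."""
    a, b, c, d = (sum(cell) for cell in zip(*strata.values()))
    return odds_ratio(a, b, c, d)


def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel pooled odds ratio across the covariate strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
    return numerator / denominator


crude = crude_odds_ratio(STRATA)
adjusted = mantel_haenszel_odds_ratio(STRATA)
print(f"crude OR = {crude:.2f}, stratified OR = {adjusted:.2f}")
# prints "crude OR = 0.44, stratified OR = 1.00"
```

With these made-up counts the crude comparison suggests a large benefit, and stratifying on the one covariate that drives allocation recovers the true null effect. Had that covariate not been recorded, only the crude figure would be available.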
However, in most situations we do not know the variables upon which allocation is based. There may be important prognostic factors that the investigators do not know about or have not measured which are unbalanced between groups and responsible for differences in outcome. For example, when allocation may be influenced by level of consciousness (IST) (Table 31) but the consciousness variable was not included in the model, inadequate adjustment for bias was made. If the missing covariates affecting allocation are correlated with the observed covariates, some degree of adjustment is likely to be observed (as was seen for degree of consciousness), but it is unlikely to be adequate unless the correlations are very strong. Many texts refer to the bias that remains after adjustment as ‘residual confounding’ or ‘hidden bias’. Rosenbaum proposed a strategy for investigating the robustness of an observational finding to hidden bias, based on sensitivity analyses which determine the size of covariate association required to nullify an observed treatment effect. 152 In some situations this approach could help to decide whether hidden bias could fully explain an observed effect, but such an assessment retains a degree of subjective judgement.

Our first and second results could be explained by the common situation of treatment assignment depending on unmeasured covariates. However, missing covariates cannot explain the third and fourth results.

Misspecified continuous covariates and omitted interactions

Two forms of misspecification can occur in case-mix adjustment analyses. The first is when a continuous variable is categorised, or a categorical variable is regrouped into a smaller number of categories.
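The information lost by categorising a continuous covariate can be sketched numerically. The example below rests on assumptions made here for illustration only: the covariate is standard normal, its relationship with outcome is linear, and the categories are formed at equal-probability cut points. The function name `explained_fraction` is hypothetical. It computes the share of the covariate's variance that is retained by the category means, using only the Python standard library.

```python
from statistics import NormalDist


def explained_fraction(k: int) -> float:
    """Share of Var(X) captured by k equal-probability categories, X ~ N(0, 1)."""
    nd = NormalDist()
    # Interior cut points at the 1/k, 2/k, ..., (k-1)/k quantiles.
    cuts = [nd.inv_cdf(i / k) for i in range(1, k)]
    # Density at each category boundary; it is zero at -inf and +inf.
    dens = [0.0] + [nd.pdf(c) for c in cuts] + [0.0]
    p = 1.0 / k  # probability mass of each category
    # Mean of X within category i is (pdf(lower) - pdf(upper)) / p.
    means = [(dens[i] - dens[i + 1]) / p for i in range(k)]
    # Between-category variance; Var(X) = 1, so this is already a fraction.
    return sum(p * m * m for m in means)


for k in (2, 3, 4, 5):
    print(k, round(1 - explained_fraction(k), 3))  # unexplained share
```

Under these assumptions a split at the median leaves roughly 36% of the variance unexplained, and the unexplained share falls as more categories are used.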
Cochran 153 presented analytical investigations that showed that for a continuous covariate related to outcome with a monotonic trend, dichotomisation would leave 36% of the variability caused by the relationship unexplained (subject to some distributional assumptions). His results suggest that five categories are needed to explain successfully at least 90% of the variability. Brenner and Blettner 154 extended this work to consider the efficiencies of different approaches to modelling various monotonic trends using multiple categorisations and linear terms.

© Queen’s Printer and Controller of HMSO 2003. All rights reserved.

However, Brenner 155 also showed that if the covariate does not have a monotonic effect, correct categorisation of the data can be crucial to obtaining a sensible result. Table 33 presents hypothetical data taken from Brenner’s paper where there is a U-shaped relationship between the covariate and the outcome. Adjusting for the covariate classified in three categories (Table 33(a)) makes an appropriate adjustment, the observed treatment effect being OR = 1 in all categories. However, if the risk factor is dichotomised, different results are obtained depending on where the dichotomisation is made: in Table 33(b) the