Empirical evaluation of the ability of case-mix adjustment methodologies to control for selection bias

regression models described in (3) model how covariates relate to outcome, propensity score methods model how the same covariates relate to treatment allocation [145–147]. The principle is that the propensity score summarises the manner in which baseline characteristics are associated with treatment allocation, so that selection bias is removed when comparisons are made between groups with similar propensity scores [146]. The method involves calculating, for each individual in the data set, a propensity score: the estimated probability of receiving the experimental intervention given their baseline characteristics. In an RCT, where there should be no relationship between baseline characteristics and treatment assignment, this probability will be the same for each participant (e.g. 0.5 if the groups are of equal size). In non-randomised studies, it is likely that treatment assignment does depend on baseline covariates, and that propensity scores will vary between individuals. For example, patients with more severe disease may be more likely to receive one treatment than the other. Propensity scores in such a situation would relate to disease severity, and the average propensity score in the experimental group will differ from the average in the control group. Estimation of propensity scores is typically undertaken using multiple logistic regression models, the outcome variable being treatment allocation.

What evidence is there that these methods actually adjust for selection bias in non-randomised studies? Although there is plenty of literature demonstrating that case-mix adjustment can change estimates of treatment effects, none of the texts that we have consulted cites any empirical evidence demonstrating that case-mix adjustment, on average, reduces bias.
Broader searches of the methodological literature related to case-mix likewise did not identify supporting empirical evidence.

To the contrary, there are hints in the literature that case-mix adjustment methods may not adequately perform the task to which they are applied. Of the eight reviews in Chapter 3, two compared adjusted and unadjusted estimates of treatment effects from non-randomised studies with the results of similar RCTs, and noted that there was little evidence that adjustment consistently moved the estimates of treatment effects from non-randomised studies towards those of RCTs [25,27].

In an attempt to provide empirical evidence of the value of case-mix adjustment, we have used the same non-randomised studies generated by resampling participants from the IST [132] and ECST [133] data sets (described in Chapter 6) to evaluate the performance of eight different case-mix adjustment methods in controlling for selection bias. As explained previously, resampling of data from the trials was used to generate randomised and non-randomised studies of different designs in such a way that the differences between their results with respect to location and spread could only be attributed to selection bias. Case-mix adjustment methods were then applied to each resampled non-randomised study making use of the available baseline data (different in the two trials – see Table 20).
The distribution of results of the adjusted non-randomised studies was then compared with both the distribution of unadjusted results and the distribution of the results of the RCTs, to see whether there was any evidence that adjustment had reduced or removed the selection bias.

In our investigations we evaluate the ability of case-mix adjustment to control for three different types of bias:

1. Naturally occurring systematic bias, where results are consistently either all overestimates or all underestimates of the treatment effect owing to some naturally occurring unknown allocation mechanism (as was observed in the ECST historical comparisons in Chapter 6).
2. Naturally occurring unpredictable bias, where results are biased variably with different magnitudes in different directions, leading to both underestimates and overestimates of treatment effects (as was observed in the IST concurrent comparisons in Chapter 6).
3. Bias arising where allocation to treatment is probabilistically linked to a prognostic variable, such that participants are more likely or less likely to receive treatment according to their observed characteristics. This in fact is a model of practice in much of medicine, where treatment decisions are made according to the conditions and characteristics that each patient displays (allocation by indication). Although we have no realistic way of mimicking such a mechanism for the two trial data sets, we generate two artificial scenarios in which allocation is related (a) to the value of a single prognostic covariate, and, more realistically, (b) to an unknown function of a set of prognostic covariates.
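The third type of bias, and the way propensity scores differ between groups under it, can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the resampling procedure used in this report: the covariate `severity`, the logistic allocation rule, the outcome risks, the sample size and all function names are hypothetical choices made only for illustration. Treatment has no true effect, allocation depends on a single prognostic covariate, and the propensity score is known by construction rather than estimated by logistic regression.

```python
import math
import random


def simulate_allocation_by_indication(n=20000, seed=1):
    """Simulate a non-randomised study with no true treatment effect.

    All numerical settings are illustrative assumptions, not values
    taken from the IST or ECST analyses.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severity = rng.random()  # hypothetical prognostic covariate in [0, 1)
        # Known allocation mechanism: sicker patients are more likely treated.
        propensity = 1.0 / (1.0 + math.exp(-(4.0 * severity - 2.0)))
        treated = rng.random() < propensity
        # Outcome risk depends on severity only, so the true effect is zero.
        bad_outcome = rng.random() < 0.1 + 0.6 * severity
        rows.append((severity, propensity, treated, bad_outcome))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def crude_risk_difference(rows):
    """Risk of a bad outcome in treated minus untreated, ignoring case-mix."""
    treated = [r[3] for r in rows if r[2]]
    control = [r[3] for r in rows if not r[2]]
    return mean(treated) - mean(control)


def stratified_risk_difference(rows, n_strata=5):
    """Average the within-stratum risk differences, weighted by stratum size."""
    total, weighted = 0, 0.0
    for s in range(n_strata):
        lo, hi = s / n_strata, (s + 1) / n_strata
        stratum = [r for r in rows if lo <= r[0] < hi]
        treated = [r[3] for r in stratum if r[2]]
        control = [r[3] for r in stratum if not r[2]]
        if treated and control:  # skip strata lacking one of the groups
            weighted += len(stratum) * (mean(treated) - mean(control))
            total += len(stratum)
    return weighted / total


rows = simulate_allocation_by_indication()
mean_ps_treated = mean(r[1] for r in rows if r[2])
mean_ps_control = mean(r[1] for r in rows if not r[2])
crude = crude_risk_difference(rows)
adjusted = stratified_risk_difference(rows)
print(f"mean propensity: treated {mean_ps_treated:.2f}, control {mean_ps_control:.2f}")
print(f"crude risk difference:      {crude:+.3f}")
print(f"stratified risk difference: {adjusted:+.3f}")
```

Under these assumptions the mean propensity score is higher in the treated group than in the control group, and the crude comparison suggests a harmful treatment effect that stratifying on the known covariate largely removes. Some imbalance can remain within strata, and the sketch says nothing about how adjustment behaves when the allocation mechanism is unknown, which is the question the evaluations in this chapter address.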
Health Technology Assessment 2003; Vol. 7: No. 27

TABLE 20 Baseline covariates used in case-mix adjustment models for the IST and ECST

IST
  Binary covariates: Sex (male/female); Symptoms noted on waking; Consciousness; Atrial fibrillation
  Continuous covariates: Age; Delay to presentation; Systolic blood pressure
ECST
  Binary covariates: Sex (male/female); Residual neurological signs; Previous MI; Angina; Current prophylactic aspirin use
  Continuous covariates: Age; Degree of stenosis
Unordered categorical variables: Infarct visible on CT scan; Type of stroke
Ordered categorical variables: Neurological deficit score (7 categories); Presenting stroke (4 categories)

The investigative method provides an opportunity to make comparisons between different case-mix adjustment strategies: matching baseline groups, stratification, regression and propensity score methods.

Methods

Generation of samples

The principles of our resampling methodology were discussed in detail in Chapter 6. Studies of fixed sample sizes with randomised and non-randomised designs (described below) were generated for each region in each of the IST and ECST data sets by selectively sampling participants, the whole process being repeated 1000 times. In each trial baseline data on important prognostic variables had been recorded for each participant at the point of recruitment. These variables (we will refer to them as covariates) were used in the analyses to adjust for differences in case-mix. Details of the covariates available for each study are given in Table 20 and Appendix 8.

Samples with ‘naturally’ occurring biases

Historically controlled and concurrently (non-randomised) controlled studies were generated from the IST and ECST data sets as described in Chapter 6. We thus obtained results for:

1. 14,000 historically controlled studies based on the 14 international regions from the IST and 14,000 corresponding RCTs, all of sample size 200 (100 per arm)
2. 14,000 concurrently controlled studies based on the 14 international regions from the IST and 14,000 corresponding RCTs, all of sample size 200 (100 per arm)
3. 10,000 concurrently controlled studies based on the 10 UK cities within the IST and 10,000 corresponding RCTs, all of sample size 200 (100 per arm)
4. 8000 historically controlled studies based on the eight international regions from the ECST and 8000 corresponding RCTs, all of sample size 80 (40 per arm)
5. 8000 concurrently controlled studies based on the eight international regions from the ECST and 8000 corresponding RCTs, all of sample size 80 (40 per arm).

Different case-mix adjustment methods were applied individually to each of these 54,000 non-randomised studies, and the results were compared with the results from the corresponding 54,000 RCTs.

Samples with bias related to ‘known’ differences in case-mix (allocation by indication)

In addition to the standard historically and concurrently controlled designs, for the purposes of evaluating the performance of case-mix adjustment we have included two further designs in which bias relates to known relationships with prognostic variables. This has been done for two reasons. First, we wished to evaluate how well case-mix methods work in situations when we have direct knowledge of the bias-inducing mechanism that they are trying to correct. Second, we wished to mimic crudely clinical database-type studies, in

© Queen’s Printer and Controller of HMSO 2003. All rights reserved.