Empirical estimates of bias associated with non-random allocation

results between non-randomised and randomised studies. The bias was observed for the ECST historically controlled comparisons, leading to overestimates of the benefit of carotid surgery, both for individual regions and when the results were aggregated across regions. This pattern is consistent with the conclusions of Sacks and colleagues,27 who noted in their review of historically controlled studies for six medical interventions that “biases in patient selection may irretrievably weight the outcome of historically controlled studies in favour of new therapies”.

Systematic biases were also noted in some of the historically controlled studies in the individual regions in the IST analysis, but here they were seen to vary in direction and magnitude, sometimes overestimating benefit and sometimes overestimating harm.

Systematic bias in historically controlled studies arises from there being time trends in the average outcomes of participants in a study, regardless of which treatment they receive. Details of the outcomes and characteristics of the participants in the ECST are presented in Tables 41 and 42 in Appendix 8. For five regions there was a reduction in the adverse event rate of between 1 and 7% (averaged across both treatment and control) between the trial periods, whereas for three regions there was an increase of between 1 and 14%. The change was statistically significant (p < 0.01) in one region.

How do such trends arise? There are a limited number of options: they must arise through variation over time in the case-mix, and hence prognosis, of participants recruited to the trial (as proposed by Sacks and colleagues27), through differences in other healthcare interventions that the participants receive, or through changing assessments of outcome.
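The mechanism behind such time trends can be illustrated with a toy simulation (the event rates and drift below are illustrative choices, not ECST data): when the background adverse-event rate falls over calendar time, a historically controlled comparison attributes that fall to the treatment.

```python
import random

random.seed(1)

def simulate_historical_comparison(n=2000, drift=-0.05):
    """One simulated historically controlled study with no true
    treatment effect (true OR = 1): controls are recruited early,
    treated patients later, after the background risk has drifted."""
    p_control = 0.20           # adverse-event risk in the early period
    p_treated = 0.20 + drift   # risk in the later period: time, not treatment
    events_c = sum(random.random() < p_control for _ in range(n))
    events_t = sum(random.random() < p_treated for _ in range(n))
    odds_c = events_c / (n - events_c)
    odds_t = events_t / (n - events_t)
    return odds_t / odds_c

# A falling background event rate makes the comparison "show" benefit
# (OR < 1) even though the treatment does nothing.
or_est = simulate_historical_comparison()
print(round(or_est, 2))
```

Reversing the sign of the drift produces the opposite error, an apparent harm, which matches the mixed directions of bias seen across the IST regions.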
These variations may themselves be haphazard or due to systematic mechanisms (such as changes in patient referral and recruitment or in patient management). Some of these potential causes may be measured, such as baseline risk factors, but many may go unnoticed and are not assessed.

Tables 39 and 42 in Appendix 8 show summaries of the distribution of important baseline risk factors for IST and ECST, respectively. For both trials there were differences in the risk factors of participants between the first and second halves of the trial, although the patterns of these differences were not consistent between regions, and it is not immediately obvious how they relate to differences in outcome. It seems likely that the differences occur in part due to unmeasured changes within the trials, but that there may also be different mechanisms causing systematic bias in different regions.

Why should there be a time trend in outcome in the ECST? Patients were only entered into the trial when an investigator judged that, in the case of the individual patient, there was uncertainty as to whether surgery would be beneficial or harmful. One possibility is therefore that throughout the very long recruitment period (12.5 years) investigators joined or left the trial who had systematically different opinions on who was suitable for randomisation. Six of the eight regions showed significant reductions (p < 0.05) in the proportion of patients recruited with
Health Technology Assessment 2003; Vol. 7: No. 27

Unpredictability in bias

When bias acts unpredictably, it will sometimes lead to an overestimation and sometimes to underestimation of an effect. Although these biases may on average ‘cancel out’ across a set of studies such that no difference is observed in average ORs, the biases will still affect the results of individual studies. The presence of systematic bias may therefore be missed if the comparison of results is restricted to a comparison of average values, as was done in five of the eight previous reviews summarised in Chapter 3.25–28,32

Unpredictable over- and underestimation will increase the variability (or heterogeneity) of the results of a set of studies. In the concurrent comparisons such an increase in variability (measured by the standard deviation) was observed for the IST (Table 15), even though the average treatment effects in the concurrently controlled and randomised studies were the same. A similar pattern was observed for historically controlled studies generated from the IST when the haphazard within-region time trends were aggregated in the overall analysis (Table 16).

How do these biases occur, and how do they differ from the variability seen between RCTs? Variability always occurs between the results of multiple RCTs. The principal reason is the ‘play of chance’ or sampling variation. A treatment effect observed in a particular RCT is unlikely to be the precise effect of the intervention. For example, randomly dividing the study sample into two does not guarantee that the groups are identical in all respects, and the differences that do exist in case-mix will lead to either under- or overestimates of the treatment effect in an individual trial. We do not normally talk about these differences as biases, but rather as uncertainties.
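This ‘play of chance’ can be sketched with a small simulation (the event rate and arm sizes are illustrative, not trial data): a single population is repeatedly split at random into two arms with no true effect, and the spread of log odds ratios across replicate trials is pure sampling variation, shrinking as the arms grow.

```python
import math
import random
import statistics

random.seed(0)

def one_rct(n_per_arm, p_event=0.2):
    """One simulated RCT: a single population randomly split into two
    arms with no true treatment effect, so any difference between the
    arms is pure 'play of chance'."""
    events = [sum(random.random() < p_event for _ in range(n_per_arm))
              for _ in range(2)]
    odds = [e / (n_per_arm - e) for e in events]
    return math.log(odds[1] / odds[0])  # log OR; the true value is 0

# The spread of estimates across many identical RCTs is the sampling
# variation that standard confidence intervals are built to cover;
# it shrinks as the arms grow.
sds = {}
for n in (250, 1000):
    sds[n] = statistics.stdev(one_rct(n) for _ in range(500))
    print(n, round(sds[n], 3))
```

The standard deviation of the replicate estimates roughly halves when the arm size is quadrupled, which is the behaviour standard confidence intervals encode.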
We know the distribution with which under- and overestimates arise in RCTs, enabling us to draw correct inferences within specified degrees of certainty. We cannot identify whether a particular trial is affected by such bias, but we can calculate bounds within which we are reasonably sure possible bias is encompassed, which we term confidence intervals. Importantly, we know that the possible differences between the groups due to sampling variation (and hence confidence intervals) reduce with increasing sample size.

The extra variability we see in the non-randomised studies arises in a similar but more troubling manner. Rather than randomly dividing a single group of individuals, we start with two different groups of individuals. We therefore start with differences between the groups in measurable and unmeasurable factors. These potentially include differences in case-mix, additional treatments and methods of assessment of outcome. Importantly, in addition to not being able to identify all these differences, we may not know in which way many of the factors act, so that there is overall uncertainty as to whether they will cause under- or overestimates of the treatment effect. Sampling from these populations introduces the same sampling variation as in the RCT. While we can estimate the impact of the sampling variation (and calculate standard confidence intervals), there is no mathematical way of knowing how pre-existing differences between the groups behave. It is therefore not possible to include an allowance in the confidence interval for a single study that accounts for the extra uncertainty introduced through unsystematic bias. As we cannot mathematically allow for this variation when drawing conclusions, it is appropriate to call such extra variation ‘bias’ even though it is ‘uncertain’.
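A minimal sketch of why this extra variation behaves differently (the `bias_sd` value is a hypothetical choice, not an estimate from IST or ECST): each simulated non-randomised study starts with a random pre-existing difference between its two groups on the log-odds scale, and the spread of its estimates never falls below the floor that difference sets, however large the samples.

```python
import math
import random
import statistics

random.seed(2)

def one_study(n_per_group, bias_sd=0.3):
    """One simulated non-randomised study with no true treatment
    effect: before anyone is sampled, the two groups already differ
    by a random amount on the log-odds scale (bias_sd is a
    hypothetical value, not an estimate from IST or ECST)."""
    bias = random.gauss(0.0, bias_sd)   # pre-existing group difference
    p0 = 0.2
    p1 = 1.0 / (1.0 + math.exp(-(math.log(p0 / (1 - p0)) + bias)))
    e0 = sum(random.random() < p0 for _ in range(n_per_group))
    e1 = sum(random.random() < p1 for _ in range(n_per_group))
    return math.log((e1 / (n_per_group - e1)) / (e0 / (n_per_group - e0)))

# Unlike sampling variation, the spread of estimates never falls below
# the floor set by the pre-existing differences, however large n gets.
spread = {}
for n in (250, 4000):
    spread[n] = statistics.stdev(one_study(n) for _ in range(500))
    print(n, round(spread[n], 3))
```

A sixteen-fold increase in sample size barely reduces the spread, because the dominant component is the pre-existing group difference rather than sampling.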
In contrast to sampling variation, the extra uncertainty is independent of sample size as it is a feature of the pre-existing differences between the two populations from which the samples were drawn.

Our resampling studies provide a unique opportunity to calculate the distribution of this extra uncertainty for the specific situations studied in the IST and ECST by calculating the increase in variance seen with non-randomised concurrently controlled studies compared with RCTs. This computation is possible as we ensured that for each study the RCTs are the same size as the concurrent comparisons, such that the differences in variability cannot be explained by differences in sampling variability. The results of these computations are given in Table 19. The extra variance in log OR was 0.61 for regional IST comparisons, 0.57 for UK city IST comparisons and 0.01 for regional ECST comparisons. Given these estimates, it is possible to calculate new adapted confidence intervals for these studies that allow for these potential uncertain biases in addition to sampling variation. They are expressed in Table 19 as multiplicative increases in the width of the standard confidence intervals. As sampling variability decreases with increasing sample size but the unsystematic bias remains constant, the ratio of the extra allowance in the width of the confidence interval due to unsystematic bias increases with sample size. The ratios presented in Table 19 reveal that standard confidence intervals for many non-randomised

© Queen’s Printer and Controller of HMSO 2003. All rights reserved.
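The multiplicative widening can be reproduced from the variance decomposition alone (a sketch of the form of adjustment described in the text; the sampling variances below are illustrative, and the exact computation behind Table 19 may differ):

```python
import math

def ci_width_ratio(var_sampling, var_extra):
    """Multiplicative increase in confidence-interval width when a
    fixed extra-bias variance is added to the sampling variance of a
    log OR."""
    return math.sqrt((var_sampling + var_extra) / var_sampling)

# Extra variance in log OR reported for the regional IST comparisons.
VAR_EXTRA = 0.61

# Illustrative sampling variances: var(log OR) is roughly the sum of
# reciprocal cell counts, so it shrinks as a study grows; the fixed
# extra variance then dominates and the widening ratio increases.
for var_sampling in (0.5, 0.1, 0.02):
    print(var_sampling, round(ci_width_ratio(var_sampling, VAR_EXTRA), 2))
```

Because the extra variance is fixed while the sampling variance falls with sample size, the widening ratio is largest for the biggest studies, exactly the pattern the text attributes to Table 19.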