ISAKOS Scientific Committee Report

Considerations on Sample Size and Power Calculations in Randomized Clinical Trials

Jon Karlsson, M.D., Ph.D., Lars Engebretsen, M.D., Ph.D., and Katie Dainty, M.Sc., C.R.P.C.

From Sahlgrenska University Hospital/Ostra, Gothenburg, Sweden (J.K.); Bergslia 22, Oslo, Norway (L.E.); and Fowler Kennedy Sport Medicine Clinic, London, Ontario, Canada (K.D.). Address correspondence and reprint requests to Katie Dainty, M.Sc., C.R.P.C., Fowler Kennedy Sport Medicine Clinic, 3M Centre, University of Western Ontario, London, Ontario N6A 3K7, Canada. E-mail: kdainty@uwo.ca. © 2003 by the Arthroscopy Association of North America. doi:10.1016/S0749-8063(03)00837-5

Abstract: Many studies in orthopaedics and sports medicine have not considered sample size or statistical power as important issues in study design. This article addresses the importance of a sample size calculation in randomized clinical trials and the components of the calculation that researchers must consider in their preliminary planning of an investigation. The types of data being collected, the level of significance, type I and type II errors, and power are also addressed. Key Words: Sample size, power, type I error, type II error, level of significance, effect size.

Traditionally, many studies in orthopaedics and sports medicine have not considered sample size or statistical power as important issues in study design. It is therefore not surprising that many investigations have failed to answer the question asked, for a variety of related reasons: low power (most often because too few subjects were included), a type I statistical error, or simply too small a sample. Good sampling practice is essential in any investigation, and it includes concern for the size of the sample used. Whole populations can rarely be measured and, therefore, decisions as to an adequate sample size have to be made. Although practical and ethical issues must be considered, one's initial consideration when determining trial size should be the scientific requirements of the study. Common sense suggests that the more subjects are studied, the more likely one is to obtain a representative sample. The layman's formula that describes this relationship is basically:

    margin of error in a sample ≈ 1 / √(number of people in the sample) [1]
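As a rough illustration of this rule of thumb, the sketch below (not part of the original article) computes the approximate margin of error for a few sample sizes; the sample sizes chosen are arbitrary examples.

import math

def margin_of_error(n):
    """Rule-of-thumb margin of error for a sample of n people: roughly 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

for n in (100, 400, 1600):
    print(f"n = {n}: margin of error about {margin_of_error(n):.1%}")
# n = 100 gives about 10%, n = 400 about 5%, n = 1600 about 2.5%:
# quadrupling the sample size halves the margin of error.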
EFFECT SIZE OR TREATMENT SIZE

The sample size of a clinical trial depends on the size of the difference to be detected between the 2 groups. The smaller the difference one wishes to detect, the more patients will be required for the study. It is usually best to require a number of subjects sufficient to detect the minimal "clinically" important difference, which avoids the need for astronomically large numbers of patients. Applying this rule, it makes sense that studies in which strong treatment effects and large differences between groups are expected require fewer subjects.
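To put rough numbers on this relationship, the sketch below (not from the article) estimates the patients needed per group for small, medium, and large standardized differences at a 5% two-sided significance level and 80% power. It assumes the statsmodels package is available; the effect sizes are illustrative only.

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for effect_size in (0.2, 0.5, 0.8):  # small, medium, large standardized mean differences
    n = analysis.solve_power(effect_size=effect_size, alpha=0.05, power=0.80,
                             alternative='two-sided')
    print(f"standardized difference {effect_size}: about {n:.0f} patients per group")
# The required number per group falls steeply as the expected difference grows,
# from several hundred subjects down to a few dozen.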


ALPHA ERROR (α): TYPE I ERROR

The sample size calculation also involves consideration of the risk of an α error (accepting that the treatment is effective when it is not). The acceptable size for this risk is a value judgment, and can be as large as 1 or as small as 0. When calculating the number of patients required, the number will vary according to the size of the chance the investigator is willing to take of falsely concluding that a given intervention is valuable. The larger the chance accepted, the fewer the patients needed. As easy as this sounds, it does not do much for the credibility of a trial. Alpha (α) is usually set at 0.05 (1 in 20) or 0.01 (1 in 100).

BETA ERROR (β): TYPE II ERROR

A type II or β error is similar to an α error, but it is the chance of missing true differences in a particular study, or, in other words, of accepting a null hypothesis even though it is false. β errors are conventionally allowed to be much larger than α errors, reflecting the higher value placed on being sure that an effect is really present when one reports that it is. [3]

LEVEL OF SIGNIFICANCE

The chosen level of significance sets the likelihood of detecting a treatment effect when none exists (leading to a so-called false-positive result), and it defines the threshold P value. Results with a P value above the threshold lead to the conclusion that an observed difference may be due to chance alone, while those with a P value below the threshold lead to rejecting chance and concluding that the intervention has a real effect. The level of significance is most commonly set at 5% (i.e., P < .05) or 1% (P < .01). This means that the investigator is prepared to accept a 5% (or 1%) chance of erroneously reporting a significant effect.

POWER (1 − β)

The power of a study is defined as its ability to detect a true difference in outcome between the standard or control arm and the intervention arm. It is the power to detect a statistical difference between 2 different treatments when the difference in fact exists. This can also be described as the probability that the study shows statistical significance (at a level decided on beforehand) provided there is an effect. This is related to the possibility of finding a clinically relevant effect for any given drug or treatment, which may be termed the "clinical effect" and is not always the same as the statistical effect. Before deciding on the null hypothesis, a minimal clinically important difference should be discussed. One must consider this carefully, since the "clinically significant difference" can be a matter of great debate.

Power is affected by many factors, namely the significance level, the effect size, and the sample size. The lower the significance level, the lower the power. If researchers wish to reduce the risk of concluding that an ineffective medication (or any other treatment) is effective, they must understand that they, in turn, run the risk of missing a treatment that is, in fact, effective. The larger the effect a treatment has, the greater the possibility of discovering it within the trial. A generally accepted power is 80%, or 0.8. In many recent studies, the power limit is set at 0.9 or even 0.95; this is more often the case in pharmacologic studies, i.e., testing new drugs, than in investigations of surgical techniques. A power of 0.8 means that, when a true effect of the specified size exists, the study will detect it 8 times in 10 and miss it 2 times in 10. Often, a higher power is requested in studies of dramatic new treatments, such as those for diabetes or cancer. The clinically relevant difference and the risk of committing a type II error should be taken into consideration when defining the desired power.
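The interplay of α, β, and power can be checked with a quick simulation. The sketch below is not part of the article; the group size, group means, and standard deviation are illustrative assumptions. It runs many hypothetical two-arm trials and counts how often a t test comes out significant, once with no true difference (an estimate of the type I error rate) and once with a true difference of 10 points (an estimate of the power, 1 − β).

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_per_group, n_trials, alpha = 36, 2000, 0.05   # assumed trial size and significance level

def significant_fraction(true_difference):
    """Fraction of simulated trials in which a two-sided t test gives P < alpha."""
    hits = 0
    for _ in range(n_trials):
        control = rng.normal(50, 15, n_per_group)                     # assumed SD of 15
        treated = rng.normal(50 + true_difference, 15, n_per_group)
        if stats.ttest_ind(control, treated).pvalue < alpha:
            hits += 1
    return hits / n_trials

print("Estimated type I error rate:", significant_fraction(0))    # close to 0.05
print("Estimated power:", significant_fraction(10))               # roughly 0.8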
METHODOLOGY

Sample size calculations should be performed during the design phase of the study. There are several computerized methods that can be easily accessed through the Internet if statistical support is not at hand. However, the formula must be appropriate for the design of the trial and the subsequent analysis. Each sample size calculation is unique, and there is no global solution applicable to all trials.

Sample size equation for comparing the means of 2 independent groups:

    n per group = 2[(Zα + Zβ)σ/δ]²

where
    Zα = z-value for α, found in a Z table (e.g., Zα for a two-sided α of 0.05 is 1.96);
    Zβ = z-value for β, found in a Z table;
    σ = standard deviation of the mean response; and
    δ = minimal difference desired in outcome.
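As a check on any computerized output, the equation above can be coded directly. The following is a minimal sketch, not from the article; the standard deviation and minimal difference in the example are illustrative assumptions.

import math

def n_per_group(z_alpha, z_beta, sigma, delta):
    """Patients needed in each of 2 groups: n = 2 * ((Z_alpha + Z_beta) * sigma / delta)^2."""
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Two-sided alpha = 0.05 (Z = 1.96) and power = 0.80, i.e., beta = 0.20 (Z = 0.84),
# with an assumed standard deviation of 15 and a minimal important difference of 10:
print(n_per_group(z_alpha=1.96, z_beta=0.84, sigma=15, delta=10))   # about 36 per group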
It is strongly recommended that one enlist the aid of a biostatistician from the beginning to help determine the correct sample size calculation. However, it is always a good idea to understand the calculation and why the statistician did what he or she did.


Things to consider when determining sample size:

1. The investigation or analytical procedure that one wishes to follow, as part of the initial experimental design stage of the investigation.
2. The statistical procedure that one may wish to use for the analysis (e.g., a 1-tailed or 2-tailed t test, linear regression).
3. The percentage change and/or level of significance at which one would wish to detect the change.
4. The appropriate sample size formula for the study design in question.

One can see why it is extremely useful to have the help of a statistician from the outset. Finally, one should explicitly choose between 2-sided and 1-sided statistical testing. Unfortunately, the importance of this decision is often neglected. With the 2-sided option, one is able to detect the predefined relevant difference, if present, in both directions: if the intervention is better than the reference by a given amount, this will probably be found, but a similar advantage of the reference over the principal intervention can also be detected. The 1-sided option allows only a 1-sided evaluation (e.g., whether or not the intervention is better than the reference treatment by a given amount), without testing whether the reference is superior to the principal intervention. [4-6]
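To see how much this choice affects the numbers, the short sketch below (not from the article) recomputes the two-group sample size from the Methodology section with one- and two-sided Z values obtained from scipy; the standard deviation of 15, difference of 10, and power of 0.80 are the same illustrative assumptions used earlier.

import math
from scipy.stats import norm

def n_per_group(z_alpha, z_beta, sigma, delta):
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

z_beta = norm.ppf(0.80)                                     # about 0.84 for 80% power
two_sided = n_per_group(norm.ppf(0.975), z_beta, 15, 10)    # Z = 1.96 for two-sided alpha = 0.05
one_sided = n_per_group(norm.ppf(0.95), z_beta, 15, 10)     # Z = 1.64 for one-sided alpha = 0.05
print(two_sided, one_sided)   # roughly 36 versus 28 patients per group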
CONCLUSIONS

It can be concluded that the larger the sample size, the more precision one will have when stating the findings. With increased precision, there is an increased possibility of establishing an effect (if it truly exists). In other words, it is possible to find even very small effects (e.g., differences between groups) if the sample size is large enough. However, it can be questioned whether it is correct, from an ethical perspective, to perform a very large investigation ("the larger the sample, the higher the power") to identify an effect that may be small and not clinically relevant. This may, in fact, be both incorrect and dangerous, and one must always remember that any new treatment may be potentially harmful. Sample size calculation, including power analysis, should always be included in the methodologic design of any clinical trial. Especially when no statistical difference is found, the possibility of either a type I or a type II error should be discussed. In fact, a study showing no statistical difference is often more the result of low power than of a true lack of difference between the study groups.

REFERENCES

1. Friedman LM, Furberg CD, DeMets DL. Sample size. In: Fundamentals of clinical epidemiology. New York: Mosby, 1996;94-125.
2. Pocock SJ. Clinical trials: A practical approach. Chichester: Wiley, 1983.
3. Bland JM, Altman DG. One and two sided tests of significance. BMJ 1994;309:248.
4. Dunnett CW, Gent M. An alternative to the use of two-sided tests in clinical trials. Stat Med 1996;15:1729-1738.
5. Koch GG. One-sided and two-sided tests and P values. J Biopharm Stat 1991;1:161-170.
6. Peace KE. The alternative hypothesis: One-sided or two-sided? J Clin Epidemiol 1989;42:473-476.

RECOMMENDED READING

1. Greenfield ML, Kuhn JE, Wojtys EM. A statistics primer: Power analysis and sample size determination. Am J Sports Med 1997;25:138-140.
2. Lachin JM. Introduction to sample size determination and power analysis for clinical use. Control Clin Trials 1981;2:93-113.
