Supplementary Web Material - Biometrics
Web-based supplementary materials for Calculating sample size for studies with expected all-or-none nonadherence and selection bias by Michelle D. Shardell and Samer S. El-Kamary
May 15, 2008
Web Appendix: Sample size calculations using external or internal pilot data that accommodate noncompliance
Sample size formulas and test statistics
FHF proposed methods for calculating sample size for normal and Poisson $F$ while accounting for variability of existing data. The approach of calculating “calibrated power” for normal $F$ with assumed equal variances, obtained by integrating out the variance estimator in FHF Section 3, was motivated by external pilot data when $\rho_c = 1$. Many trials do not have preliminary data, including compliance information, at the outset. Thus, one approach for calculating sample size is to perform an unblinded internal pilot study as a source for deriving assumptions about means, variances, and compliance. Interestingly, the critical value for calibrated power in FHF for external pilot studies equals the exact “inflation factor” in Zucker et al. (1999, p. 3502) needed for internal pilot studies (also calculated by integrating out the variance estimator). To circumvent the numerical integration required for calibrated power and exact inflation factors, methods have been proposed that use approximate inflation factors in sample size calculations when $\rho_c = 1$ under the assumption of equal variances between groups (Wittes et al., 1999; Zucker et al., 1999; Miller, 2005). These authors also proposed adaptations of the t-test, for use when internal pilot data are collected, to avoid elevated type I errors. However, no previous work has accommodated noncompliance in this context.
In this Web Appendix, we adapt previously proposed approximate sample size formulas for external and internal pilot data to accommodate noncompliance and unequal variances. In the case of internal pilot studies, we also adapt two test statistics to accommodate noncompliance. The first method adapted is the second-segment Stein (1945) (SS) method described in Zucker et al. (1999). The second method adapted is the approach by Miller (2005), in which bounded bias (BB) of the variance estimate is accommodated in the sample size formula and t-test.
Assuming equal variances in both groups, $\rho_c = 1$, and $r = 1$, the original Stein (1945) approach involved calculating
\[ N = 2\left\{ \frac{t_{\alpha/2,\,2(n_p-1)} + t_{\beta,\,2(n_p-1)}}{\delta} \right\}^2 S_p^2 + 1, \tag{1} \]
where $S_p^2$ is the pooled variance estimate calculated from the pilot data, $t_{q,\nu}$ is the $1-q$ quantile of the t distribution with $\nu$ degrees of freedom, and $n_p$ is the sample size per group in the pilot study. Thus, the approximate inflation factor used in (1) to account for variance uncertainty is $\{(t_{\alpha/2,\,2(n_p-1)} + t_{\beta,\,2(n_p-1)})/(z_{\alpha/2} + z_\beta)\}^2$, whereas the exact inflation factor corresponding to calibrated power in FHF is $\{(z_{\alpha/2} + z_{\beta^*})/(z_{\alpha/2} + z_\beta)\}^2$, where $1 - \beta^*$ is calibrated power (Zucker et al., 1999).
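As a rough illustration of (1) and of the approximate inflation factor, the following sketch uses only the Python standard library. The helpers `t_cdf` and `t_upper` are hypothetical stand-ins for a statistics library's t quantile function (they integrate the t density numerically and invert it by bisection), and the inputs in the loop are assumed values chosen for illustration.

```python
import math
from statistics import NormalDist


def t_cdf(x, df, steps=1000):
    """CDF of Student's t at x >= 0, via Simpson's rule on the density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_upper(q, df):
    """t_{q,df}: the 1 - q quantile of the t distribution (q < 0.5), by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def stein_n(alpha, beta, delta, s2_pooled, n_pilot):
    """Per-group sample size from equation (1), left unrounded."""
    df = 2 * (n_pilot - 1)
    t_sum = t_upper(alpha / 2, df) + t_upper(beta, df)
    return 2 * (t_sum / delta) ** 2 * s2_pooled + 1


def approx_inflation(alpha, beta, n_pilot):
    """Approximate inflation factor {(t + t) / (z + z)}^2 implied by (1)."""
    df = 2 * (n_pilot - 1)
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(1 - beta)
    t_sum = t_upper(alpha / 2, df) + t_upper(beta, df)
    return (t_sum / z_sum) ** 2


# Assumed inputs: alpha = 0.05, beta = 0.20, delta = 0.5, pooled variance 1.
for n_pilot in (10, 20, 30, 50):
    print(n_pilot, stein_n(0.05, 0.20, 0.5, 1.0, n_pilot),
          approx_inflation(0.05, 0.20, n_pilot))
```

The inflation factor shrinks toward one as the pilot sample size grows, because the t quantiles approach the normal quantiles.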
If the pilot study is internal, then $n_s = N - n_p$ additional observations per group are collected, and the test statistic is then $t_{Stein} = \sqrt{N/2}\,(\bar{Y}_1 - \bar{Y}_0)/S_p$, to be compared with $t_{\alpha/2,\,2(n_p-1)}$. The SS procedure in Zucker et al. (1999) uses (1), but with a different test statistic, and can only be used if some minimum sample size for the second segment (i.e., $n_s$) is imposed. Let $D_z = (\bar{Y}_{zs} - \bar{Y}_{zp})$ be the difference in the mean observed study outcome in group $z$ between the second-segment data and the pilot data. Further, let $S_s^2$ be the pooled variance estimate of the second-segment data. The test statistic is then $t_{SS} = \sqrt{N/2}\,(\bar{Y}_1 - \bar{Y}_0)/S_{SS}$, to be compared with $t_{\alpha/2,\,2n_s}$, where
\[ S_{SS}^2 = (2n_s)^{-1}\left\{ 2(n_s-1)S_s^2 + \frac{n_p n_s}{N}\,(D_1^2 + D_0^2) \right\}. \]
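The SS variance estimator and test statistic can be sketched from summary statistics as follows. This is a minimal illustration under the equal-allocation setting above ($r = 1$, $N = n_p + n_s$); the function and argument names are assumptions introduced here, and the numeric inputs are made up.

```python
import math


def ss_variance(s2_second, n_pilot, n_second, d1, d0):
    """S_SS^2 = (2 n_s)^{-1} { 2 (n_s - 1) S_s^2 + (n_p n_s / N)(D_1^2 + D_0^2) }."""
    n_total = n_pilot + n_second  # N = n_p + n_s per group
    between = n_pilot * n_second / n_total * (d1 ** 2 + d0 ** 2)
    return (2 * (n_second - 1) * s2_second + between) / (2 * n_second)


def t_ss(mean1, mean0, s2_ss, n_total):
    """t_SS = sqrt(N / 2) (Ybar_1 - Ybar_0) / S_SS, compared with t_{alpha/2, 2 n_s}."""
    return math.sqrt(n_total / 2) * (mean1 - mean0) / math.sqrt(s2_ss)


# Hypothetical summary statistics: n_p = 10, n_s = 40, second-segment pooled variance 1,
# and segment mean shifts D_1 = 0.5 and D_0 = -0.5.
s2 = ss_variance(1.0, 10, 40, 0.5, -0.5)
print(s2, t_ss(0.5, 0.0, s2, 50))
```

The second term carries the one degree of freedom per group contributed by the shift between pilot and second-segment means, which is why the reference distribution has $2n_s$ rather than $2(n_s - 1)$ degrees of freedom.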
Before describing our adaptation of the SS approach, we describe the BB method by Miller (2005). Assuming equal variances in both groups, $\rho_c = 1$, and $r = 1$, the approach involved calculating $v = 2\{(z_{\alpha/2} + z_\beta)/\delta\}^2$ and setting $N = \max\{vS_p^2 + 1,\ n_p + n_{s(min)}\}$, where $n_{s(min)} \ge 0$ is an arbitrary minimum for the second segment. Let $S^2$ be the pooled variance estimate from all of the study data. Miller (2005) showed that $-\frac{n_p-1}{(n_p-2)v} \le E(S^2) - \sigma^2 \le 0$, where $\sigma^2$ is the true variance for both groups. The proposed correction for this bias is the variance estimator $S_{BB}^2 = S^2 + \frac{n_p-1}{(n_p-2)v}\,1\{N > n_p + n_{s(min)}\}$, where $1\{\cdot\}$ is the indicator function, with test statistic $t_{BB} = \sqrt{N/2}\,(\bar{Y}_1 - \bar{Y}_0)/S_{BB}$, compared to $t_{\alpha/2,\,2(N-1)}$.
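A small sketch of the BB quantities as stated above, using normal quantiles from the Python standard library. The function names and the example inputs are assumptions made for illustration; the bias bound is coded exactly as written in the text.

```python
from statistics import NormalDist


def miller_v(alpha, beta, delta):
    """v = 2 {(z_{alpha/2} + z_beta) / delta}^2."""
    z = NormalDist()
    return 2 * ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(1 - beta)) / delta) ** 2


def bb_sample_size(v, s2_pilot, n_pilot, n_second_min):
    """N = max{v S_p^2 + 1, n_p + n_s(min)}, left unrounded."""
    return max(v * s2_pilot + 1, n_pilot + n_second_min)


def bb_variance(s2_all, v, n_pilot, n_total, n_second_min):
    """S_BB^2: add the bias bound only when N exceeds its minimum n_p + n_s(min)."""
    bound = (n_pilot - 1) / ((n_pilot - 2) * v)
    return s2_all + (bound if n_total > n_pilot + n_second_min else 0.0)


# Assumed inputs: alpha = 0.05, beta = 0.20, delta = 0.5, pilot variance 1,
# n_p = 10, n_s(min) = 10, and a hypothetical final pooled variance of 0.98.
v = miller_v(0.05, 0.20, 0.5)
n_total = bb_sample_size(v, 1.0, 10, 10)
print(v, n_total, bb_variance(0.98, v, 10, n_total, 10))
```

When the sample size sits at its floor $n_p + n_{s(min)}$, the indicator is zero and the usual pooled variance is used unchanged.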
We consider new sample-size formulas for internal and external pilot studies that adapt (1) and the approach of Miller (2005). We also propose two adaptations of the t-test that extend $t_{SS}$ and $t_{BB}$ for internal pilot studies. The extensions of the sample size calculation involve 1) using the t inflation factor as in (1), 2) letting treatment-group variances and sample sizes differ (and using the Welch approximation for degrees of freedom), and 3) incorporating compliance-group proportion estimates. We calculate a noncompliance-corrected $v$, $v_{nc} = \{(t_{\alpha/2,\nu_p} + t_{\beta,\nu_p})/(\delta\hat\rho_{cp})\}^2$, where $\nu_p$ is the degrees of freedom calculated with the Welch approximation using the pilot data, and $\hat\rho_{cp}$ is the estimated proportion of compliers calculated from the pilot data.
For external pilot studies, the sample size formula for the control group is
\[ N_0 = v_{nc}\,(S_{0p}^2 + S_{1p}^2/r) + 1. \tag{2} \]
For internal pilot studies, we propose a formula that produces restricted designs, i.e., $N_0 > n_{0p}$ (Wittes et al., 1999):
\[ N_0 = \max\left\{ v_{nc}\,(S_{0p}^2 + S_{1p}^2/r) + 1,\ n_{0p} + n_{0s(min)} \right\}, \tag{3} \]
where $S_{zp}^2$ is the sample variance of group $z$ calculated from the pilot data, $n_{zp}$ is the sample size of the pilot data in group $z$, and $n_{0s(min)}$ is an arbitrary minimum sample size for the control group second segment. We assume that the pre-specified ratio $r$ is maintained for both stages of the study so that $n_{1p}/n_{0p} = n_{1s(min)}/n_{0s(min)} = r$.
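Equations (2) and (3) can be sketched as below. Two assumptions are made for illustration: $\nu_p$ is computed with the usual Welch-Satterthwaite formula applied to the pilot variances and pilot group sizes, and the t quantiles are passed in as numbers (here read from a standard t table) rather than computed. All function names are hypothetical.

```python
def welch_df(s2_0, n0, s2_1, n1):
    """Welch-Satterthwaite degrees of freedom for S_0^2/n_0 + S_1^2/n_1."""
    a, b = s2_0 / n0, s2_1 / n1
    return (a + b) ** 2 / (a ** 2 / (n0 - 1) + b ** 2 / (n1 - 1))


def v_nc(t_alpha_half, t_beta, delta, rho_c_hat):
    """Noncompliance-corrected v: {(t_{alpha/2,nu} + t_{beta,nu}) / (delta rho_hat)}^2."""
    return ((t_alpha_half + t_beta) / (delta * rho_c_hat)) ** 2


def n0_external(v, s2_0p, s2_1p, r=1.0):
    """Equation (2): control-group size from external pilot data, unrounded."""
    return v * (s2_0p + s2_1p / r) + 1


def n0_internal(v, s2_0p, s2_1p, n0p, n0s_min, r=1.0):
    """Equation (3): restricted design, never below n_0p + n_0s(min)."""
    return max(n0_external(v, s2_0p, s2_1p, r), n0p + n0s_min)


# Assumed pilot summaries: 10 per group, both variances 1, half the pilot compliant.
nu_p = welch_df(1.0, 10, 1.0, 10)
v = v_nc(2.101, 0.862, 0.5, 0.5)  # t quantiles for 18 df from a standard t table
print(nu_p, n0_external(v, 1.0, 1.0), n0_internal(v, 1.0, 1.0, 10, 10))
```

With equal pilot variances and ten observations per group, the Welch formula gives 18 degrees of freedom, matching $2(n_p - 1)$ in (1).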
Our adaptation of $t_{SS}$ for internal pilot studies involves deriving group-specific variance estimates. Let
\[ S_{zSS}^2 = (n_{zs})^{-1}\left\{ (n_{zs}-1)S_{zs}^2 + \frac{n_{zp} n_{zs}}{N_z}\,D_z^2 \right\}. \]
The noncompliance SS (NSS) test statistic is then $t_{NSS} = (S_{0SS}^2/N_0 + S_{1SS}^2/N_1)^{-1/2}(\bar{Y}_1 - \bar{Y}_0)$, compared to $t_{\alpha/2,\nu_{NSS}}$, where
\[ \nu_{NSS} = \frac{(S_{0SS}^2/N_0 + S_{1SS}^2/N_1)^2}{(S_{0SS}^2/N_0)^2/n_{0s} + (S_{1SS}^2/N_1)^2/n_{1s}}. \]
Note that $n_{zs}$ are the degrees of freedom for $S_{zSS}^2$ and were hence used in $\nu_{NSS}$ instead of $n_{zs} - 1$.
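The NSS quantities can be computed from per-group summary statistics along these lines. This is a sketch only: the function names and example numbers are assumptions, and $N_z$ is taken to be $n_{zp} + n_{zs}$ as in the equal-allocation case described earlier.

```python
import math


def group_ss_variance(s2_zs, n_zp, n_zs, d_z):
    """S_zSS^2 = (n_zs)^{-1} { (n_zs - 1) S_zs^2 + (n_zp n_zs / N_z) D_z^2 }."""
    n_z = n_zp + n_zs
    return ((n_zs - 1) * s2_zs + n_zp * n_zs / n_z * d_z ** 2) / n_zs


def t_nss(mean1, mean0, s2_0ss, s2_1ss, n0, n1):
    """t_NSS = (S_0SS^2 / N_0 + S_1SS^2 / N_1)^{-1/2} (Ybar_1 - Ybar_0)."""
    return (mean1 - mean0) / math.sqrt(s2_0ss / n0 + s2_1ss / n1)


def nu_nss(s2_0ss, s2_1ss, n0, n1, n0s, n1s):
    """Welch-type degrees of freedom, using n_zs (not n_zs - 1) per group."""
    a, b = s2_0ss / n0, s2_1ss / n1
    return (a + b) ** 2 / (a ** 2 / n0s + b ** 2 / n1s)


# Hypothetical summaries: 10 pilot and 40 second-segment observations per group.
s2_0 = group_ss_variance(1.0, 10, 40, 0.1)
s2_1 = group_ss_variance(1.4, 10, 40, -0.2)
print(t_nss(0.45, 0.0, s2_0, s2_1, 50, 50), nu_nss(s2_0, s2_1, 50, 50, 40, 40))
```

With equal variances and equal group sizes, `nu_nss` reduces to $2n_s$, the degrees of freedom used by the original SS test.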
Our adaptation of $t_{BB}$ involves accounting for the potential bias in two variance estimates for the special case of $r = 1$, thus $n_{1p} = n_{0p} = n_p$, $n_{1s} = n_{0s} = n_s$, $n_{1s(min)} = n_{0s(min)} = n_{s(min)}$, and $N_1 = N_0 = N$. We calculate the variance
\[ S_{0BB}^2 + S_{1BB}^2 = S_0^2 + S_1^2 + 2\,\frac{n_p-1}{(\nu_p-2)\,v_{nc}}\,1\{N > n_p + n_{s(min)}\}, \tag{4} \]
where the term $-2\,\frac{n_p-1}{(\nu_p-2)\,v_{nc}}$ is an approximate bound for the bias of $S_0^2 + S_1^2$ based on the proof of Theorem 1 in Miller (2005), using the Welch approximation of $S_{0p}^2 + S_{1p}^2$ to a chi-square random variable. The derivation of (4) is found in the subsection below. The noncompliance BB (NBB) test statistic is then $t_{NBB} = (S_{0BB}^2/N + S_{1BB}^2/N)^{-1/2}(\bar{Y}_1 - \bar{Y}_0)$, compared to $t_{\alpha/2,\nu_{NBB}}$, where
\[ \nu_{NBB} = \frac{(S_{0BB}^2/N + S_{1BB}^2/N)^2}{(S_{0BB}^2/N)^2/(N-1) + (S_{1BB}^2/N)^2/(N-1)}. \]
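Because both groups share the same $N$ here, the NBB test statistic depends only on the corrected sum in (4), which the sketch below computes. The names and inputs are assumptions for illustration. The degrees of freedom $\nu_{NBB}$ are deliberately not coded, since they need $S_{0BB}^2$ and $S_{1BB}^2$ separately and (4) only specifies their sum.

```python
import math


def bb_variance_sum(s2_0, s2_1, n_pilot, nu_p, v_nc, n_total, n_second_min):
    """Right-hand side of (4): S_0^2 + S_1^2 plus the bias correction when N > n_p + n_s(min)."""
    correction = 2 * (n_pilot - 1) / ((nu_p - 2) * v_nc)
    applied = correction if n_total > n_pilot + n_second_min else 0.0
    return s2_0 + s2_1 + applied


def t_nbb(mean1, mean0, variance_sum, n_total):
    """t_NBB = {(S_0BB^2 + S_1BB^2) / N}^{-1/2} (Ybar_1 - Ybar_0) for equal group sizes."""
    return (mean1 - mean0) / math.sqrt(variance_sum / n_total)


# Hypothetical values: n_p = 10, nu_p = 18, v_nc = 140, N = 282, n_s(min) = 10.
total = bb_variance_sum(1.0, 1.0, 10, 18, 140.0, 282, 10)
print(total, t_nbb(0.25, 0.0, total, 282))
```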
The proposed sample size formulas involve adapting the approximate inflation factor in (1) to handle unequal variances that may be due to noncompliance with selection bias. However, use of the Welch approximation in sample size calculations and both tests in the presence of noncompliance is not entirely accurate. When $\rho_c = 1$, the Welch degrees of freedom are used to approximate the distribution of the variance of mean differences, a sum of weighted chi-square random variables, by a chi-square random variable. However, in the presence of noncompliance, the variance of mean differences is a sum of weighted central and noncentral chi-square random variables, where the noncentrality parameters depend on true adherence-subgroup means and the compliance group distribution. The source of this inaccuracy is a special case of averaging over a factor causing heterogeneity in the treatment-specific outcome distribution. Even in cases with $\rho_c = 1$, other factors not considered in the sample size calculation (e.g., sex and age) are averaged over and may also cause heterogeneity in the outcome distribution. A closed-form approximate method that is reasonably accurate in accounting for heterogeneity from noncompliance is of practical utility; thus it is of interest to evaluate the performance of the proposed procedures for different levels of noncompliance and selection bias.
Derivation of Bounded Bias Expression (4)
This subsection can be skipped without loss of continuity. From the work of Wittes et al. (1999), the bias of $S_z^2$ for internal pilot studies with $r = 1$ can be expressed as
\[ E(S_z^2 - \sigma_z^2) = E\left\{ \frac{(n_p-1)(S_{zp}^2 - \sigma_z^2)}{n_p+n_s-1} \right\} = (n_p-1)\,\mathrm{cov}\left( S_{zp}^2,\ \frac{1}{n_p+n_s-1} \right), \]
because $n_s$ is a random variable. Thus,
\[
\begin{aligned}
E\{S_0^2 + S_1^2 - (\sigma_0^2 + \sigma_1^2)\}
&= (n_p-1)\,\mathrm{cov}\left( S_{0p}^2 + S_{1p}^2,\ \frac{1}{n_p+n_s-1} \right) \\
&= (n_p-1)\,\mathrm{cov}\left[ S_{0p}^2 + S_{1p}^2,\ \frac{1}{\max\left\{ \frac{v(S_{0p}^2+S_{1p}^2)}{2\rho_c^2},\ n_p + n_{s(min)} - 1 \right\}} \right] \\
&> (n_p-1)\,\mathrm{cov}\left\{ S_{0p}^2 + S_{1p}^2,\ \frac{1}{v(S_{0p}^2+S_{1p}^2)/(2\rho_c^2)} \right\} \\
&= \frac{n_p-1}{v/(2\rho_c^2)} \left[ 1 - E\left\{ \frac{\sigma_0^2+\sigma_1^2}{S_{0p}^2+S_{1p}^2} \right\} \right] \\
&\approx \frac{n_p-1}{v/(2\rho_c^2)} \cdot \left( 1 - \frac{\nu_p}{\nu_p-2} \right),
\end{aligned}
\]
where we approximate $E\left\{ \frac{\sigma_0^2+\sigma_1^2}{S_{0p}^2+S_{1p}^2} \right\}$ by $E(\nu_p/X)$, where $X$ is a chi-square random variable with $\nu_p$ degrees of freedom, for which $E(X^{-1}) = 1/(\nu_p - 2)$. The inequality follows from Lemma A.1 in Miller (2005). Estimating $v/(2\rho_c^2)$ with $v_{nc}$ completes the derivation.
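The chi-square moment used in the last step, and the resulting bound, can be checked numerically with a short sketch. It relies on the fact that a chi-square variable with $\nu$ degrees of freedom is a gamma variable with shape $\nu/2$ and scale 2. The function names, the seed, and the value 140 for $v_{nc}$ are assumptions for illustration.

```python
import random


def mean_inverse_chi2(df, draws=200_000, seed=1):
    """Monte Carlo estimate of E(1/X) for X ~ chi-square(df), df > 2."""
    rng = random.Random(seed)
    # chi-square(df) is Gamma(shape = df / 2, scale = 2)
    total = sum(1.0 / rng.gammavariate(df / 2, 2.0) for _ in range(draws))
    return total / draws


def bias_bound(n_pilot, nu_p, v_nc):
    """Last line of the derivation: (n_p - 1) / v_nc * (1 - nu_p / (nu_p - 2))."""
    return (n_pilot - 1) / v_nc * (1 - nu_p / (nu_p - 2))


nu_p = 18
# Simulated E(1/X) next to the closed form 1 / (nu_p - 2).
print(mean_inverse_chi2(nu_p), 1 / (nu_p - 2))
# The bound next to the equivalent form -2 (n_p - 1) / {(nu_p - 2) v_nc} used in (4).
print(bias_bound(10, nu_p, 140.0), -2 * 9 / ((nu_p - 2) * 140.0))
```

Since $1 - \nu_p/(\nu_p - 2) = -2/(\nu_p - 2)$, the bound is negative, and adding its absolute value to $S_0^2 + S_1^2$ gives the correction in (4).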
Simulation study
We assess the performance of our proposed sample size calculations and tests via extensive simulation studies under a variety of specifications for $\rho_c$, selection bias, and variance inequality. In all specifications, the sample size is calculated to detect $\delta = 0.5$ with 80% power, where $\mu_{c1} = 0.5$, $\mu_{c0} = 0$, and $\sigma_{c0}^2 = \sigma_n^2 = 1$, using two-sided tests with $\alpha = 0.05$. Pilot studies were simulated with $r = 1$ and $n_{0p} = n_{1p} = 10, 20, 30,$ or $50$. For internal pilot studies, the sample size was calculated using (3), where $n_{0s(min)} = 10$. For external pilot studies, (2) was used to calculate sample size. For each specification, 100,000 ‘studies’ were simulated to estimate empirical power and type I error of the naive t-test. For internal pilot studies, $t_{NSS}$ and $t_{NBB}$ were also evaluated.
The empirical type I error for the approaches with internal pilot data is found in Web Table 1. We see that both $t_{NSS}$ and $t_{NBB}$ are more conservative and control type I error better than the naive t-test. However, as in Wittes et al. (1999) where $\rho_c = 1$, when $n_{zp}$ and $n_{zs}$ are large, the empirical type I error of the naive t-test is approximately nominal. The largest biases occurred when $\sigma_{c1}^2 \ne \sigma_a^2$. The $t_{NSS}$ and $t_{NBB}$ statistics perform similarly to each other. The empirical power (when $\mu_{c1} = 0.5$ and $\mu_{c0} = 0$) for the approaches with internal pilot data is found in Web Table 2. As in Wittes et al. (1999), when $n_{zp}$ is a small percentage of the required sample size, the study is under-powered. In particular, empirical power is lowest in the specification with $\rho_c = 0.20$, but improves with increasing $n_{zp}$. As in Zucker et al. (1999) where $\rho_c = 1$, $t_{NSS}$ attains adequate power when $n_{zp}$ is sufficient (at least 10% of the required sample size for the specifications explored in this study), regardless of selection bias. Again, the performances of $t_{NSS}$ and $t_{NBB}$ are similar.
The empirical type I error and power of the naive t-test for external pilot studies are found in Web Table 3. We see that the approach performs well, and achieves close to nominal power when $n_{zp} = 50$ for all specifications studied except when $\rho_c < 0.5$ (i.e., the specifications requiring the largest sample size). We also see that, as with internal pilot studies, the median required sample size per group decreases with increasing $n_{zp}$. This result reflects that, for larger $n_{zp}$, inflation factors are closer to one owing to larger degrees of freedom in (2) and more precise estimates of variance and $\rho_c$ from pilot data.
Conclusion
Although more accurate approximations for the distribution of a sum of weighted noncentral chi-square variables have been proposed (e.g., Castaño-Martínez and López-Blázquez, 2005), their mathematical complexity limits their practicality for calculating sample size. Further, the simulation studies show that using the Welch approximation for degrees of freedom in the inflation factor and resulting t-tests performs reasonably well for controlling type I error and attaining desired power in the presence of noncompliance. The main factor that affects the performance of the methods is the ratio of $n_{zp}$ to the total required sample size. If $\rho_c$ is expected to be small, then $n_{zp}$ should be chosen to include a sufficient number in each adherence subgroup to provide reasonable estimates of subgroup means and variances for precisely estimating treatment-group variances, particularly when large levels of selection bias are expected.
Zucker et al. (1999) discussed concerns about jeopardizing blinding for internal pilot studies that are relevant here. In particular, unblinding may risk interim testing in addition to sample size calculation, which further affects type I error. The authors noted that interim sample size calculation without testing can be allowed by disclosing treatment-group variance estimates, but not interim mean differences. To adapt this recommendation to allow for noncompliance, the interim compliance proportion estimate also needs to be revealed.
References
Castaño-Martínez, A. and López-Blázquez, F. (2005). Distribution of a sum of weighted noncentral chi-square variables. Test 14, 397-415.
Miller, F. (2005). Variance estimation in clinical studies with interim sample size reestimation. Biometrics 61, 355-361.
Stein, C. (1945). A two-sample test for a linear hypothesis whose power is independent of the variance. Annals of Mathematical Statistics 16, 243-258.
Wittes, J., Schabenberger, O., Zucker, D., Brittain, E., and Proschan, M. (1999). Internal pilot studies I: Type I error rate of the naive t-test. Statistics in Medicine 18, 3481-3491.
Zucker, D.M., Wittes, J.T., Schabenberger, O., and Brittain, E. (1999). Internal pilot studies II: Comparison of various procedures. Statistics in Medicine 18, 3493-3509.
Table 1: Comparison of empirical type I error to nominal type I error (5%) with sample sizes calculated using Equation 3 with internal pilot data to detect $\delta = 0.5$ with $\mu_{c1} = 0.5$ and $\mu_{c0} = 0$, where $\sigma_{c0}^2 = \sigma_n^2 = 1$ and $\rho_n = \rho_a = (1 - \rho_c)/2$. Two groups compared using the naive t-test, $t_{NSS}$, and $t_{NBB}$. N = median sample size per group. Monte Carlo standard error = 0.06% from 100,000 iterations.
Column groups, left to right after the first five columns: $n_{zp} = 10$, $n_{zp} = 20$, $n_{zp} = 30$, $n_{zp} = 50$ (each with columns N, t, $t_{NSS}$, $t_{NBB}$; error rates in %).
$\rho_c$ $\mu_a$ $\mu_n$ $\sigma^2_{c1}$ $\sigma^2_a$ N t $t_{NSS}$ $t_{NBB}$ N t $t_{NSS}$ $t_{NBB}$ N t $t_{NSS}$ $t_{NBB}$ N t $t_{NSS}$ $t_{NBB}$
0.20 0.00 0.00 1.00 1.00 1671 5.06 5.05 5.05 1618 5.13 5.13 5.12 1586 4.95 4.93 4.94 1584 5.02 5.00 5.01<br />
0.30 0.00 0.00 1.00 1.00 743 5.04 5.01 5.02 718 4.99 4.96 4.97 719 5.06 5.03 5.04 708 5.02 4.99 5.00<br />
0.50 0.00 0.00 1.00 1.00 271 5.07 5.01 5.02 261 5.12 5.08 5.06 257 5.09 5.05 5.04 256 5.03 4.94 4.98<br />
0.75 0.00 0.00 1.00 1.00 122 5.13 4.98 4.99 117 5.18 5.07 5.09 116 5.18 5.09 5.08 120 5.08 5.04 4.99<br />
0.50 -0.25 0.00 1.00 1.00 274 5.00 4.97 4.97 264 5.06 4.98 5.00 260 5.10 5.06 5.06 258 5.05 5.00 5.00<br />
0.50 -0.25 0.25 1.00 1.00 278 5.13 5.06 5.08 268 5.15 5.09 5.10 266 5.03 4.98 4.99 263 5.05 5.00 4.99<br />
0.50 0.25 0.25 1.00 1.00 275 5.00 4.93 4.96 264 5.09 5.04 5.04 262 5.05 5.00 5.00 260 5.05 5.02 5.00<br />
0.50 0.25 -0.25 1.00 1.00 280 4.99 4.96 4.94 268 5.05 5.02 5.01 266 5.10 5.04 5.04 263 4.98 4.94 4.94<br />
0.50 0.50 -0.25 1.00 1.00 289 5.07 5.01 5.02 279 5.17 5.11 5.11 277 5.00 4.96 4.97 275 5.02 4.97 4.99<br />
0.50 0.25 -0.50 1.00 1.00 290 5.02 4.97 4.97 279 5.03 4.98 4.97 277 4.92 4.88 4.88 274 4.85 4.83 4.82<br />
0.50 0.50 -0.50 1.00 1.00 304 5.09 5.03 5.04 293 5.07 5.02 5.03 290 4.98 4.93 4.93 288 5.07 5.03 5.03<br />
0.50 0.50 0.50 1.00 1.00 287 5.03 4.95 4.98 276 4.98 4.92 4.93 275 5.08 5.04 5.03 272 5.03 4.98 4.98<br />
0.50 0.00 0.00 1.50 1.00 304 5.02 4.97 4.97 293 5.11 5.07 5.06 290 4.86 4.82 4.82 287 4.97 4.92 4.93<br />
0.50 0.00 0.00 1.50 1.50 337 5.04 5.00 5.00 324 4.98 4.93 4.95 321 4.90 4.86 4.85 327 5.12 5.08 5.09<br />
0.50 0.00 0.00 0.75 1.00 255 5.01 4.95 4.96 244 5.07 5.01 5.00 241 5.11 5.05 5.07 247 5.06 4.99 5.01<br />
0.50 0.00 0.00 0.75 0.75 240 5.12 5.06 5.07 231 5.05 4.98 4.99 229 5.10 5.03 5.03 239 5.07 5.04 5.02<br />
0.50 0.00 0.00 2.00 1.00 338 5.01 4.94 4.95 324 4.97 4.92 4.94 321 5.10 5.02 5.04 319 5.20 5.16 5.16<br />
0.50 0.00 0.00 2.00 2.00 404 5.07 5.04 5.03 390 4.99 4.94 4.95 385 4.97 4.91 4.93 384 4.99 4.95 4.95<br />
0.50 0.00 0.00 0.50 1.00 237 5.02 4.96 4.96 227 5.17 5.11 5.10 225 5.08 4.98 5.02 224 5.07 5.01 5.02<br />
0.50 0.00 0.00 0.50 0.50 204 5.02 4.93 4.94 195 5.08 5.03 5.00 194 5.04 4.96 4.97 192 5.07 5.02 5.00<br />
0.50 0.25 -0.25 0.75 0.75 245 5.01 4.94 4.95 235 5.12 5.06 5.06 234 5.00 4.97 4.94 240 5.02 4.99 4.97<br />
0.50 0.25 -0.50 1.50 1.50 357 5.07 5.01 5.02 344 5.04 4.98 5.00 339 4.86 4.85 4.83 350 4.89 4.86 4.85<br />
0.50 0.50 -0.50 2.00 2.00 438 4.92 4.87 4.88 420 4.93 4.89 4.89 418 5.01 4.99 4.98 423 4.92 4.89 4.89<br />
0.50 0.50 -0.50 0.50 0.50 237 5.07 5.00 5.00 227 5.10 4.99 5.03 226 5.06 5.00 5.00 231 5.02 4.94 4.96<br />
0.50 0.50 -0.50 2.00 0.50 339 5.13 5.08 5.09 326 5.08 5.03 5.04 322 5.02 4.99 4.98 327 5.23 5.21 5.20
Table 2: Comparison of empirical power to nominal power (80%) from studies with sample sizes calculated using Equation 3 with internal pilot data to detect $\delta = 0.5$ with $\mu_{c1} = 0.5$ and $\mu_{c0} = 0$, where $\sigma_{c0}^2 = \sigma_n^2 = 1$ and $\rho_n = \rho_a = (1 - \rho_c)/2$. Two groups compared using the naive t-test, $t_{NSS}$, and $t_{NBB}$. N = median sample size per group. Monte Carlo standard error = 0.13% from 100,000 iterations.
Column groups, left to right after the first five columns: $n_{zp} = 10$, $n_{zp} = 20$, $n_{zp} = 30$, $n_{zp} = 50$ (each with columns N, t, $t_{NSS}$, $t_{NBB}$; power in %).
$\rho_c$ $\mu_a$ $\mu_n$ $\sigma^2_{c1}$ $\sigma^2_a$ N t $t_{NSS}$ $t_{NBB}$ N t $t_{NSS}$ $t_{NBB}$ N t $t_{NSS}$ $t_{NBB}$ N t $t_{NSS}$ $t_{NBB}$
0.20 0.00 0.00 1.00 1.00 1685 69.35 69.35 69.32 1643 72.30 72.28 72.27 1628 73.98 73.96 73.95 1627 75.60 75.56 75.57<br />
0.30 0.00 0.00 1.00 1.00 761 73.65 73.57 73.66 739 75.83 75.75 75.78 732 76.81 76.74 76.76 726 78.03 77.96 77.97<br />
0.50 0.00 0.00 1.00 1.00 278 78.02 77.82 77.89 269 79.10 78.84 78.97 266 79.73 79.49 79.60 264 80.24 79.99 80.14<br />
0.75 0.00 0.00 1.00 1.00 125 81.02 80.60 80.67 120 81.06 80.46 80.84 118 81.32 80.75 81.08 120 81.11 80.26 80.93<br />
0.50 -0.25 0.00 1.00 1.00 285 77.77 77.58 77.64 275 78.52 78.33 78.42 273 79.39 79.16 79.28 271 79.72 79.46 79.62<br />
0.50 -0.25 0.25 1.00 1.00 287 77.57 77.39 77.42 277 78.72 78.48 78.58 274 79.14 78.94 79.05 272 79.62 79.35 79.52<br />
0.50 0.25 0.25 1.00 1.00 275 77.85 77.63 77.70 264 79.09 78.85 78.95 262 79.55 79.31 79.40 259 80.21 79.95 80.08<br />
0.50 0.25 -0.25 1.00 1.00 287 78.65 78.47 78.53 276 79.73 79.50 79.61 274 80.32 80.10 80.21 271 80.98 80.72 80.88<br />
0.50 0.50 -0.25 1.00 1.00 295 78.57 78.35 78.45 284 79.83 79.62 79.73 281 80.78 80.54 80.67 278 81.09 80.82 81.00<br />
0.50 0.25 -0.50 1.00 1.00 306 78.64 78.44 78.51 291 79.89 79.66 79.79 288 80.83 80.53 80.71 286 81.35 81.13 81.25<br />
0.50 0.50 -0.50 1.00 1.00 314 78.55 78.34 78.43 300 80.04 79.78 79.91 297 80.85 80.62 80.76 295 81.61 81.35 81.49<br />
0.50 0.50 0.50 1.00 1.00 277 78.19 77.99 78.04 268 79.16 78.93 79.04 265 79.90 79.64 79.80 263 79.89 79.62 79.79<br />
0.50 0.00 0.00 1.50 1.00 315 77.76 77.60 77.64 301 79.08 78.87 78.96 297 79.59 79.39 79.48 296 80.05 79.87 79.95<br />
0.50 0.00 0.00 1.50 1.50 346 77.71 77.54 77.60 332 78.86 78.70 78.77 329 79.44 79.26 79.32 326 80.01 79.84 79.94<br />
0.50 0.00 0.00 0.75 1.00 263 78.03 77.80 77.89 252 79.25 78.99 79.12 249 79.68 79.43 79.56 247 80.58 80.27 80.46<br />
0.50 0.00 0.00 0.75 0.75 253 78.70 78.49 78.56 243 79.63 79.36 79.49 240 80.06 79.77 79.92 239 80.78 80.46 80.67<br />
0.50 0.00 0.00 2.00 1.00 346 77.89 77.74 77.78 334 78.84 78.68 78.75 330 79.48 79.32 79.39 328 79.91 79.76 79.82<br />
0.50 0.00 0.00 2.00 2.00 413 77.67 77.53 77.59 399 78.68 78.55 78.60 393 79.34 79.21 79.28 389 79.90 79.71 79.83<br />
0.50 0.00 0.00 0.50 1.00 246 78.26 77.97 78.08 236 79.38 79.06 79.24 233 79.88 79.56 79.76 231 80.29 79.94 80.17<br />
0.50 0.00 0.00 0.50 0.50 212 78.13 77.81 77.93 204 79.30 78.94 79.12 202 80.08 79.73 79.94 200 80.77 80.42 80.61<br />
0.50 0.25 -0.25 0.75 0.75 252 78.32 78.08 78.19 243 79.93 79.65 79.80 241 80.50 80.21 80.38 240 81.35 81.08 81.22<br />
0.50 0.25 -0.50 1.50 1.50 371 78.18 78.01 78.07 358 79.63 79.47 79.54 353 80.27 80.07 80.17 349 80.71 80.52 80.63<br />
0.50 0.50 -0.50 2.00 2.00 447 78.42 78.27 78.32 430 79.53 79.39 79.46 425 80.18 80.03 80.12 422 80.74 80.55 80.66<br />
0.50 0.50 -0.50 0.50 0.50 246 79.08 78.76 78.91 235 80.70 80.34 80.53 234 81.50 81.12 81.37 232 82.39 82.03 82.26<br />
0.50 0.50 -0.50 2.00 0.50 347 78.89 78.71 78.77 333 79.96 79.77 79.86 330 80.39 80.20 80.29 326 81.42 81.21 81.34
Table 3: Comparison of empirical type I error ($\alpha_e$) and power ($1 - \beta_e$) to nominal type I error (5%) and power (80%), respectively, from studies with sample sizes calculated using Equation 2 with external pilot data to detect $\delta = 0.5$ with $\mu_{c1} = 0.5$ and $\mu_{c0} = 0$, where $\sigma_{c0}^2 = \sigma_n^2 = 1$ and $\rho_n = \rho_a = (1 - \rho_c)/2$. Two groups compared using the naive t-test. $N_{H_0}$ = median sample size when the null hypothesis is true; $N_{H_1}$ = median sample size when the alternative hypothesis is true. Monte Carlo standard error = 0.06% for $\alpha_e$ and 0.13% for $1 - \beta_e$ from 100,000 iterations.
Column groups, left to right after the first five columns: $n_{zp} = 10$, $n_{zp} = 20$, $n_{zp} = 30$, $n_{zp} = 50$ (each with columns $N_{H_0}$, $\alpha_e$, $N_{H_1}$, $1-\beta_e$; rates in %).
$\rho_c$ $\mu_a$ $\mu_n$ $\sigma^2_{c1}$ $\sigma^2_a$ $N_{H_0}$ $\alpha_e$ $N_{H_1}$ $1-\beta_e$ $N_{H_0}$ $\alpha_e$ $N_{H_1}$ $1-\beta_e$ $N_{H_0}$ $\alpha_e$ $N_{H_1}$ $1-\beta_e$ $N_{H_0}$ $\alpha_e$ $N_{H_1}$ $1-\beta_e$
0.20 0.00 0.00 1.00 1.00 1676 5.10 1714 69.67 1651 4.94 1620 72.10 1633 5.04 1601 73.38 1582 5.01 1616 75.10<br />
0.30 0.00 0.00 1.00 1.00 740 4.99 760 73.53 738 4.96 720 75.49 731 4.86 712 76.41 702 4.93 721 77.78<br />
0.50 0.00 0.00 1.00 1.00 270 4.83 270 77.49 268 5.03 260 78.56 265 5.09 265 78.96 254 4.86 262 79.72<br />
0.75 0.00 0.00 1.00 1.00 121 4.96 121 79.90 118 5.04 116 80.40 117 4.98 117 80.31 114 5.01 116 80.66<br />
0.50 -0.25 0.00 1.00 1.00 275 4.91 275 77.81 274 5.12 262 78.59 272 5.06 272 79.04 257 4.97 269 79.75<br />
0.50 -0.25 0.25 1.00 1.00 278 5.05 278 77.68 275 4.98 267 78.59 272 4.89 272 78.78 263 5.01 271 79.38<br />
0.50 0.25 0.25 1.00 1.00 274 5.01 274 77.64 264 4.95 263 78.74 260 5.08 260 79.02 259 4.88 258 79.55<br />
0.50 0.25 -0.25 1.00 1.00 279 5.10 279 77.42 276 4.98 268 78.43 271 4.91 271 79.06 263 5.20 271 79.49<br />
0.50 0.50 -0.25 1.00 1.00 289 5.00 289 77.45 282 4.99 278 78.54 279 4.92 279 78.93 274 4.96 278 79.52<br />
0.50 0.25 -0.50 1.00 1.00 289 4.91 289 77.38 290 5.00 278 78.35 288 4.93 288 78.80 273 4.93 285 79.57<br />
0.50 0.50 -0.50 1.00 1.00 304 4.95 304 77.27 301 4.98 293 78.43 296 5.10 296 78.91 287 5.04 295 79.38<br />
0.50 0.50 0.50 1.00 1.00 286 4.97 286 77.45 267 4.93 275 78.54 264 5.07 264 78.97 270 4.99 262 79.50<br />
0.50 0.00 0.00 1.50 1.00 303 4.99 311 77.70 299 5.11 291 78.74 297 4.97 288 79.10 287 5.06 294 79.49<br />
0.50 0.00 0.00 1.50 1.50 336 4.93 345 77.49 323 4.93 323 78.49 328 5.03 320 78.98 325 5.01 326 79.41<br />
0.50 0.00 0.00 0.75 1.00 252 5.03 260 77.42 243 4.89 243 78.33 248 4.95 240 79.16 246 4.86 247 79.50<br />
0.50 0.00 0.00 0.75 0.75 240 4.85 240 77.79 230 4.92 230 78.48 238 4.92 226 78.82 238 5.00 238 79.47<br />
0.50 0.00 0.00 2.00 1.00 338 4.90 347 77.85 332 5.06 324 78.67 329 4.99 321 78.93 318 5.03 326 79.55<br />
0.50 0.00 0.00 2.00 2.00 402 5.02 410 77.31 397 4.94 389 78.60 394 5.00 386 79.18 381 4.96 390 79.48<br />
0.50 0.00 0.00 0.50 1.00 236 5.04 244 77.04 235 4.99 227 78.16 232 5.09 224 78.83 223 4.93 231 79.40<br />
0.50 0.00 0.00 0.50 0.50 202 5.01 211 77.52 203 4.99 194 78.44 201 5.01 193 79.01 191 5.03 199 79.91<br />
0.50 0.25 -0.25 0.75 0.75 244 4.90 252 77.20 234 5.00 234 78.68 240 5.02 232 79.06 239 5.16 239 79.40<br />
0.50 0.25 -0.50 1.50 1.50 357 5.16 370 77.17 344 4.97 344 78.29 351 4.91 339 78.96 350 5.05 349 79.30<br />
0.50 0.50 -0.50 2.00 2.00 438 4.93 447 77.17 418 4.93 418 78.29 425 4.96 416 79.17 421 4.99 422 79.53<br />
0.50 0.50 -0.50 0.50 0.50 237 4.87 245 76.75 226 4.95 226 78.17 233 5.12 225 78.77 231 5.06 231 79.44<br />
0.50 0.50 -0.50 2.00 0.50 340 4.99 348 77.40 324 5.07 324 78.53 329 5.02 321 79.09 327 5.06 327 79.52