
Homework 2 Solutions

STA 106, Winter 2012

1.) a. r = 5
n.i = 5
n.t = r * n.i
T = (1 / sqrt(2)) * qtukey(.95, r, n.t - r)
S = sqrt((r - 1) * qf(.95, r - 1, n.t - r))
g = c(2, 5, 10)
B = qt(1 - (.05 / (2 * g)), n.t - r)

T = 2.99, S = 3.39, B(g = 2) = 2.42, B(g = 5) = 2.84, B(g = 10) = 3.15

Here we see that when there are only two comparisons Bonferroni performs the best, but when the number of comparisons is 10, as it would be if we conducted every pairwise comparison, Bonferroni performs the worst. From this we can generalize that Bonferroni should be used when the number of comparisons of interest is small.

b. r = 5
n.i = 20
#The rest of the code is the same as above

T = 2.78, S = 3.14, B(g = 2) = 2.28, B(g = 5) = 2.63, B(g = 10) = 2.87

When we increase the sample size we see that Scheffé is now the worst in all cases. In fact, at 10 comparisons Bonferroni is nearly as good as Tukey. We can then generalize that Bonferroni performs very efficiently for large sample sizes and few comparisons, but if the number of comparisons is large, Tukey remains the best option.

2.) All three procedures are of the form “estimator ± statistic × SE.” The only difference between the intervals is the choice of statistic: T, S, or B. The smaller the statistic, the more precise the interval. Since all three properly control the Type I error rate, and the choice of statistic does not depend on the observed data (i.e. we’re not data snooping), it is appropriate to choose the smallest interval.
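As a minimal sketch of this rule (reusing the r = 5, n.i = 5 setting of problem 1a, with all g = 10 pairwise comparisons), picking the smallest statistic can be automated:

```r
# Compare the three multipliers for r = 5 groups of size 5 and g = 10
# pairwise comparisons; none depends on the observed data, so the
# smallest may be chosen in advance.
r <- 5; n.i <- 5; n.t <- r * n.i; g <- 10
mult <- c(T = qtukey(.95, r, n.t - r) / sqrt(2),
          S = sqrt((r - 1) * qf(.95, r - 1, n.t - r)),
          B = qt(1 - .05 / (2 * g), n.t - r))
round(mult, 2)           # T = 2.99, S = 3.39, B = 3.15, as in problem 1a
names(which.min(mult))   # "T": Tukey gives the narrowest intervals here
```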

3.) r = 2
n.i = 10
n.t = r * n.i
g = 1
#The rest of the code is the same as above

T = 2.88, S = 2.88, B = 2.88

In the case where we are comparing two groups (i.e. a two-sample t-test) all three procedures are equivalent and return the critical value of a standard two-sample t-test. This is because when r = 2 the error degrees of freedom, n.t − r, equal the n1 + n2 − 2 degrees of freedom of the pooled two-sample test, so each statistic reduces to the same t quantile.
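A minimal sketch of this equivalence at the 95% level (the 2.88 above matches the 99% level, since qt(.995, 18) ≈ 2.878):

```r
# With r = 2 groups, Tukey, Scheffe, and Bonferroni (g = 1) all reduce
# to the pooled two-sample t critical value on n1 + n2 - 2 = 18 df.
r <- 2; n.i <- 10; n.t <- r * n.i
T.stat <- qtukey(.95, r, n.t - r) / sqrt(2)
S.stat <- sqrt((r - 1) * qf(.95, r - 1, n.t - r))
B.stat <- qt(1 - .05 / 2, n.t - r)          # g = 1 comparison
t.2samp <- qt(1 - .05 / 2, n.i + n.i - 2)   # standard two-sample t
round(c(T.stat, S.stat, B.stat, t.2samp), 3)  # all approximately 2.101
```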

4.) a. #READ IN DATA
rehab = read.table(file = "http://br312.math.tntech.edu/.../CH16PR09.txt")
#RENAME THE VARIABLES
names(rehab) = c("time", "status", "ID")
#LOAD GPLOTS PACKAGE
library(gplots)
#MAKE MAIN EFFECTS PLOT
plotmeans(rehab$time ~ as.factor(rehab$status))



It is also acceptable if you made a “line plot” here.

b. #FIT ANOVA MODEL
model = aov(rehab$time ~ as.factor(rehab$status))
#INPUT BASIC INFORMATION FROM THE DATA
r = 3
n.i = c(8, 10, 6)
n.t = sum(n.i)
#OBTAIN GROUP MEANS
#I use "unique" so I only end up with one value for each group
#and I use "round" to avoid issues with computer error
ybar.i = unique(round(model$fitted.values, 2))
#CALCULATE MSE - REMEMBER MSE IS JUST THE SUM OF THE SQUARED RESIDUALS
#DIVIDED BY THE DEGREES OF FREEDOM
MSE = sum(model$residuals^2)/(n.t - r)
SE = sqrt(MSE / n.i[2])
#CALCULATE THE t-STATISTIC
t = qt(1 - .01 / 2, n.t - r)
#COMPUTE CONFIDENCE INTERVAL
c(ybar.i[2] - t * SE, ybar.i[2] + t * SE)

99% CI for group 2: (28.01497, 35.98503)

c. #FIND POINT ESTIMATES
D.1 = ybar.i[2] - ybar.i[3]
D.2 = ybar.i[1] - ybar.i[2]
#CALCULATE B
g = 2
B = qt(1 - (.05 / (2 * g)), n.t - r)

#CALCULATE STANDARD ERRORS
SE.1 = sqrt(MSE * (1/n.i[2] + 1/n.i[3]))
SE.2 = sqrt(MSE * (1/n.i[1] + 1/n.i[2]))
#CALCULATE CONFIDENCE INTERVALS
c(D.1 - B * SE.1, D.1 + B * SE.1)
c(D.2 - B * SE.2, D.2 + B * SE.2)

95% Family Confidence Intervals for:
µ2 − µ3 : (2.452, 13.548)
µ1 − µ2 : (0.904, 11.096)

d. T = (1 / sqrt(2)) * qtukey(.95, r, n.t - r)

T = 2.52 while B = 2.41. Since we are only doing two comparisons, the Bonferroni method is more efficient in this case.

e. If the researcher chose to estimate D3 = µ1 − µ3 as well, the Bonferroni statistic would need to be recalculated, but the Tukey statistic would remain the same.

f. NOTE: Until now I have been calculating all pairwise comparisons directly. Here I will use the built-in function to compute all Tukey intervals. Similarly, all Bonferroni intervals, and several other methods not covered in this course, can be calculated with the function pairwise.t.test.

TukeyHSD(model)

Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = rehab$time ~ as.factor(rehab$status))

$`as.factor(rehab$status)`
    diff        lwr        upr     p adj
2-1   -6  -11.32141 -0.6785856 0.0253639
3-1  -14  -20.05870 -7.9413032 0.0000254
3-2   -8  -13.79322 -2.2067778 0.0060547

No interval contains 0; thus, all the means are significantly different. That is, the recovery time for each initial fitness status is significantly different from the other two.

5.) a. #READ IN DATA
cash = read.table(file = "http://br312.math.tntech.edu/.../CH16PR10.txt")
#RENAME VARIABLES
names(cash) = c("offer", "age", "ID")
#MAKE PLOT
plotmeans(cash$offer ~ as.factor(cash$age))



b. #FIT ANOVA MODEL
model = aov(cash$offer ~ as.factor(cash$age))
#INPUT BASIC INFORMATION FROM THE DATA
r = 3
n.i = c(12, 12, 12)
n.t = sum(n.i)
#OBTAIN GROUP MEANS
ybar.i = unique(round(model$fitted.values, 2))
#CALCULATE MSE
MSE = sum(model$residuals^2)/(n.t - r)
SE = sqrt(MSE / n.i[1])
#CALCULATE THE t-STATISTIC
t = qt(1 - .01 / 2, n.t - r)
#COMPUTE CONFIDENCE INTERVAL
c(ybar.i[1] - t * SE, ybar.i[1] + t * SE)

99% CI for the mean offer for young individuals: (20.25496, 22.74504)

c. #OBTAIN POINT ESTIMATE
D.hat = ybar.i[3] - ybar.i[1]
#CALCULATE STANDARD ERROR
SE = sqrt(MSE * (1 / n.i[3] + 1 / n.i[1]))
#COMPUTE t-STATISTIC
t = qt(1 - .01 / 2, n.t - r)
#COMPUTE CONFIDENCE INTERVAL
c(D.hat - t * SE, D.hat + t * SE)

99% CI for D = µ3 − µ1: (-1.840755, 1.680755)

We are 99% confident that the true value of the difference in mean offers lies within the given interval. Since this interval contains zero, or no difference in the means, there is no significant evidence to suggest that younger and older individuals receive different offers.

d. #OBTAIN POINT ESTIMATE
L.hat = 2 * ybar.i[2] - ybar.i[1] - ybar.i[3]
#CALCULATE STANDARD ERROR
SE = sqrt(MSE * (4/n.i[2] + 1/n.i[1] + 1/n.i[3]))
#COMPUTE TEST STATISTIC
t = L.hat/SE
#DETERMINE CRITICAL VALUE
t.crit = qt(1 - .01/2, n.t - r)

The method for solving this problem becomes much clearer when we rewrite the null hypothesis in the following way:

H0 : µ2 − µ1 = µ3 − µ2
⇒ H0 : 2µ2 − µ1 − µ3 = 0
H1 : 2µ2 − µ1 − µ3 ≠ 0

Decision Rule: fail to reject H0 if |t*| ≤ 2.733; reject H0 if |t*| > 2.733.

We get t* = 11.275, so we reject H0. There is significant evidence to suggest that the mean offer for the young and old age groups differs from that for the middle age group.

6.) (a) Let L = µ1 + µ2 − 2µ3.

#READ IN DATA
prod = read.table(file = "http://br312.math.tntech.edu/.../CH16PR07.txt")
#RENAME VARIABLES
names(prod) = c("prod", "exp", "ID")
#FIT ANOVA MODEL
model = aov(prod$prod ~ as.factor(prod$exp))
#INPUT BASIC INFORMATION FROM THE DATA
r = 3
n.i = c(9, 12, 6)
n.t = sum(n.i)
#OBTAIN GROUP MEANS
ybar.i = unique(round(model$fitted.values, 2))
#OBTAIN POINT ESTIMATE
L.hat = ybar.i[1] + ybar.i[2] - 2*ybar.i[3]
#CALCULATE STANDARD ERROR
MSE = sum(model$residuals^2)/(n.t - r)
SE = sqrt(MSE * (1 / n.i[1] + 1 / n.i[2] + 4 / n.i[3]))
#COMPUTE t-STATISTIC
t = qt(1 - .05 / 2, n.t - r)
#COMPUTE CONFIDENCE INTERVAL
c(L.hat - t * SE, L.hat + t * SE)

The 95% confidence interval for L is given by: (-4.922284, -1.857716). This interval does not contain zero, so there is strong evidence to suggest a difference in the mean productivity improvement between firms with low or moderate research and development expenditures and firms with high expenditures.

(b) Let L = (9/27)µ1 + (12/27)µ2 + (6/27)µ3.

#OBTAIN POINT ESTIMATE
L.hat = (3/9)*ybar.i[1] + (4/9)*ybar.i[2] + (2/9)*ybar.i[3]
#CALCULATE STANDARD ERROR
SE = sqrt(MSE * ((3/9)^2 / n.i[1] + (4/9)^2 / n.i[2] + (2/9)^2 / n.i[3]))

#COMPUTE t-STATISTIC
t = qt(1 - .05 / 2, n.t - r)
#COMPUTE CONFIDENCE INTERVAL
c(L.hat - t * SE, L.hat + t * SE)

We are 95% confident that the mean overall productivity lies within the interval (7.633330, 8.268892).

(c) #OBTAIN POINT ESTIMATES
D1.hat = ybar.i[3] - ybar.i[2]
D2.hat = ybar.i[3] - ybar.i[1]
D3.hat = ybar.i[2] - ybar.i[1]
L4.hat = .5 * (ybar.i[1] + ybar.i[2]) - ybar.i[3]
#CALCULATE STANDARD ERRORS
SE1 = sqrt(MSE * (1 / n.i[3] + 1 / n.i[2]))
SE2 = sqrt(MSE * (1 / n.i[3] + 1 / n.i[1]))
SE3 = sqrt(MSE * (1 / n.i[2] + 1 / n.i[1]))
SE4 = sqrt(MSE * (.5^2 / n.i[1] + .5^2 / n.i[2] + 1 / n.i[3]))
#COMPUTE S
S = sqrt((r - 1) * qf(.90, r - 1, n.t - r))
#CALCULATE CONFIDENCE INTERVALS
c(D1.hat - S * SE1, D1.hat + S * SE1)
c(D2.hat - S * SE2, D2.hat + S * SE2)
c(D3.hat - S * SE3, D3.hat + S * SE3)
c(L4.hat - S * SE4, L4.hat + S * SE4)

The 90% family confidence intervals are given by:

D1 : (0.169, 1.971)
D2 : (1.370, 3.270)
D3 : (0.455, 2.045)
L4 : (−2.531, −0.859)

No interval contains 0, so all differences are significant. That is, there is strong evidence to suggest that the mean productivity improvement for every group of firms differs from the other two. Furthermore, the interval for L4 confirms our results from part (a).

7.) In order to solve this problem we need to make a couple of assumptions. First, we assume that the sample size ni is the same for every group; denote ni = n, so that nT = 3n. Further, we assume that the estimate of MSE from the data is the true value. A Tukey interval has the form,

D̂ ± T × s(D̂)

Now set the margin of error, the term T × s(D̂) above, equal to 3 and let α = .05. We also have from the data that MSE = 3.8. We get,

T × s(D̂) = 3
(1/√2) q(1 − α; r, nT − r) √(MSE (1/n1 + 1/n2)) = 3
(1/√2) q(.95; 3, 3n − 3) √(3.8 (1/n + 1/n)) = 3
q(.95; 3, 3n − 3) (1/√n) = 1.539
q(.95; 3, 3n − 3) (1/√n) − 1.539 = 0



If the left-hand side above is negative for a given value of n, then that sample size satisfies our minimum required precision. Trying several values, we find that n = 6 is the smallest value for which it is negative.
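The trial-and-error search can be sketched directly (assuming, as above, r = 3, α = .05, and MSE = 3.8; margin.gap is a hypothetical helper name):

```r
# Margin of error of a Tukey pairwise interval minus the target 3,
# as a function of the common group size n (r = 3, alpha = .05, MSE = 3.8).
margin.gap <- function(n) {
  qtukey(.95, 3, 3 * n - 3) / sqrt(2) * sqrt(3.8 * (1/n + 1/n)) - 3
}
n <- 2:10
round(sapply(n, margin.gap), 3)    # gap shrinks as n grows
min(n[sapply(n, margin.gap) < 0])  # smallest adequate sample size: 6
```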
