29.06.2013 Views

STA 106: Midterm – Solutions - Statistics


<strong>STA 106: Midterm – Solutions</strong><br />

February 17, 2012<br />

One Factor ANOVA<br />

Suppose you have been hired by the International Resource for Iris Studies (IRIS) to conduct an analysis.<br />

They are interested in determining if it is possible to identify a species of iris by its petal length. IRIS sampled<br />

50 plants from each of three species of iris. Summary statistics of the data set are provided below.<br />

Species (i) | n_i | Mean (cm) | Variance (cm²)<br />

Setosa | 50 | 1.462 | 0.030<br />

Versicolor | 50 | 4.260 | 0.221<br />

Virginica | 50 | 5.552 | 0.305<br />

Assume the ANOVA model,<br />

$Y_{ij} = \mu_i + \varepsilon_{ij}$,<br />

where $\varepsilon_{ij} \sim N(0, \sigma^2)$, is appropriate. Furthermore, the variance is calculated by,<br />

$s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2$.<br />

a. (5 points) State the factor effects model.<br />

$Y_{ij} = \mu_{\cdot\cdot} + \tau_i + \varepsilon_{ij}$,<br />

where $\varepsilon_{ij} \sim N(0, \sigma^2)$.<br />

b. (6 points) Calculate the estimated factor effects, $\hat{\tau}_i$.<br />

$\hat{\mu}_{\cdot\cdot} = \sum_{i=1}^{3} \frac{n_i}{n_T} \hat{\mu}_{i\cdot} = \frac{1.462 + 4.260 + 5.552}{3} = 3.758$<br />

$\hat{\tau}_1 = \hat{\mu}_{1\cdot} - \hat{\mu}_{\cdot\cdot} = 1.462 - 3.758 = -2.296$<br />

$\hat{\tau}_2 = \hat{\mu}_{2\cdot} - \hat{\mu}_{\cdot\cdot} = 4.260 - 3.758 = 0.502$<br />

$\hat{\tau}_3 = \hat{\mu}_{3\cdot} - \hat{\mu}_{\cdot\cdot} = 5.552 - 3.758 = 1.794$<br />

c. (14 points) Complete the ANOVA table.<br />

Source | df | Sum of Squares | Mean Square | F*<br />

Treatment | 2 | 437.10 | 218.55 | 1180.2<br />

Error | 147 | 27.22 | 0.185 |<br />

Total | 149 | 464.32 | |<br />
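The factor effects in part b and the ANOVA table in part c can be reproduced from the summary statistics alone. The following is a minimal sketch, not part of the original solution; the names `summary` and `anova_from_summary` are illustrative. It relies on the identities SSTR = Σ n_i(ȳ_i· − ȳ··)² and SSE = Σ (n_i − 1)s_i².

```python
# Summary statistics from the problem statement: species -> (n_i, mean, variance).
summary = {
    "setosa": (50, 1.462, 0.030),
    "versicolor": (50, 4.260, 0.221),
    "virginica": (50, 5.552, 0.305),
}


def anova_from_summary(summary):
    """Build one-factor ANOVA quantities from group sizes, means, and variances."""
    r = len(summary)
    n_total = sum(n for n, _, _ in summary.values())
    # Weighted grand mean; equals the simple average when all n_i are equal.
    grand_mean = sum(n * mean for n, mean, _ in summary.values()) / n_total
    # Estimated factor effects: group mean minus grand mean.
    effects = {name: mean - grand_mean for name, (_, mean, _) in summary.items()}
    # Treatment sum of squares from the group means.
    sstr = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in summary.values())
    # Error sum of squares from the within-group variances.
    sse = sum((n - 1) * var for n, _, var in summary.values())
    mstr = sstr / (r - 1)
    mse = sse / (n_total - r)
    return {
        "grand_mean": grand_mean,
        "effects": effects,
        "df": (r - 1, n_total - r, n_total - 1),
        "sstr": sstr,
        "sse": sse,
        "ssto": sstr + sse,
        "mstr": mstr,
        "mse": mse,
        "f_star": mstr / mse,
    }


table = anova_from_summary(summary)
for key, value in table.items():
    print(key, value)
```

Because the variances in the summary table are rounded to three decimals, this sketch gives an SSE of about 27.24 and an F* of about 1179, slightly different from the 27.22 and 1180.2 reported in the table, which were presumably computed from unrounded data.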

d. (6 points) State the null and the alternative hypothesis of the ANOVA test in both symbols and in the<br />

context of the data.<br />

$H_0: \mu_1 = \mu_2 = \mu_3$. The mean petal length is the same for all three species of iris.<br />

$H_1: \mu_i \neq \mu_j$ for some $i \neq j$. The mean petal length of at least one species of iris differs from the others.


e. (5 points) Assume the significance level of the test is α = 5%. Find the critical value for the test statistic.<br />

Fcrit(.95; 2, 147) = 3.058<br />

f. (5 points) Based on the ANOVA table and critical value, what conclusion can you draw about the petal<br />

length of the three iris species?<br />

Since F* = 1180.2 > 3.058, I would reject the null hypothesis. The mean petal length of at least one species of iris differs.<br />

g. (8 points) If you were to conduct pairwise comparisons for the three species of iris, which method, Tukey,<br />

Scheffe, or Bonferroni, would be most appropriate? Explain and justify your answer numerically (You do<br />

not need to calculate the intervals).<br />

$T = \frac{1}{\sqrt{2}}\, q(.95; 3, 150 - 3) = \frac{3.348}{\sqrt{2}} = 2.367$<br />

$S = \sqrt{(3 - 1) F(.95; 3 - 1, 150 - 3)} = \sqrt{2(3.058)} = 2.473$<br />

$B = t\left(1 - \frac{.05}{2(3)}; 150 - 3\right) = t(.9917; 147) \approx 2.42$<br />

Here, Tukey is best as it has the smallest multiplier, so it will yield the narrowest (most precise) confidence intervals for the three pairwise comparisons.<br />

h. (10 points) Now suppose the researchers at IRIS, prior to seeing the data, suspected that the versicolor<br />

and virginica species would be very similar, but wanted to know if these species differed from setosa.<br />

Propose a contrast to test this and calculate the confidence interval for the contrast (α = .10). What can<br />

you conclude from this interval?<br />

To test whether versicolor and virginica differ from setosa, we can use the contrast $L = 2\mu_{\text{setosa}} - \mu_{\text{versicolor}} - \mu_{\text{virginica}}$. Any nonzero constant multiple of this contrast would work as well.<br />

$\hat{L} = 2\hat{\mu}_1 - \hat{\mu}_2 - \hat{\mu}_3 = -6.888$<br />

$s\{\hat{L}\} = \sqrt{MSE \sum_i \frac{c_i^2}{n_i}} = \sqrt{\frac{0.185}{50}(4 + 1 + 1)} = 0.149$<br />

$t(1 - \alpha/2; n_T - r) = t(.95; 147) = 1.655$<br />

CI: $-6.888 \pm 1.655(0.149) = (-7.135, -6.641)$<br />

This interval does not include zero. Thus, there is significant evidence to suggest that the mean petal<br />

length of versicolor and virginica differs from that of setosa.<br />
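The contrast interval in part h can be checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions: the function name `contrast_ci` is illustrative, and the critical value t(.95; 147) = 1.655 is taken from the solution rather than computed.

```python
import math


def contrast_ci(means, coeffs, sizes, mse, t_crit):
    """Return the estimate, standard error, and confidence limits of a contrast."""
    # Contrast estimate: sum of c_i * mean_i.
    estimate = sum(c * m for c, m in zip(coeffs, means))
    # Standard error: sqrt(MSE * sum(c_i^2 / n_i)).
    std_err = math.sqrt(mse * sum(c ** 2 / n for c, n in zip(coeffs, sizes)))
    half_width = t_crit * std_err
    return estimate, std_err, (estimate - half_width, estimate + half_width)


# Means in the order setosa, versicolor, virginica; coefficients sum to zero.
estimate, std_err, (lower, upper) = contrast_ci(
    means=[1.462, 4.260, 5.552],
    coeffs=[2, -1, -1],
    sizes=[50, 50, 50],
    mse=0.185,      # from the ANOVA table in part c
    t_crit=1.655,   # t(.95; 147), taken from the solution
)
print(round(estimate, 3), round(std_err, 3), round(lower, 3), round(upper, 3))
```

Rescaling the coefficients by a nonzero constant (for example, using 1, -1/2, -1/2) rescales the estimate and both limits by the same constant, so whether the interval covers zero does not change.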

i. (5 points) Judging by the summary statistics given from the data, does it appear that any assumptions<br />

of the ANOVA model are violated? If so, which assumption?<br />

It appears that the assumption of equal variance for all groups may be violated. The setosa group has<br />

a much smaller variance than the other two.<br />

j. (10 points) If assumptions of the ANOVA model were violated, suggest another method that could be<br />

used to test the hypotheses given in part d. List and describe the steps and calculations required to<br />

conduct the test you recommended (do not actually carry out the test).<br />

If the distributional assumptions of the ANOVA model are violated, we could use the nonparametric rank F test. The test can be done in the following steps:<br />

1. Replace each observation with its rank $R_{ij}$ in the combined sample of all $n_T$ observations.<br />

2. Compute,<br />

$MSTR = \frac{\sum_i n_i (\bar{R}_{i\cdot} - \bar{R}_{\cdot\cdot})^2}{r - 1}$.<br />

3. Compute,<br />

$MSE = \frac{\sum_i \sum_j (R_{ij} - \bar{R}_{i\cdot})^2}{n_T - r}$.<br />

4. Calculate the test statistic,<br />

$F^* = MSTR / MSE$.<br />

5. Compare the test statistic to its probability distribution under the null hypothesis and draw a conclusion.<br />

If $F^* \le F(1 - \alpha; r - 1, n_T - r)$, conclude $H_0$.<br />

If $F^* > F(1 - \alpha; r - 1, n_T - r)$, conclude $H_1$.<br />
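Steps 1 through 4 of the rank-based test can be sketched as follows. This is an illustrative implementation, not the author's code: the helper names and the small `samples` data set are hypothetical, and tied observations are given the average of their ranks, which is one common convention. The critical value F(1 − α; r − 1, n_T − r) for step 5 would still come from a table or a statistics library.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in order[pos:end + 1]:
            ranks[k] = avg_rank
        pos = end + 1
    return ranks


def rank_f_statistic(samples):
    """Compute F* = MSTR / MSE on the ranks of the pooled observations."""
    pooled = [y for sample in samples for y in sample]
    ranks = average_ranks(pooled)                      # step 1
    n_total, r = len(pooled), len(samples)
    grand_mean = sum(ranks) / n_total
    # Split the pooled ranks back into their original groups.
    grouped, start = [], 0
    for sample in samples:
        grouped.append(ranks[start:start + len(sample)])
        start += len(sample)
    means = [sum(g) / len(g) for g in grouped]
    mstr = sum(len(g) * (m - grand_mean) ** 2
               for g, m in zip(grouped, means)) / (r - 1)        # step 2
    mse = sum((x - m) ** 2
              for g, m in zip(grouped, means) for x in g) / (n_total - r)  # step 3
    return mstr / mse                                  # step 4


# Hypothetical petal lengths (cm) for three groups, for illustration only.
samples = [[1.4, 1.3, 1.5], [4.7, 4.5, 4.9], [6.0, 5.1, 5.9]]
f_star = rank_f_statistic(samples)
print(f_star)
```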

Two Factor ANOVA<br />

Consider the two-factor model summarized by factor combination means, µij in the table below.<br />

Factor 1 | Factor 2: j = 1 | j = 2 | j = 3

i = 1 | 96 | 112 | 80

i = 2 | 116 | 100 | 132

a. (5 points) Below is an interaction plot of the data. Describe any interaction or main effects visible in<br />

the plot.


As the lines are far from parallel, there is evidence of a strong interaction effect. It can be seen that<br />

if you average over Factor 1 you will obtain the same value (106) at all three levels of Factor 2, but this is not true<br />

when averaging over Factor 2, where the two levels of Factor 1 give 96 and 116. This indicates there may be a main effect for Factor 1 and no main effect for<br />

Factor 2.<br />

b. (6 points) Calculate the main effects for factor 1, αi, and the main effects for factor 2, βj.<br />

$\mu_{\cdot\cdot} = \frac{96 + 112 + 80 + 116 + 100 + 132}{6} = 106$<br />

$\alpha_1 = \frac{96 + 112 + 80}{3} - 106 = -10$<br />

$\alpha_2 = \frac{116 + 100 + 132}{3} - 106 = 10$<br />

$\beta_1 = \frac{96 + 116}{2} - 106 = 0$<br />

$\beta_2 = \frac{112 + 100}{2} - 106 = 0$<br />

$\beta_3 = \frac{80 + 132}{2} - 106 = 0$<br />
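The main effects in part b are row and column means of the table of cell means, each minus the overall mean. A short sketch of that arithmetic, with illustrative variable names:

```python
# Cell means mu_ij from the table: rows are Factor 1 levels, columns are Factor 2 levels.
means = [
    [96, 112, 80],
    [116, 100, 132],
]
a, b = len(means), len(means[0])

# Overall mean of all a*b cell means.
grand_mean = sum(sum(row) for row in means) / (a * b)
# Row means (average over Factor 2) and column means (average over Factor 1).
row_means = [sum(row) / b for row in means]
col_means = [sum(means[i][j] for i in range(a)) / a for j in range(b)]

alpha = [m - grand_mean for m in row_means]  # Factor 1 main effects
beta = [m - grand_mean for m in col_means]   # Factor 2 main effects
print(grand_mean, alpha, beta)
```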

c. (10 points) Let the sample size for each factor combination be n = 5 and let σ = 3. Compute E{MSE}<br />

and E{MSAB}.<br />

$E\{MSE\} = \sigma^2 = 3^2 = 9$<br />

$E\{MSAB\} = \sigma^2 + n \frac{\sum_i \sum_j (\mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu_{\cdot\cdot})^2}{(a - 1)(b - 1)}$<br />

$= 9 + 5 \cdot \frac{\sum_i \sum_j (\mu_{ij} - \mu_{i\cdot} - 106 + 106)^2}{2}$<br />

$= 9 + \frac{5}{2} \left(0^2 + 16^2 + (-16)^2 + 0^2 + (-16)^2 + 16^2\right)$<br />

$= 9 + \frac{5}{2} \cdot 4(16^2) = 2569$<br />
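The expected mean squares in part c can be verified numerically from the cell means. The sketch below assumes the formula E{MSAB} = σ² + n ΣΣ(µij − µi· − µ·j + µ··)² / ((a − 1)(b − 1)) used in the solution; the function name `expected_mean_squares` is illustrative.

```python
def expected_mean_squares(means, n, sigma):
    """Return (E{MSE}, E{MSAB}) for a balanced two-factor design with known cell means."""
    a, b = len(means), len(means[0])
    grand_mean = sum(sum(row) for row in means) / (a * b)
    row_means = [sum(row) / b for row in means]
    col_means = [sum(means[i][j] for i in range(a)) / a for j in range(b)]
    # Sum of squared interaction terms (mu_ij - mu_i. - mu_.j + mu_..).
    interaction_ss = sum(
        (means[i][j] - row_means[i] - col_means[j] + grand_mean) ** 2
        for i in range(a)
        for j in range(b)
    )
    e_mse = sigma ** 2
    e_msab = sigma ** 2 + n * interaction_ss / ((a - 1) * (b - 1))
    return e_mse, e_msab


cell_means = [[96, 112, 80], [116, 100, 132]]
e_mse, e_msab = expected_mean_squares(cell_means, n=5, sigma=3)
print(e_mse, e_msab)
```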

d. (5 points) Is E{MSAB} significantly larger than E{MSE}? What is the significance of this?<br />

Yes, E{MSAB} is much larger than E{MSE}. This indicates that there is a very significant interaction<br />

effect in the model given by the means above.
