SAT scores — knitr<br />
STT 3820<br />
Alan T. Arnholt<br />
March 18, 2012<br />
1 SAT scores<br />
How strong was the association between student scores on the Math and Verbal sections of the old SAT?<br />
Scores on each ranged from 200 to 800 and were widely used by college admission offices.<br />
a) Is there evidence of an association between Math and Verbal scores? Write an appropriate hypothesis.<br />
b) Discuss the assumptions for inference.<br />
c) Test your hypothesis and state an appropriate conclusion.<br />
d) Find a 90% confidence interval for the slope of the true line describing the association between Math and<br />
Verbal scores.<br />
e) Explain, in this context, what your confidence interval means.<br />
f) Find a 90% confidence interval for the mean SAT-Math score for all students with an SAT-Verbal score of<br />
500.<br />
g) Find a 90% prediction interval for the Math score of the senior class president if you know that she scored<br />
710 on the Verbal section.<br />
2 Answers<br />
(a) The hypotheses to test for a linear relationship between SAT Verbal and SAT Math scores are: $H_0: \beta_1 = 0$<br />
versus $H_A: \beta_1 \neq 0$.<br />
(b) Checking Assumptions<br />
I. Straight Enough Condition: Based on the left and center graphs in Figure 1, there is no obvious<br />
bend in the scatterplot, nor is there any discernible pattern in the residuals versus the fitted values.<br />
II. Independence Assumption: The data were not collected over time, and there is no reason to think<br />
scores of one student will influence the scores of another student.<br />
III. Does the Plot Thicken Condition: Neither the scatterplot nor the residual plot in Figure 1 shows<br />
any substantial changes in the spread about the line.<br />
IV. Nearly Normal Condition, Outlier Condition: Based on the Quantile-Quantile plot in Figure 1<br />
(the rightmost graph), a normal model for the errors is reasonable.<br />
(c) The t-value for testing $H_0: \beta_1 = 0$ versus $H_A: \beta_1 \neq 0$ is 11.8802, with a corresponding p-value of<br />
$9.7478 \times 10^{-24}$, which is strong evidence of a positive linear relationship between SAT Verbal and<br />
SAT Math scores.<br />
(d) The 90% confidence interval for the slope of the true line is (0.5811, 0.7691).<br />
(e) Based on the sample, we are 90% confident that the average SAT Math score increases by between 0.5811 and<br />
0.7691 points for each additional point scored on the SAT Verbal test.<br />
(f) We are 90% confident that the mean Math score for students with a Verbal score of 500 is between 534<br />
and 560.<br />
(g) We are 90% confident that the Math score for the senior class president with a Verbal score of 710 will be<br />
between 569 and 808. Since the highest score one can receive on any one section of the test is 800, we are<br />
90% confident the senior class president will have a Math score between 569 and 800.<br />
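The figures in (c), (f), and (g) can be cross-checked from the rounded regression output alone. The following is a minimal sketch, assuming the rounded estimates 209.5542 and 0.6751, the slope standard error 0.0568, and 160 residual degrees of freedom taken from the summary table. The variable names are illustrative, and the results differ slightly from the reported values because the inputs are rounded.<br />

```r
# Rounded values copied from the regression summary (assumed inputs)
b0 <- 209.5542   # intercept estimate
b1 <- 0.6751     # slope estimate for Verbal
se_b1 <- 0.0568  # standard error of the slope
df <- 160        # residual degrees of freedom (n - 2)

# (c) t statistic and two-sided p-value for H0: beta1 = 0
t_stat <- b1 / se_b1
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# (f), (g) fitted Math scores at Verbal = 500 and Verbal = 710;
# each reported interval should be centered near these values
fit_500 <- b0 + b1 * 500
fit_710 <- b0 + b1 * 710

# (g) Math scores cannot exceed 800, so cap the upper prediction limit
pi_capped <- pmin(c(569.3, 808.4), 800)

print(c(t = t_stat, p = p_value))
print(c(fit_500 = fit_500, fit_710 = fit_710))
print(pi_capped)
```

The fitted values land close to the midpoints of the intervals reported in (f) and (g), which is a quick sanity check that the intervals belong to this fitted line.<br />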
3 Mechanics<br />
> site <- ...<br />
> SAT <- ...<br />
> str(SAT)<br />
'data.frame': 162 obs. of 2 variables:<br />
$ Math : int 450 540 570 400 590 610 610 570 720 640 ...<br />
$ Verbal: int 450 640 590 400 600 610 630 660 660 590 ...<br />
> modPROB18 <- lm(Math ~ Verbal, data = SAT)<br />
> summary(modPROB18)<br />
Call:<br />
lm(formula = Math ~ Verbal, data = SAT)<br />
Residuals:<br />
Min 1Q Median 3Q Max<br />
-173.59 -47.60 1.16 45.09 259.66<br />
Coefficients:<br />
Estimate Std. Error t value Pr(>|t|)<br />
(Intercept) 209.5542 34.3494 6.1 7.7e-09 ***<br />
Verbal 0.6751 0.0568 11.9 < 2e-16 ***<br />
---<br />
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1<br />
Residual standard error: 71.8 on 160 degrees of freedom<br />
Multiple R-squared: 0.469, Adjusted R-squared: 0.465<br />
F-statistic: 141 on 1 and 160 DF, p-value: < 2e-16<br />
> library(ggplot2)<br />
> MODprob18 <- fortify(modPROB18)<br />
> ggplot(data = SAT, aes(x = Verbal, y = Math)) + geom_point() +<br />
+ geom_smooth(method = "lm")<br />
> ggplot(data = MODprob18, aes(x = .fitted, y = .resid)) +<br />
+ geom_point() + geom_smooth() + geom_abline(intercept = 0, slope = 0,<br />
+ color = "red") + scale_y_continuous(name = "Residuals") + scale_x_continuous(name =<br />
+ "Fitted Values")<br />
> ggplot(data = MODprob18, aes(sample = .resid)) + stat_qq()<br />
[Figure 1 plots: Math versus Verbal (left), Residuals versus Fitted Values (middle), sample versus theoretical quantiles (right)]<br />
Figure 1: The left graph shows a scatterplot of Math versus Verbal scores with the least squares line superimposed,<br />
the middle graph shows the residuals versus the fitted values, and the right graph shows a normal quantile-quantile plot<br />
of the residuals.<br />
> confint(modPROB18, level = 0.9)<br />
5 % 95 %<br />
(Intercept) 152.7255 266.3829<br />
Verbal 0.5811 0.7691<br />
> confint(modPROB18, "Verbal", level = 0.9)<br />
5 % 95 %<br />
Verbal 0.5811 0.7691<br />
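The interval returned by confint() for the slope has the form estimate ± t* × standard error, where t* is the 0.95 quantile of a t distribution with 160 degrees of freedom. The following sketch reproduces it by hand, assuming the rounded estimate 0.6751 and standard error 0.0568 from the summary table. The variable names are illustrative, and the result should agree with the confint() output up to rounding.<br />

```r
# Rounded values copied from the regression summary (assumed inputs)
b1 <- 0.6751     # slope estimate for Verbal
se_b1 <- 0.0568  # standard error of the slope
df <- 160        # residual degrees of freedom (n - 2)
level <- 0.90    # confidence level

# Critical value: a 90% interval leaves 5% in each tail
t_star <- qt(1 - (1 - level) / 2, df = df)

# estimate -/+ margin of error
ci_slope <- b1 + c(-1, 1) * t_star * se_b1
print(round(ci_slope, 4))
```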
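> CI <- predict(modPROB18, newdata = data.frame(Verbal = 500), interval = "confidence", level = 0.9)<br />
> PI <- predict(modPROB18, newdata = data.frame(Verbal = 710), interval = "prediction", level = 0.9)<br />
> CI[2:3]<br />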
[1] 534.1 560.1<br />
> PI[2:3]<br />
[1] 569.3 808.4<br />
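The two-element results above are consistent with the lower and upper limits that predict() returns for an lm fit (columns fit, lwr, upr). Because the original data file is not reproduced here, the following sketch uses simulated scores as a stand-in. The data frame sim, the seed, and the simulation parameters are assumptions chosen only to resemble the fitted model, so the printed intervals will not match the reported ones.<br />

```r
set.seed(1)  # arbitrary seed, for reproducibility only
n <- 162

# Simulated stand-in for the SAT data (NOT the original scores)
Verbal <- round(runif(n, min = 400, max = 800), -1)
Math <- 210 + 0.675 * Verbal + rnorm(n, mean = 0, sd = 72)
sim <- data.frame(Math = Math, Verbal = Verbal)

mod <- lm(Math ~ Verbal, data = sim)

# 90% confidence interval for the mean Math score at Verbal = 500
CI <- predict(mod, newdata = data.frame(Verbal = 500),
              interval = "confidence", level = 0.90)

# 90% prediction interval for one student's Math score at Verbal = 710
PI <- predict(mod, newdata = data.frame(Verbal = 710),
              interval = "prediction", level = 0.90)

print(CI)  # columns: fit, lwr, upr
print(PI)
```

The prediction interval is much wider than the confidence interval because it accounts for the variation of an individual score around the mean, not only the uncertainty in the estimated line.<br />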