SAT scores — knitr<br />

STT 3820<br />

Alan T. Arnholt<br />

March 18, 2012<br />

1 SAT scores<br />

How strong was the association between student scores on the Math and Verbal sections of the old SAT?<br />

Scores on each ranged from 200 to 800 and were widely used by college admission offices.<br />

a) Is there evidence of an association between Math and Verbal scores? Write an appropriate hypothesis.<br />

b) Discuss the assumptions for inference.<br />

c) Test your hypothesis and state an appropriate conclusion.<br />

d) Find a 90% confidence interval for the slope of the true line describing the association between Math and Verbal scores.<br />

e) Explain, in this context, what your confidence interval means.<br />

f) Find a 90% confidence interval for the mean SAT-Math score for all students with an SAT-Verbal score of 500.<br />

g) Find a 90% prediction interval for the Math score of the senior class president if you know that she scored 710 on the Verbal section.<br />

2 Answers<br />

(a) The hypotheses to test for a linear relationship between SAT Verbal and SAT Math scores are: H_0: β_1 = 0 versus H_A: β_1 ≠ 0.<br />

(b) Checking Assumptions<br />

I. Straight Enough Condition: Based on the left and center graphs in Figure 1, there is no obvious bend in the scatterplot, nor is there any discernible pattern in the residuals versus the fitted values.<br />

II. Independence Assumption: The data were not collected over time, and there is no reason to think<br />

scores of one student will influence the scores of another student.<br />

III. Does the Plot Thicken Condition: Neither the scatterplot nor the residual plot in Figure 1 shows any substantial change in the spread about the line.<br />

IV. Nearly Normal Condition, Outlier Condition: Based on the Quantile-Quantile plot in Figure 1 (the rightmost graph), a normal model for the errors is reasonable.<br />


(c) The t-value for testing H_0: β_1 = 0 versus H_A: β_1 ≠ 0 is 11.8802, which has a corresponding p-value of 9.7478 × 10^−24. This is strong evidence of a positive linear relationship between SAT Verbal and SAT Math scores.<br />

(d) The 90% confidence interval for the slope of the true line is (0.5811, 0.7691).<br />

(e) Based on the sample, we are 90% confident that average SAT Math scores increase between 0.5811 and<br />

0.7691 points for each additional point scored on the SAT Verbal test.<br />

(f) We are 90% confident that the mean Math score for students with a Verbal score of 500 is between 534<br />

and 560.<br />

(g) We are 90% confident that the Math score for the senior class president with a Verbal score of 710 will be between 569 and 808. Since the highest score one can receive on any one portion of the test is 800, we are 90% confident the senior class president will have a Math score between 569 and 800.<br />

3 Mechanics<br />

> site <- ...<br />

> SAT <- ...<br />

> str(SAT)<br />

'data.frame': 162 obs. of 2 variables:<br />

$ Math : int 450 540 570 400 590 610 610 570 720 640 ...<br />

$ Verbal: int 450 640 590 400 600 610 630 660 660 590 ...<br />

> modPROB18 <- lm(Math ~ Verbal, data = SAT)<br />

> summary(modPROB18)<br />

Call:<br />

lm(formula = Math ~ Verbal, data = SAT)<br />

Residuals:<br />

Min 1Q Median 3Q Max<br />

-173.59 -47.60 1.16 45.09 259.66<br />

Coefficients:<br />

Estimate Std. Error t value Pr(>|t|)<br />

(Intercept) 209.5542 34.3494 6.1 7.7e-09 ***<br />

Verbal 0.6751 0.0568 11.9 < 2e-16 ***<br />

---<br />

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1<br />

Residual standard error: 71.8 on 160 degrees of freedom<br />

Multiple R-squared: 0.469, Adjusted R-squared: 0.465<br />

F-statistic: 141 on 1 and 160 DF, p-value: < 2e-16<br />
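As a rough check on the summary table, the t statistic and two-sided p-value for the slope can be recomputed from the rounded estimate and standard error printed above. This is only a sketch: the inputs are rounded to four decimals, so the results agree with the reported 11.8802 and 9.7478 × 10^−24 only approximately, and the variable names are illustrative rather than taken from the original session.

```r
# Rounded values copied from the coefficient table above
b1    <- 0.6751   # estimated slope for Verbal
se_b1 <- 0.0568   # standard error of the slope
df    <- 160      # residual degrees of freedom (n - 2 = 162 - 2)

t_stat  <- b1 / se_b1                      # test statistic for H_0: beta_1 = 0
p_value <- 2 * pt(-abs(t_stat), df = df)   # two-sided p-value

t_stat
p_value
t_stat^2   # in simple linear regression this matches the F-statistic (about 141)
```

The last line illustrates why the F-test and the slope t-test give the same p-value when there is a single predictor.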

> MODprob18 <- ...<br />

> ggplot(data = SAT, aes(x = Verbal, y = Math)) + geom_point() +<br />

+ geom_smooth(method = "lm")<br />


> ggplot(data = MODprob18, aes(x = .fitted, y = .resid)) +<br />

+ geom_point() + geom_smooth() + geom_abline(intercept = 0, slope = 0,<br />

+ color = "red") + scale_y_continuous(name = "Residuals") +<br />

+ scale_x_continuous(name = "Fitted Values")<br />

> ggplot(data = MODprob18, aes(sample = .resid)) + stat_qq()<br />
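The diagnostic plots read `.fitted` and `.resid` columns from `MODprob18`, but the assignment that created that object did not survive in this transcript. One way such a data frame could be built with base R is sketched below. It uses simulated data (the real SAT file is not available here), and the names `sim`, `mod_sim`, and `diag_sim` are hypothetical. The original may instead have used something like `fortify()` from ggplot2, which is an assumption.

```r
set.seed(1)

# Simulated stand-in for the SAT data (NOT the real scores)
sim <- data.frame(Verbal = round(runif(162, 300, 800), -1))
sim$Math <- 210 + 0.68 * sim$Verbal + rnorm(162, sd = 70)

mod_sim <- lm(Math ~ Verbal, data = sim)

# Attach fitted values and residuals under the column names the plots expect
diag_sim <- data.frame(sim,
                       .fitted = fitted(mod_sim),
                       .resid  = resid(mod_sim))
head(diag_sim)
```

A data frame of this shape can be passed to the residual and quantile-quantile plotting calls in place of `MODprob18`.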

[Figure 1 panels not reproduced. Axes, left to right: Math versus Verbal; Residuals versus Fitted Values; sample versus theoretical quantiles.]<br />

Figure 1: The left graph shows a scatterplot of Math versus Verbal scores with the least squares line superimposed, the middle graph shows the residuals versus the fitted values, and the right graph shows a normal quantile-quantile plot of the residuals.<br />

> confint(modPROB18, level = 0.9)<br />

5 % 95 %<br />

(Intercept) 152.7255 266.3829<br />

Verbal 0.5811 0.7691<br />

> confint(modPROB18, "Verbal", level = 0.9)<br />

5 % 95 %<br />

Verbal 0.5811 0.7691<br />
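The interval returned by `confint()` can be reproduced by hand as estimate ± t* × SE, using the rounded slope and standard error from the coefficient table. The following is a minimal sketch under that assumption; the variable names are illustrative.

```r
# Rounded values copied from the coefficient table
b1    <- 0.6751   # estimated slope for Verbal
se_b1 <- 0.0568   # standard error of the slope
df    <- 160      # residual degrees of freedom
level <- 0.90     # confidence level

# Critical value: the 95th percentile of t with 160 df for a 90% interval
t_crit <- qt(1 - (1 - level) / 2, df = df)

ci_slope <- b1 + c(-1, 1) * t_crit * se_b1
round(ci_slope, 4)
```

Up to rounding of the inputs, this should agree with the `confint()` output above.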

> CI <- ...<br />

> PI <- ...<br />

> CI[2:3]<br />

[1] 534.1 560.1<br />

> PI[2:3]<br />

[1] 569.3 808.4<br />
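The right-hand sides of the `CI` and `PI` assignments were lost from this transcript. Intervals of this kind are commonly obtained with `predict()` on the fitted model, so the sketch below shows one plausible form. It runs on simulated data rather than the real SAT file, so its numbers will not match those above, and the names `sim`, `mod_sim`, `ci_sim`, and `pi_sim` are hypothetical.

```r
set.seed(1)

# Simulated stand-in for the SAT data (NOT the real scores)
sim <- data.frame(Verbal = round(runif(162, 300, 800), -1))
sim$Math <- 210 + 0.68 * sim$Verbal + rnorm(162, sd = 70)
mod_sim <- lm(Math ~ Verbal, data = sim)

# 90% confidence interval for the MEAN Math score at Verbal = 500
ci_sim <- predict(mod_sim, newdata = data.frame(Verbal = 500),
                  interval = "confidence", level = 0.90)

# 90% prediction interval for ONE student's Math score at Verbal = 710
pi_sim <- predict(mod_sim, newdata = data.frame(Verbal = 710),
                  interval = "prediction", level = 0.90)

ci_sim[2:3]             # lower and upper confidence limits
pi_sim[2:3]             # lower and upper prediction limits
pmin(pi_sim[2:3], 800)  # cap the limits at the maximum possible score
```

`predict()` returns the columns `fit`, `lwr`, and `upr`, so indexing with `[2:3]` extracts the two limits. The prediction interval is wider because it accounts for the variability of an individual score, not just uncertainty in the mean.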
