Homework 3 (Attendance 5) for Statistics 512 Applied Regression ...

Homework 3 (Attendance 5) for Statistics 512 

Applied Regression Analysis 

Material Covered: Chapter 6 Neter et al. and Kuhn 

By: Friday, 3rd October, Fall 2003 

This homework is worth 5% and is marked out of 5 points. Homework assignments are to be handed in using Vista on the Internet before 4am. Vista will not allow any homework assignment to be handed in late. It is highly recommended that you complete the homework by hand before logging on to Vista, and use Vista simply to submit your answers. You may submit as many times as you want before the deadline, and you receive the highest score of all the submissions. This is an individual homework, so each student submits their own homework, although students are encouraged to cooperate with one another.

1. Applied Linear Statistical Models (Neter et al.) Questions. 

Chapter 6, pages 252–257: 

Problems 6.9, 6.10, 6.11, 6.12, 6.13, 6.14 (hint: chemical shipment) 

Problems 6.18, 6.19, 6.20, 6.21 (hint: mathematicians’ salaries)

(6.9) chemical shipment, hw3-6-9-chem-diagnos 

*HOMEWORK 3, 6-9, PAGES 252-257; 

DATA CHEMICAL; 

INPUT Y X1 X2 TIME; 

DATALINES; 

58 7 5.11 1 

152 18 16.72 2 

41 5 3.2 3 

93 14 7.03 4 

101 11 10.98 5 

38 5 4.04 6 

203 23 22.07 7 

78 9 7.03 8 

117 16 10.62 9 

44 5 4.76 10 

121 17 11.02 11 

112 12 9.51 12 

50 6 3.79 13 

82 12 6.45 14 

48 8 4.6 15 

127 15 13.86 16 

140 17 13.03 17 

155 21 15.21 18 

39 6 3.64 19 

90 11 9.57 20 

; 

*6.9(A) STEM AND LEAF OF X1 AND X2; 

PROC UNIVARIATE DATA=CHEMICAL PLOT; 

TITLE1 '6.9(A) STEM AND LEAF OF NUMBER OF DRUMS, X1'; 

TITLE2 'AND OF WEIGHT OF SHIPMENTS, X2'; 

VAR X1 X2; 

RUN; 

*6.9(B) TIME PLOTS OF NUMBER OF DRUMS, X1, AND WEIGHT OF SHIPMENTS, X2;

SYMBOL1 V=STAR C=BLACK; 

PROC GPLOT DATA=CHEMICAL; 

TITLE1 '6.9(B-1) TIMEPLOT OF NUMBER OF DRUMS, X1'; 

PLOT X1*TIME; 

RUN; 



TITLE1 '6.9(B-2) TIMEPLOT OF WEIGHT OF SHIPMENTS, X2'; 

PLOT X2*TIME; 

RUN; 

*6.9(C) SCATTERPLOT MATRICES AND CORRELATION; 



TITLE1 '6.9(C-1) HANDLING TIME VERSUS NUMBER OF DRUMS, Y VS X1'; 

PLOT Y*X1; 

RUN; 



TITLE1 '6.9(C-2) HANDLING TIME VERSUS WEIGHT OF SHIPMENTS, Y VS X2'; 

PLOT Y*X2; 

RUN; 



TITLE1 '6.9(C-3) NUMBER OF DRUMS VERSUS WEIGHT OF SHIPMENTS, X1 VS X2'; 

PLOT X1*X2; 

RUN; 

PROC CORR DATA=CHEMICAL; 

TITLE '6.9(C-4) CORRELATION Y, X1 AND X2'; 

VAR Y X1 X2; 

RUN; 

QUIT; 

(a) Stem–and–leaf plots. 

Look for outliers. 

(b) Time Plots. 

Look for any patterns. 

(c) Scatter plots and correlation matrix. 

It would be “good” if Y were strongly linearly related to both X1 and X2, but “bad” if X1 and X2 were strongly linearly related to one another (multicollinearity).
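As a cross-check on the `PROC CORR` step for 6.9(c), the following Python sketch is an illustration, not part of the handout. It computes the three pairwise Pearson correlations from the same 20 shipments typed into the DATALINES above. The helper name `pearson` is our own.

```python
from math import sqrt

# Chemical shipment data copied from the DATALINES above
Y = [58, 152, 41, 93, 101, 38, 203, 78, 117, 44,
     121, 112, 50, 82, 48, 127, 140, 155, 39, 90]      # handling time
X1 = [7, 18, 5, 14, 11, 5, 23, 9, 16, 5,
      17, 12, 6, 12, 8, 15, 17, 21, 6, 11]             # number of drums
X2 = [5.11, 16.72, 3.2, 7.03, 10.98, 4.04, 22.07, 7.03, 10.62, 4.76,
      11.02, 9.51, 3.79, 6.45, 4.6, 13.86, 13.03, 15.21, 3.64, 9.57]  # weight


def pearson(u, v):
    """Pearson correlation r = S_uv / sqrt(S_uu * S_vv)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


r_y_x1 = pearson(Y, X1)
r_y_x2 = pearson(Y, X2)
r_x1_x2 = pearson(X1, X2)
print(f"r(Y,X1) = {r_y_x1:.4f}, r(Y,X2) = {r_y_x2:.4f}, r(X1,X2) = {r_x1_x2:.4f}")
```

Correlations near 1 between Y and each predictor are the “good” part. A large r(X1, X2) is the warning sign described above.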

(6.10) chemical shipment again, hw3-6-10-chem-residual 

DATA CHEMICAL; 

INPUT Y X1 X2 TIME; 

X1X2 = X1*X2; 

DATALINES; 

58 7 5.11 1 

152 18 16.72 2 

41 5 3.2 3 

93 14 7.03 4 

101 11 10.98 5 

38 5 4.04 6 

203 23 22.07 7 

78 9 7.03 8 

117 16 10.62 9 

44 5 4.76 10 

121 17 11.02 11 

112 12 9.51 12 

50 6 3.79 13 

82 12 6.45 14 

48 8 4.6 15 

127 15 13.86 16 

140 17 13.03 17 

155 21 15.21 18 

39 6 3.64 19 

90 11 9.57 20 

; 

*6.10(A) REGRESSION; 

PROC REG DATA=CHEMICAL OUTEST=EST; 

TITLE1 '6.10(A) REGRESSION OF Y VS X1 AND X2'; 

MODEL Y = X1 X2; 

OUTPUT OUT=OUTPLOT PREDICTED=PRED RESIDUAL=RESID; 

RUN; 

*6.10(B) BOXPLOT OF RESIDUALS; 

PROC UNIVARIATE DATA=OUTPLOT PLOT; 

TITLE1 '6.10(B) BOXPLOT OF RESIDUALS'; 

VAR RESID; 

RUN; 

*6.10(C) RESIDUALS VS PREDICTED, X1, X2 AND X1X2; 


PROC GPLOT DATA=OUTPLOT; 

TITLE '6.10(C-1) RESIDUALS VS PREDICTED'; 

PLOT RESID*PRED; 

RUN; 



TITLE '6.10(C-2) RESIDUALS VS X1'; 

PLOT RESID*X1; 

RUN; 



TITLE '6.10(C-3) RESIDUALS VS X2'; 

PLOT RESID*X2; 

RUN; 



TITLE '6.10(C-4) RESIDUALS VS X1X2'; 

PLOT RESID*X1X2; 

RUN; 

*6.10(C) NORMAL PROBABILITY PLOT; 

* RESIDUALS VS EXPECTED RESIDUALS; 

PROC SORT DATA=OUTPLOT; 

BY RESID; 

RUN; 

DATA OUTPLOT; 

SET OUTPLOT NOBS=NOBS; 

QUANTILE = PROBIT( (_N_- (3/8)) / (NOBS + (1/4)) ); 

RUN; 

DATA OUTPLOT2; 

IF _N_ = 1 THEN SET EST; 

SET OUTPLOT; 

EXPRESIDUAL = _RMSE_*QUANTILE; 

RUN; 

PROC GPLOT DATA=OUTPLOT2; 

TITLE '6.10(C-5) NORMAL PROBABILITY PLOT'; 

PLOT RESID*EXPRESIDUAL; 

RUN; 

*6.10(D) TIMEPLOT OF RESIDUALS; 



TITLE1 '6.10(D) TIMEPLOT OF RESIDUALS'; 

PLOT RESID*TIME; 

RUN; 

*6.10(E) LEVENE TEST OF RESIDUALS; 

DATA NEWCHEMICAL; 

SET OUTPLOT; 

IF PRED < 92 THEN LEVENEGROUP = 'A'; 

IF PRED GE 92 THEN LEVENEGROUP = 'B'; 

RUN; 

PROC GLM DATA=NEWCHEMICAL ALPHA=0.01; 

TITLE1 '6.10(E) (UNMODIFIED) LEVENE TEST'; 

TITLE2 'OF HOMOGENEITY OF VARIANCE OF RESIDUALS'; 

CLASS LEVENEGROUP; 

MODEL RESID = LEVENEGROUP; 

MEANS LEVENEGROUP / HOVTEST = LEVENE (TYPE=ABS); 

RUN; 

QUIT;

(a) Estimated regression function. 

(b) Box plot of the residuals. 

Look for outliers. 

(c) Residual plots. 

It is “good” if the residual plots show no pattern and no outliers. 
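The normal probability plot in the SAS program pairs the k-th smallest residual with its expected value under normality, √MSE · z[(k − 0.375)/(n + 0.25)]. That is what the `PROBIT` and `_RMSE_` lines compute.

The Python sketch below does the same calculation. It uses the standard library's `statistics.NormalDist().inv_cdf` in place of SAS `PROBIT`. The function name and the six toy residuals are hypothetical and only show the mechanics.

```python
from math import sqrt
from statistics import NormalDist


def expected_residuals(residuals, mse):
    """Pair each ordered residual with sqrt(MSE) * z[(k - 0.375) / (n + 0.25)]."""
    ordered = sorted(residuals)
    n = len(ordered)
    z = NormalDist()  # standard normal; inv_cdf plays the role of SAS PROBIT
    expected = [sqrt(mse) * z.inv_cdf((k - 0.375) / (n + 0.25))
                for k in range(1, n + 1)]
    return list(zip(ordered, expected))


# Hypothetical toy residuals; MSE = 31.56 is borrowed from the 6.11 ANOVA table
pairs = expected_residuals([3.1, -5.2, 0.4, 7.9, -2.6, -3.6], mse=31.56)
for resid, expres in pairs:
    print(f"{resid:7.2f}  {expres:7.2f}")
```

Plotting the first column against the second gives the normal probability plot. Points close to a straight line support the normality assumption.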

(d) Residuals versus time plot. 

(e) Levene Test 

1. Statement. 

The statement of the test is (check none, one or more): 

(i) $H_0$: error variance constant versus $H_1$: $\rho > 1$. 

(ii) $H_0$: error variance constant versus $H_1$: not constant 

(iii) $H_0$: error variance constant versus $H_1$: $\rho \neq 1$. 

2. Test. 

From SAS, the p–value is (choose one) 0.446 / 0.8278 / 0.989 

The level of significance is (circle one) 0.01 / 0.05 / 0.10 

3. Conclusion. 

Since the p–value is (circle one) smaller / larger than the level of significance, we (circle one) accept / reject the null hypothesis that the error variance is constant.
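The unmodified Levene test requested by `HOVTEST = LEVENE (TYPE=ABS)` is a one-way ANOVA on the absolute deviations |e − ē| of the residuals from their group means. The groups here are the two fitted-value groups.

The Python sketch below computes the F statistic for two groups. The helper name and the toy residuals are hypothetical. The p-value still has to be read from an F(1, n − 2) table or from the SAS output.

```python
def levene_abs_f(group_a, group_b):
    """Unmodified Levene F: one-way ANOVA on |e - group mean| for two groups."""
    def abs_dev(g):
        m = sum(g) / len(g)
        return [abs(e - m) for e in g]

    d_a, d_b = abs_dev(group_a), abs_dev(group_b)
    n_a, n_b = len(d_a), len(d_b)
    mean_a, mean_b = sum(d_a) / n_a, sum(d_b) / n_b
    grand = (sum(d_a) + sum(d_b)) / (n_a + n_b)
    # Between-group and within-group sums of squares of the absolute deviations
    ss_between = n_a * (mean_a - grand) ** 2 + n_b * (mean_b - grand) ** 2
    ss_within = (sum((d - mean_a) ** 2 for d in d_a)
                 + sum((d - mean_b) ** 2 for d in d_b))
    df_within = n_a + n_b - 2
    return (ss_between / 1) / (ss_within / df_within)  # df: (1, n - 2)


# Hypothetical residuals split at a fitted-value cutoff (e.g. PRED < 92 vs >= 92)
low_fits = [1.0, -1.0, 2.0, -2.0]
high_fits = [3.0, -3.0, 0.5, -0.5]
f_star = levene_abs_f(low_fits, high_fits)
print(f"Levene F* = {f_star:.4f} on (1, {len(low_fits) + len(high_fits) - 2}) df")
```

With only two groups, this F equals the square of the pooled two-sample t statistic on the absolute deviations.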

(6.11) chemical shipment again, hw3-6-11-chem-regress 

DATA CHEMICAL; 

INPUT Y X1 X2 TIME; 
DATALINES; 

58 7 5.11 1 

152 18 16.72 2 

41 5 3.2 3 

93 14 7.03 4 

101 11 10.98 5 

38 5 4.04 6 

203 23 22.07 7 

78 9 7.03 8 

117 16 10.62 9 

44 5 4.76 10 

121 17 11.02 11 

112 12 9.51 12 

50 6 3.79 13 

82 12 6.45 14 

48 8 4.6 15 

127 15 13.86 16 

140 17 13.03 17 

155 21 15.21 18 

39 6 3.64 19 

90 11 9.57 20 

; 

*6.11 REGRESSION; 

PROC REG DATA=CHEMICAL; 

TITLE1 '6.11 REGRESSION OF Y VS X1 AND X2'; 

MODEL Y = X1 X2; 
RUN; 

QUIT; 

Source        Sum of Squares    Degrees of Freedom       Mean Square 
Regression    40,496.48         p − 1 = 3 − 1 = 2        20,248.24 
Error            536.47         n − p = 20 − 3 = 17          31.56 
Total         41,032.95         n − 1 = 20 − 1 = 19 

(a) Test of regression relation at α = 0.05. 

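1. Statement. 

The statement of the test is (check none, one or more): 

(i) $H_0: \beta_1 = \beta_2 = 0$ versus $H_1: \beta_1 = \beta_2 > 0$. 

(ii) $H_0: \beta_1 = \beta_2 = 0$ versus $H_1: \beta_1 = \beta_2 < 0$. 

(iii) $H_0: \beta_1 = \beta_2 = 0$ versus $H_1$: not all $\beta_k$ ($k = 1, 2$) are zero. 

2. Test. 

From SAS, the p–value is (choose one) 0 / 0.0827 / 0.098 

3. Conclusion. 

Since the p–value is (circle one) smaller / larger than the level of significance α = 0.05, we (circle one) accept / reject the null hypothesis that $\beta_1 = \beta_2 = 0$.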

(b) Bonferroni Confidence Intervals. 

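From TI–83 (INVT 17 ENTER 0.9875 ENTER), with g = 2 intervals and n − p = 20 − 3 = 17 degrees of freedom, 

$B = t(1 - \alpha/(2g);\ n - p) = t(1 - 0.05/(2 \cdot 2);\ 20 - 3) = t(0.9875;\ 17) = 2.458$ 

From SAS, 

1. Bonferroni CI for $\beta_1$: 

$b_1 = 3.7681$ and $s\{b_1\} = 0.614$, 

$b_1 \pm B\,s\{b_1\} = 3.7681 \pm 2.458(0.614) =$ 

2. Bonferroni CI for $\beta_2$: 

$b_2 = 5.0796$ and $s\{b_2\} = 0.666$, 

$b_2 \pm B\,s\{b_2\} = 5.0796 \pm 2.458(0.666) =$ 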
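Once B is known, the two blanks above are plain arithmetic. This Python sketch assumes the SAS estimates and the multiplier B = 2.458 quoted in the text. `bonferroni_ci` is a hypothetical helper name.

```python
B = 2.458  # t(0.9875; n - p = 17), the Bonferroni multiplier for g = 2

# (b_k, s{b_k}) copied from the SAS output quoted above
estimates = {"beta1": (3.7681, 0.614), "beta2": (5.0796, 0.666)}


def bonferroni_ci(b, se, multiplier):
    """Return (b - B*s{b}, b + B*s{b})."""
    half_width = multiplier * se
    return b - half_width, b + half_width


intervals = {name: bonferroni_ci(b, se, B) for name, (b, se) in estimates.items()}
for name, (lo, hi) in intervals.items():
    print(f"{name}: ({lo:.3f}, {hi:.3f})")
```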

(c) Coefficient of Multiple Determination. 

$R^2 = \dfrac{SSR}{SSTO} = \dfrac{40{,}496.48}{41{,}032.95} \approx 0.987$ 

$R^2$ is also given directly on the SAS output.
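Parts (a) and (c) both follow from the ANOVA table. The Python sketch below uses only the sums of squares quoted in the table; the variable names are ours. It reproduces the mean squares, the statistic F* = MSR/MSE for testing β1 = β2 = 0, and R² = SSR/SSTO.

```python
n, p = 20, 3
ssr, sse = 40_496.48, 536.47
ssto = ssr + sse          # 41,032.95

msr = ssr / (p - 1)       # mean square for regression
mse = sse / (n - p)       # mean square error
f_star = msr / mse        # compare with F(0.95; p - 1, n - p) or use the SAS p-value
r_squared = ssr / ssto

print(f"MSR = {msr:.2f}, MSE = {mse:.2f}, F* = {f_star:.2f}, R^2 = {r_squared:.4f}")
```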

(6.12) chemical shipment again, hw3-6-12-chem-respCI 


DATA CHEMICALX; 

INPUT Y X1 X2 TIME; 
DATALINES; 

58 7 5.11 1 

152 18 16.72 2 

41 5 3.2 3 

93 14 7.03 4 

101 11 10.98 5 

38 5 4.04 6 

203 23 22.07 7 

78 9 7.03 8 

117 16 10.62 9 

44 5 4.76 10 

121 17 11.02 11 

112 12 9.51 12 

50 6 3.79 13 

82 12 6.45 14 

48 8 4.6 15 

127 15 13.86 16 

140 17 13.03 17 

155 21 15.21 18 

39 6 3.64 19 

90 11 9.57 20 

. 5 3.20 21 

. 6 4.80 22 

. 10 7.00 23 

. 14 10.00 24 

. 20 18.00 25 

; 

DATA CHEMICAL X; 

SET CHEMICALX; 

IF Y NE . THEN OUTPUT CHEMICAL; /* Y IS MISSING ONLY FOR THE NEW X VALUES */ 

ELSE OUTPUT X; 

RUN; 

PROC REG DATA=CHEMICAL ALPHA=0.05 NOPRINT; 

TITLE '6.12(A) BONFERRONI AND WH JOINT CIs FOR MEAN'; 

MODEL Y = X1 X2; 
RUN; 

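PROC REG DATA=CHEMICALX; 

/* ROWS WITH MISSING Y (THE NEW X VALUES) ARE LEFT OUT OF THE FIT BUT STILL GET PREDICTIONS */ 

MODEL Y = X1 X2; 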
OUTPUT OUT=PRED_DS(WHERE=(Y =.)) P=PHAT STDP=STDP; 

RUN; 

PROC PRINT DATA=PRED_DS; 

RUN; 

PROC PLOT DATA=CHEMICALX; 

TITLE '6.12(B) RANGE OF X1 AND X2'; 

PLOT X1*X2=Y; 

RUN; 

PROC G3D DATA=CHEMICALX; 

SCATTER X1*X2=Y; 

RUN; 

QUIT; 

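(a) Family CIs for the Mean Responses. 

At α = 0.05, g = 5 (five simultaneous intervals) and p = 3 (three parameters, $\beta_0, \beta_1, \beta_2$), from TI–83, 

$W = \sqrt{p\,F(1 - \alpha;\ p,\ n - p)} = \sqrt{3\,F(1 - 0.05;\ 3,\ 20 - 3)} = 3.098$ 

(INVF 3 ENTER 17 ENTER 0.95 ENTER, then multiply by 3 and take the square root) 

$B = t(1 - \alpha/(2g);\ n - p) = t(1 - 0.05/(2 \cdot 5);\ 20 - 3) = t(0.995;\ 17) = 2.898$ 

(INVT 17 ENTER 0.995 ENTER) 

Since W = 3.098 > B = 2.898, use B, because the Bonferroni procedure gives narrower (more efficient) CIs than the Working–Hotelling procedure. 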

From SAS, 

1. $X_{h1} = 5,\ X_{h2} = 3.20$: 

$\hat{Y}_h = 38.4195$ and $s\{\hat{Y}_h\} = 2.0332$ 

$\hat{Y}_h \pm B\,s\{\hat{Y}_h\} = 38.4195 \pm 2.898(2.0332) =$ 

2. $X_{h1} = 6,\ X_{h2} = 4.80$: 

$\hat{Y}_h = 50.3150$ and $s\{\hat{Y}_h\} = 1.9192$ 

$\hat{Y}_h \pm B\,s\{\hat{Y}_h\} = 50.3150 \pm 2.898(1.9192) =$ 

3. $X_{h1} = 10,\ X_{h2} = 7.00$: 

$\hat{Y}_h = 76.5625$ and $s\{\hat{Y}_h\} = 1.3701$ 

$\hat{Y}_h \pm B\,s\{\hat{Y}_h\} = 76.5625 \pm 2.898(1.3701) =$ 

4. $X_{h1} = 14,\ X_{h2} = 10.00$: 

$\hat{Y}_h = 106.8737$ and $s\{\hat{Y}_h\} = 1.4761$ 

$\hat{Y}_h \pm B\,s\{\hat{Y}_h\} = 106.8737 \pm 2.898(1.4761) =$ 

5. $X_{h1} = 20,\ X_{h2} = 18.00$: 

$\hat{Y}_h = 170.1191$ and $s\{\hat{Y}_h\} = 2.6096$ 

$\hat{Y}_h \pm B\,s\{\hat{Y}_h\} = 170.1191 \pm 2.898(2.6096) =$ 
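Choosing the smaller multiplier and filling in the five intervals is mechanical. This Python sketch assumes the multipliers W and B and the SAS values of Ŷh and s{Ŷh} quoted above.

```python
W, B = 3.098, 2.898          # Working-Hotelling and Bonferroni multipliers from the text
multiplier = min(W, B)       # the smaller multiplier gives narrower intervals

# (Yhat_h, s{Yhat_h}) pairs copied from the SAS output for the five X_h points
points = {
    (5, 3.20): (38.4195, 2.0332),
    (6, 4.80): (50.3150, 1.9192),
    (10, 7.00): (76.5625, 1.3701),
    (14, 10.00): (106.8737, 1.4761),
    (20, 18.00): (170.1191, 2.6096),
}

intervals = {}
for xh, (yhat, se) in points.items():
    half = multiplier * se
    intervals[xh] = (yhat - half, yhat + half)
    print(f"X_h = {xh}: ({yhat - half:.3f}, {yhat + half:.3f})")
```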

(b) Plot $X_{i1}$ versus $X_{i2}$. 

Where does the point $(X_1, X_2) = (20, 5)$ lie relative to the scatter of points: inside or outside? 

Where does the point $(X_1, X_2) = (20, 19)$ lie relative to the scatter of points: inside or outside?
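Part (b) is about hidden extrapolation. A new point can lie inside the range of X1 and inside the range of X2 separately, yet still fall outside their joint scatter. The technique below is our own addition, not part of the handout.

One numerical check compares the new point's leverage, h_new = 1/n + (x − x̄)′ S⁻¹ (x − x̄), with the largest leverage h_ii among the observed cases. Here S is the centered cross-product matrix of (X1, X2). An h_new well above max h_ii signals extrapolation. The helper names in this Python sketch are hypothetical.

```python
# X1 (drums) and X2 (weight) copied from the chemical shipment DATALINES
X1 = [7, 18, 5, 14, 11, 5, 23, 9, 16, 5,
      17, 12, 6, 12, 8, 15, 17, 21, 6, 11]
X2 = [5.11, 16.72, 3.2, 7.03, 10.98, 4.04, 22.07, 7.03, 10.62, 4.76,
      11.02, 9.51, 3.79, 6.45, 4.6, 13.86, 13.03, 15.21, 3.64, 9.57]

n = len(X1)
m1, m2 = sum(X1) / n, sum(X2) / n
# Centered sums of squares and cross-products of the two predictors
s11 = sum((a - m1) ** 2 for a in X1)
s22 = sum((b - m2) ** 2 for b in X2)
s12 = sum((a - m1) * (b - m2) for a, b in zip(X1, X2))
det = s11 * s22 - s12 ** 2


def leverage(x1, x2):
    """h = 1/n + (x - xbar)' S^{-1} (x - xbar), valid for a model with intercept."""
    d1, d2 = x1 - m1, x2 - m2
    quad = (s22 * d1 ** 2 - 2 * s12 * d1 * d2 + s11 * d2 ** 2) / det
    return 1 / n + quad


max_h = max(leverage(a, b) for a, b in zip(X1, X2))
for new_point in [(20, 5), (20, 19)]:
    h_new = leverage(*new_point)
    verdict = "extrapolation" if h_new > max_h else "within the data"
    print(f"{new_point}: h_new = {h_new:.3f} vs max h_ii = {max_h:.3f} -> {verdict}")
```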

(6.13) chemical shipment again, hw3-6-13-chem-respPI 

DATA CHEMICAL; 

INPUT Y X1 X2 TIME; 
DATALINES; 

58 7 5.11 1 

152 18 16.72 2 

41 5 3.2 3 

93 14 7.03 4 

101 11 10.98 5 

38 5 4.04 6 

203 23 22.07 7 

78 9 7.03 8 

117 16 10.62 9 

44 5 4.76 10 

121 17 11.02 11 

112 12 9.51 12 

50 6 3.79 13 

82 12 6.45 14 

48 8 4.6 15 

127 15 13.86 16 

140 17 13.03 17 

155 21 15.21 18 

39 6 3.64 19 

90 11 9.57 20 

; 

PROC IML; 

USE CHEMICAL; 

READ ALL VAR {'X1'} INTO X1; 

READ ALL VAR {'X2'} INTO X2; 
READ ALL VAR {'Y'} INTO Y; 

N = NROW(X1); 

M = NCOL(Y); 

J = J(N,N,1); 

X = J(N,1,1)||X1||X2; 

B = INV(X`*X)*X`*Y; 

H = X*INV(X`*X)*X`; 

SSE = Y`*(I(N) - H)*Y; 

DFE = N - 3; 

MSE = SSE/DFE; 

XH = { 1 1 1 1, 

9 12 15 18, 

7.20 9.00 12.50 16.50}; 

YHAT = XH`*B; 

*VECDIAG KEEPS ONE PREDICTION VARIANCE PER NEW POINT (THE DIAGONAL); 

S2PRED = MSE*(1 + VECDIAG(XH`*INV(X`*X)*XH)); 

SPRED = SQRT(S2PRED); 

PRINT YHAT; 

PRINT S2PRED; 

PRINT SPRED; 

RUN; 

QUIT; 

At α = 0.05, g = 4 (four simultaneous prediction intervals) and p = 3 (three parameters, $\beta_0, \beta_1, \beta_2$), from TI–83, 

$S = \sqrt{g\,F(1 - \alpha;\ g,\ n - p)} = \sqrt{4\,F(1 - 0.05;\ 4,\ 20 - 3)} = 3.441$ 

$B = t(1 - \alpha/(2g);\ n - p) = t(1 - 0.05/(2 \cdot 4);\ 20 - 3) = t(0.99375;\ 17) = 2.793$ 

Since S = 3.441 > B = 2.793, use B, because the Bonferroni procedure gives narrower (more efficient) prediction intervals than the Scheffé procedure. 

From SAS,

1. $X_{h1} = 9,\ X_{h2} = 7.20$: 

$\hat{Y}_h = 73.8103$ and $s\{pred\} = 5.8076$ 

$\hat{Y}_h \pm B\,s\{pred\} = 73.8103 \pm 2.793(5.8076) =$ 

2. $X_{h1} = 12,\ X_{h2} = 9.00$: 

$\hat{Y}_h = 94.2579$ and $s\{pred\} = 5.7578$ 

$\hat{Y}_h \pm B\,s\{pred\} = 94.2579 \pm 2.793(5.7578) =$ 

3. $X_{h1} = 15,\ X_{h2} = 12.50$: 

$\hat{Y}_h = 123.3408$ and $s\{pred\} = 5.8217$ 

$\hat{Y}_h \pm B\,s\{pred\} = 123.3408 \pm 2.793(5.8217) =$ 

4. $X_{h1} = 18,\ X_{h2} = 16.50$: 

$\hat{Y}_h = 154.9635$ and $s\{pred\} = 6.1013$ 

$\hat{Y}_h \pm B\,s\{pred\} = 154.9635 \pm 2.793(6.1013) =$
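The prediction intervals use s{pred} rather than s{Ŷh}, but otherwise the arithmetic matches the mean-response case. This Python sketch assumes the multipliers S and B and the IML values of Ŷh and s{pred} quoted above.

```python
S, B = 3.441, 2.793          # Scheffe and Bonferroni multipliers from the text (g = 4)
multiplier = min(S, B)       # the smaller multiplier gives narrower intervals

# (Yhat_h, s{pred}) pairs copied from the IML output for the four new shipments
new_shipments = {
    (9, 7.20): (73.8103, 5.8076),
    (12, 9.00): (94.2579, 5.7578),
    (15, 12.50): (123.3408, 5.8217),
    (18, 16.50): (154.9635, 6.1013),
}

pis = {}
for xh, (yhat, s_pred) in new_shipments.items():
    half = multiplier * s_pred
    pis[xh] = (yhat - half, yhat + half)
    print(f"X_h = {xh}: ({yhat - half:.3f}, {yhat + half:.3f})")
```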

(6.14) chemical shipment again, hw3-6-14-chem-respPmean 

DATA CHEMICAL; 

INPUT Y X1 X2 TIME; 
DATALINES; 

58 7 5.11 1 

152 18 16.72 2 

41 5 3.2 3 

93 14 7.03 4 

101 11 10.98 5 

38 5 4.04 6 

203 23 22.07 7 

78 9 7.03 8 

117 16 10.62 9 

44 5 4.76 10 

121 17 11.02 11 

112 12 9.51 12 

50 6 3.79 13 

82 12 6.45 14 

48 8 4.6 15 

127 15 13.86 16 

140 17 13.03 17 

155 21 15.21 18 

39 6 3.64 19 

90 11 9.57 20 

; 

PROC IML; 

USE CHEMICAL; 

READ ALL VAR {'X1'} INTO X1; 

READ ALL VAR {'X2'} INTO X2; 
READ ALL VAR {'Y'} INTO Y; 

N = NROW(X1); 

M = NCOL(Y); 

J = J(N,N,1); 

X = J(N,1,1)||X1||X2; 

B = INV(X`*X)*X`*Y; 

H = X*INV(X`*X)*X`; 

SSE = Y`*(I(N) - H)*Y; 

DFE = N - 3; 

MSE = SSE/DFE; 

XH = {1, 7, 6}; *ONE NEW POINT: INTERCEPT, XH1 = 7, XH2 = 6; 

YHAT = XH`*B; 

*PREDICTION SE FOR THE MEAN OF M = 3 NEW OBSERVATIONS AT XH; 

SPRED = SQRT(MSE*(1/3 + XH`*INV(X`*X)*XH)); 

PRINT YHAT; 

PRINT SPRED; 

RUN; 

QUIT; 

(a) Prediction Interval for the Mean of m = 3 New Observations. 

At α = 0.05, p = 3 (three parameters, $\beta_0, \beta_1, \beta_2$) and m = 3 (mean of three new observations), from TI–83, 

$B = t(1 - \alpha/2;\ n - p) = t(1 - 0.05/2;\ 20 - 3) = t(0.975;\ 17) = 2.110$ 

$X_{h1} = 7,\ X_{h2} = 6$: 

$\hat{Y}_h = 60.1786$ and $s\{predmean\} = 3.7281$ 

$\hat{Y}_h \pm B\,s\{predmean\} = 60.1786 \pm 2.110(3.7281) =$ 

(b) A prediction interval for the total handling time of the three shipments is then 

3 × (52.31, 68.04) =
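The IML program computes s{predmean} = √(MSE/m + s²{Ŷh}). Multiplying the interval for the mean of m shipments by m gives an interval for their total. This Python sketch assumes the t multiplier and the IML values quoted in the text.

```python
t_mult = 2.110               # t(0.975; 17) from the text
m = 3                        # number of new shipments averaged
yhat, s_predmean = 60.1786, 3.7281   # copied from the IML output

half = t_mult * s_predmean
mean_pi = (yhat - half, yhat + half)
total_pi = (m * mean_pi[0], m * mean_pi[1])   # total of m shipments = m x mean
print(f"mean of {m}: ({mean_pi[0]:.3f}, {mean_pi[1]:.3f})")
print(f"total of {m}: ({total_pi[0]:.2f}, {total_pi[1]:.2f})")
```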

(6.18) Mathematicians’ salaries, hw3-6-18-math-diagnos 


DATA MATH; 

INPUT Y X1 X2 X3; 

X1X2 = X1*X2; 

X1X3 = X1*X3; 

X2X3 = X2*X3; 

DATALINES; 

33.2 3.5 9 6.1 

40.3 5.3 20 6.4 

38.7 5.1 18 7.4 

46.8 5.8 33 6.7 

41.4 4.2 31 7.5 

37.5 6 13 5.9 

39 6.8 25 6 

40.7 5.5 30 4 

30.1 3.1 5 5.8 

52.9 7.2 47 8.3 

38.2 4.5 25 5 

31.8 4.9 11 6.4 

43.3 8 23 7.6 

44.1 6.5 35 7 

42.8 6.6 39 5 

33.6 3.7 21 4.4 

34.2 6.2 7 5.5 

48 7 40 7 

38 4 35 6 

35.9 4.5 23 3.5 

40.4 5.9 33 4.9 

36.8 5.6 27 4.3 

45.2 4.8 34 8 

35.1 3.9 15 5 

; 

*6.18(A) STEM AND LEAF OF X1, X2 AND X3; 

PROC UNIVARIATE DATA=MATH PLOT; 

TITLE1 '6.18(A) STEM AND LEAF OF WORK QUALITY, X1'; 

TITLE2 'AND OF YEARS OF EXPERIENCE, X2'; 

TITLE3 'AND OF PUBLICATION SUCCESS, X3'; 

VAR X1 X2 X3; 

RUN; 

*6.18(B) SCATTERPLOT MATRICES AND CORRELATION; 


PROC GPLOT DATA=MATH; 

TITLE '6.18(B) SCATTERPLOT MATRICES'; 

PLOT Y*X1; 

PLOT Y*X2; 

PLOT Y*X3; 

PLOT X1*X2; 

PLOT X1*X3; 

PLOT X2*X3; 

RUN; 

PROC CORR DATA=MATH; 

TITLE '6.18(B) CORRELATION Y, X1, X2 AND X3'; 

VAR Y X1 X2 X3; 

RUN; 

*6.18(C) REGRESSION; 

PROC REG DATA=MATH OUTEST=EST; 

TITLE1 '6.18(C) REGRESSION OF Y VS X1, X2 AND X3'; 

MODEL Y = X1 X2 X3; 

OUTPUT OUT=OUTPLOT PREDICTED=PRED RESIDUAL=RESID; 
RUN; 

*6.18(D) BOXPLOT OF RESIDUALS; 

PROC UNIVARIATE DATA=OUTPLOT PLOT; 

TITLE1 '6.18(D) BOXPLOT OF RESIDUALS'; 

VAR RESID; 

RUN; 

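*6.18(E) RESIDUALS VS PREDICTED, X1, X2, X3 AND INTERACTIONS; 

PROC GPLOT DATA=OUTPLOT; 

TITLE '6.18(E-1) RESIDUALS VS VARIOUS'; 

PLOT RESID*PRED; 

PLOT RESID*X1; 

PLOT RESID*X2; 

PLOT RESID*X3; 

PLOT RESID*X1X2; 

PLOT RESID*X1X3; 

PLOT RESID*X2X3; 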
RUN; 

PROC SORT DATA=OUTPLOT; 

BY RESID; 

RUN; 

DATA OUTPLOT; 

SET OUTPLOT NOBS=NOBS; 

QUANTILE = PROBIT( (_N_- (3/8)) / (NOBS + (1/4)) ); 

RUN; 

DATA OUTPLOT2; 

IF _N_ = 1 THEN SET EST; 

SET OUTPLOT; 

EXPRESIDUAL = _RMSE_*QUANTILE; 

RUN; 

PROC GPLOT DATA=OUTPLOT2; 

TITLE '6.18(E-2) NORMAL PROBABILITY PLOT'; 

PLOT RESID*EXPRESIDUAL; 

RUN; 

*6.18(G) LEVENE TEST OF RESIDUALS; 

DATA NEWMATH; 

SET OUTPLOT; 

IF PRED < 38.75 THEN LEVENEGROUP = 'A'; 

IF PRED GE 38.75 THEN LEVENEGROUP = 'B'; 

RUN; 

PROC GLM DATA=NEWMATH ALPHA=0.05; 

TITLE1 '6.18(G) (UNMODIFIED) LEVENE TEST'; 

TITLE2 'OF HOMOGENEITY OF VARIANCE OF RESIDUALS'; 

CLASS LEVENEGROUP; 

MODEL RESID = LEVENEGROUP; 

MEANS LEVENEGROUP / HOVTEST = LEVENE (TYPE=ABS); 

RUN; 

QUIT;

(a) Stem and Leaf Plots. 

(b) Scatterplots and Correlation Matrix 

(c) Estimated Regression. 

(d) Residual Box Plot. 

(e) Residual Plots. 

(f) Lack of Fit Test. 

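(g) Levene Test 

1. Statement. 

The statement of the test is (check none, one or more): 

(i) $H_0$: error variance constant versus $H_1$: $\rho > 1$. 

(ii) $H_0$: error variance constant versus $H_1$: not constant 

(iii) $H_0$: error variance constant versus $H_1$: $\rho \neq 1$. 

2. Test. 

From SAS, the p–value is (choose one) 0.446 / 0.8278 / 0.884 

The level of significance is (circle one) 0.01 / 0.05 / 0.10 

3. Conclusion. 

Since the p–value is (circle one) smaller / larger than the level of significance, we (circle one) accept / reject the null hypothesis that the error variance is constant.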
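The lack-of-fit F test in (f) needs replicate observations, meaning cases that share the same (X1, X2, X3). Pure error is estimated from the spread within those replicate groups, with n − c degrees of freedom, where c is the number of distinct X combinations.

The Python sketch below counts the distinct combinations in the salary data typed into the DATALINES above. The variable names are ours.

```python
from collections import Counter

# (X1, X2, X3) rows copied from the mathematicians' salary DATALINES
rows = [
    (3.5, 9, 6.1), (5.3, 20, 6.4), (5.1, 18, 7.4), (5.8, 33, 6.7),
    (4.2, 31, 7.5), (6.0, 13, 5.9), (6.8, 25, 6.0), (5.5, 30, 4.0),
    (3.1, 5, 5.8), (7.2, 47, 8.3), (4.5, 25, 5.0), (4.9, 11, 6.4),
    (8.0, 23, 7.6), (6.5, 35, 7.0), (6.6, 39, 5.0), (3.7, 21, 4.4),
    (6.2, 7, 5.5), (7.0, 40, 7.0), (4.0, 35, 6.0), (4.5, 23, 3.5),
    (5.9, 33, 4.9), (5.6, 27, 4.3), (4.8, 34, 8.0), (3.9, 15, 5.0),
]

counts = Counter(rows)
replicated = {x: k for x, k in counts.items() if k > 1}
n = len(rows)
c = len(counts)              # number of distinct X combinations
df_pure_error = n - c        # pure-error df available to the lack-of-fit F test
print(f"n = {n}, distinct X levels c = {c}, pure-error df = {df_pure_error}")
print("replicates:", replicated or "none")
```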

(6.19) Mathematicians’ salaries continued, hw3-6-19-math-famCI 


DATA MATH; 

INPUT Y X1 X2 X3; 
X1X2 = X1*X2; 

X1X3 = X1*X3; 

X2X3 = X2*X3; 

DATALINES; 

33.2 3.5 9 6.1 

40.3 5.3 20 6.4 

38.7 5.1 18 7.4 

46.8 5.8 33 6.7 

41.4 4.2 31 7.5 

37.5 6 13 5.9 

39 6.8 25 6 

40.7 5.5 30 4 

30.1 3.1 5 5.8 

52.9 7.2 47 8.3 

38.2 4.5 25 5 

31.8 4.9 11 6.4 

43.3 8 23 7.6 

44.1 6.5 35 7 

42.8 6.6 39 5 

33.6 3.7 21 4.4 

34.2 6.2 7 5.5 

48 7 40 7 

38 4 35 6 

35.9 4.5 23 3.5 

40.4 5.9 33 4.9 

36.8 5.6 27 4.3 

45.2 4.8 34 8 

35.1 3.9 15 5 

; 

*6.19 REGRESSION OF Y ON X1, X2 AND X3; 

PROC REG DATA=MATH OUTEST=EST TABLEOUT ALPHA=0.05; 

TITLE '6.19 REGRESSION'; 

TITLE2 'BONFERRONI JOINT CIs FOR B1, B2 AND B3'; 

TITLE3 'CORRELATION'; 

MODEL Y = X1 X2 X3; 
RUN; 

QUIT; 

(a) Test of regression relation at α = 0.05. 

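1. Statement. 

The statement of the test is (check none, one or more): 

(i) $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ versus $H_1: \beta_1 = \beta_2 = \beta_3 > 0$. 

(ii) $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ versus $H_1: \beta_1 = \beta_2 = \beta_3 < 0$. 

(iii) $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ versus $H_1$: not all $\beta_k$ ($k = 1, 2, 3$) are zero. 

2. Test. 

From SAS, the p–value is (choose one) 0 / 0.0827 / 0.098 

3. Conclusion. 

Since the p–value is (circle one) smaller / larger than the level of significance α = 0.05, we (circle one) accept / reject the null hypothesis that $\beta_1 = \beta_2 = \beta_3 = 0$. 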

(b) Bonferroni Confidence Intervals. 

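From TI–83 (INVT 20 ENTER 0.9917 ENTER), with g = 3 intervals and n − p = 24 − 4 = 20 degrees of freedom, 

$B = t(1 - \alpha/(2g);\ n - p) = t(1 - 0.05/(2 \cdot 3);\ 24 - 4) = t(0.9917;\ 20) = 2.614$ 

From SAS, 

1. Bonferroni CI for $\beta_1$: 

$b_1 = 1.1031$ and $s\{b_1\} = 0.330$, 

$b_1 \pm B\,s\{b_1\} = 1.1031 \pm 2.614(0.330) =$ 

2. Bonferroni CI for $\beta_2$: 

$b_2 = 0.3215$ and $s\{b_2\} = 0.037$, 

$b_2 \pm B\,s\{b_2\} = 0.3215 \pm 2.614(0.037) =$ 

3. Bonferroni CI for $\beta_3$: 

$b_3 = 1.2889$ and $s\{b_3\} = 0.298$, 

$b_3 \pm B\,s\{b_3\} = 1.2889 \pm 2.614(0.298) =$ 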
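The same interval arithmetic applies here with g = 3 coefficients. This Python sketch assumes the multiplier B = 2.614 and the SAS estimates quoted in the text.

```python
B = 2.614  # t(0.9917; 20) quoted in the text for g = 3, n - p = 20

# (b_k, s{b_k}) copied from the SAS output quoted above
coefficients = {
    "beta1": (1.1031, 0.330),
    "beta2": (0.3215, 0.037),
    "beta3": (1.2889, 0.298),
}

joint_cis = {}
for name, (b, se) in coefficients.items():
    joint_cis[name] = (b - B * se, b + B * se)
    print(f"{name}: ({joint_cis[name][0]:.3f}, {joint_cis[name][1]:.3f})")
```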

(c) From SAS,

(6.20) Mathematicians’ salaries, hw3-6-20-math-respCI 

DATA MATHX; 

INPUT Y X1 X2 X3; 
DATALINES; 

33.2 3.5 9 6.1 

40.3 5.3 20 6.4 

38.7 5.1 18 7.4 

46.8 5.8 33 6.7 

41.4 4.2 31 7.5 

37.5 6 13 5.9 

39 6.8 25 6 

40.7 5.5 30 4 

30.1 3.1 5 5.8 

52.9 7.2 47 8.3 

38.2 4.5 25 5 

31.8 4.9 11 6.4 

43.3 8 23 7.6 

44.1 6.5 35 7 

42.8 6.6 39 5 

33.6 3.7 21 4.4 

34.2 6.2 7 5.5 

48 7 40 7 

38 4 35 6 

35.9 4.5 23 3.5 

40.4 5.9 33 4.9 

36.8 5.6 27 4.3 

45.2 4.8 34 8 

35.1 3.9 15 5 

. 5.0 20 5 

. 6.0 30 6 

. 4.0 10 4 

. 7.0 50 7 

; 

*6.20 BONFERRONI AND WH JOINT CIs FOR MEAN; 

DATA MATH X; 

SET MATHX; 

IF Y NE . THEN OUTPUT MATH; /* Y IS MISSING ONLY FOR THE NEW X VALUES */ 

ELSE OUTPUT X; 

RUN; 

PROC REG DATA=MATH ALPHA=0.05 NOPRINT; 

TITLE '6.20 BONFERRONI AND WH JOINT CIs FOR MEAN'; 

MODEL Y = X1 X2 X3; 
RUN; 

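PROC REG DATA=MATHX; 

/* ROWS WITH MISSING Y (THE NEW X VALUES) ARE LEFT OUT OF THE FIT BUT STILL GET PREDICTIONS */ 

MODEL Y = X1 X2 X3; 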
OUTPUT OUT=PRED_DS(WHERE=(Y =.)) P=PHAT STDP=STDP; 

RUN; 

PROC PRINT DATA=PRED_DS; 

RUN; 

QUIT; 

(a) At α = 0.05, g = 4 (four simultaneous intervals) and p = 4 (parameters $\beta_0, \beta_1, \beta_2, \beta_3$), from TI–83, 

$W = \sqrt{p\,F(1 - \alpha;\ p,\ n - p)} = \sqrt{4\,F(1 - 0.05;\ 4,\ 24 - 4)} = 3.388$ 

$B = t(1 - \alpha/(2g);\ n - p) = t(1 - 0.05/(2 \cdot 4);\ 24 - 4) = t(0.99375;\ 20) = 2.744$ 

Since W = 3.388 > B = 2.744, use B, because the Bonferroni procedure gives narrower (more efficient) CIs than the Working–Hotelling procedure. 

From SAS,

1. $X_{h1} = 5,\ X_{h2} = 20,\ X_{h3} = 5$: 

$\hat{Y}_h \pm B\,s\{\hat{Y}_h\} = 36.2377 \pm 2.744(0.4631) =$ 

2. $X_{h1} = 6,\ X_{h2} = 30,\ X_{h3} = 6$: 

$\hat{Y}_h \pm B\,s\{\hat{Y}_h\} = 41.8449 \pm 2.744(0.4170) =$ 

3. $X_{h1} = 4,\ X_{h2} = 10,\ X_{h3} = 4$: 

$\hat{Y}_h \pm B\,s\{\hat{Y}_h\} = 30.6304 \pm 2.744(0.7560) =$ 

4. $X_{h1} = 7,\ X_{h2} = 50,\ X_{h3} = 7$: 

$\hat{Y}_h \pm B\,s\{\hat{Y}_h\} = 50.6674 \pm 2.744(0.8975) =$
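The four family intervals follow from picking the smaller multiplier. This Python sketch assumes W, B and the SAS values of Ŷh and s{Ŷh} quoted above.

```python
W, B = 3.388, 2.744   # Working-Hotelling and Bonferroni multipliers from the text
multiplier = min(W, B)

# (Yhat_h, s{Yhat_h}) pairs copied from the SAS output quoted above
salary_points = {
    (5, 20, 5): (36.2377, 0.4631),
    (6, 30, 6): (41.8449, 0.4170),
    (4, 10, 4): (30.6304, 0.7560),
    (7, 50, 7): (50.6674, 0.8975),
}

family_cis = {}
for xh, (yhat, se) in salary_points.items():
    half = multiplier * se
    family_cis[xh] = (yhat - half, yhat + half)
    print(f"X_h = {xh}: ({yhat - half:.3f}, {yhat + half:.3f})")
```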

The questions from the text are altered somewhat to fit into the multiple choice context given on Vista. The altered questions are given below. 

Problem 6.9, pp 252-257. 

Match the problems with the answers. 

Problems: 6.9(a), 6.9(b), 6.9(c) 

Answers: 

time plots indicate wave–like pattern in $X_{i1}$ and $X_{i2}$ 

time plots indicate fairly random distribution of $X_{i1}$ and $X_{i2}$ 

scatterplot, correlation indicates strong correlation between Y and $X_{i1}$ only 

stem and leaf plots indicate $X_{i1}$, $X_{i2}$ both have two extreme outliers 

stem and leaf plots indicate fairly even distribution in $X_{i1}$, $X_{i2}$ 

scatterplot, correlation indicates strong correlations between Y, $X_{i1}$ and $X_{i2}$ 

Problem 6.10, pp 252-257. 

Match the problems with the answers. 

Problems: 6.10(a), 6.10(b), 6.10(c), 6.10(d), 6.10(e) 

Answers: 

$\hat{Y} = 3.324 + 4.768X_{i1} + 5.080X_{i2}$ 

box plot indicates no outlying residuals 

residual plot, normal probability plot indicates no outlying residuals 

residual vs time plot indicates no outlying residuals 

Levene test p-value is 0.8278 

$\hat{Y} = 3.324 + 3.768X_{i1} + 5.080X_{i2}$ 

box plot indicates one outlying residual 

residual plot, normal probability plot indicates one outlying residual 

residual vs time plot indicates one outlying residual 

Levene test p-value is 0.989 

Problem 6.11, pp 252-257. 

Match the problems with the answers. 

Problems: 6.11(a), 6.11(b), 6.11(c) 

Answers: 

$R^2 = 0.787$ 

Bonferroni CI for $\beta_1$ is (2.259, 5.277) 

$R^2 = 0.987$ 

test of regression relation has $F^* = 541.58$ 

Bonferroni CI for $\beta_1$ is (3.443, 6.717)

Problem 6.12, pp 252-257. 

Match the problems with the answers. 

Problems: 6.12(a), 6.12(b) 

Answers: 

for family CIs of response, B = 4.098 > W = 2.898 

point $(X_{h1}, X_{h2}) = (20, 5)$ is inside scatter plot 

for family CIs of response, W = 4.098 > B = 2.898 

for family CIs of response, W = 3.098 > B = 2.898 

point $(X_{h1}, X_{h2}) = (20, 5)$ is outside scatter plot 

point $(X_{h1}, X_{h2}) = (20, 19)$ is outside scatter plot 

Problem 6.13, pp 252-257. 

Match the problems with the answers. 

Problems: 6.13(a), 6.13(b), 6.13(c), 6.13(d) 

Answers: 

for $X_{h1} = 12$ and $X_{h2} = 9.00$, CI is (78.176, 100.339) 

for $X_{h1} = 15$ and $X_{h2} = 12.50$, CI is (107.081, 159.600) 

for $X_{h1} = 15$ and $X_{h2} = 12.50$, CI is (107.081, 139.600) 

for $X_{h1} = 18$ and $X_{h2} = 16.50$, CI is (157.923, 172.004) 

for $X_{h1} = 9$ and $X_{h2} = 7.20$, CI is (47.590, 90.031) 




Problem 6.14, pp 252-257. 

Match the problems with the answers. 

Problems: 6.14(a), 6.14(b) 

Answers: 

for $(X_{h1}, X_{h2}) = (7, 6)$, PI of TOTAL is (166.94, 204.14) 

for $(X_{h1}, X_{h2}) = (7, 6)$, PI of TOTAL is (156.94, 214.14) 

for $(X_{h1}, X_{h2}) = (7, 6)$, CI of MEAN is (42.312, 68.045) 

for $(X_{h1}, X_{h2}) = (7, 6)$, PI of TOTAL is (156.94, 204.14) 

Problem 6.18, pp 252-257. 

Match the problems with the answers.

Problems: 6.18(a), 6.18(b), 6.18(c), 6.18(d), 6.18(e), 6.18(f), 6.18(g) 

Answers: 

scatterplot, correlation indicates strong correlations between Y, $X_{i1}$, $X_{i2}$ and $X_{i3}$ 

residual box plot indicates badly skewed distribution 

$\hat{Y} = 7.84693 + 0.10313X_{i1} + 0.32152X_{i2} + 1.28894X_{i3}$ 

residual box plot indicates fairly symmetric distribution 

residual plots, normal probability plot indicates data normal 

lack of fit test p–value is 0.567 

Levene test p-value is 0.884 

stem and leaf plots indicate one extreme outlier in $X_{i1}$, $X_{i2}$, $X_{i3}$ 

stem and leaf plots indicate fairly even distribution in $X_{i1}$, $X_{i2}$, $X_{i3}$ 

$\hat{Y} = 17.84693 + 1.10313X_{i1} + 0.32152X_{i2} + 1.28894X_{i3}$ 

scatterplot, correlation indicates strong correlations between Y and $X_{i1}$, Y and $X_{i2}$, Y and $X_{i3}$ only 

residual plots, normal probability plot indicates data not normal 

unable to do lack of fit test because no repeated observations 

Levene test p-value is 0.584 

Problem 6.19, pp 252-257. 

Match the problems with the answers. 

Problems: 6.19(a), 6.19(b), 6.19(c) 

Answers: 

test of regression relation has $F^* = 68.119$ 

Bonferroni CI for $\beta_3$ is (0.240, 1.966) 

$R^2 = 0.8087$ 

Bonferroni CI for $\beta_3$ is (0.510, 2.068) 

$R^2 = 0.9109$ 

Problem 6.20, pp 252-257. 

Match the problems with the answers. 

Problems: 6.20(a), 6.20(b), 6.20(c), 6.20(d) 

Answers: 

for $(X_{h1}, X_{h2}, X_{h3}) = (5, 20, 5)$, CI is (36.967, 37.508) 

for $(X_{h1}, X_{h2}, X_{h3}) = (6, 30, 6)$, CI is (40.701, 42.989) 

for $(X_{h1}, X_{h2}, X_{h3}) = (7, 50, 7)$, CI is (48.205, 55.130) 

for $(X_{h1}, X_{h2}, X_{h3}) = (7, 50, 7)$, CI is (48.205, 53.130) 

for $(X_{h1}, X_{h2}, X_{h3}) = (5, 20, 5)$, CI is (34.967, 37.508) 

for $(X_{h1}, X_{h2}, X_{h3}) = (4, 10, 4)$, CI is (28.556, 32.705)
