Homework 2 (Attendance 3) for Statistics 512 Applied Regression ...

Homework 2 (Attendance 3) for Statistics 512
Applied Regression Analysis

Material Covered: Chapter 3, Neter et al. and Kuhn
Due: Friday, 19th September, Fall 2003

This homework is worth 5% and marked out of 5 points. Homework assignments are to be handed in using Vista on the Internet before 4am. Vista will not allow any homework assignment to be handed in late. It is highly recommended that you complete the homework, by hand, before logging onto Vista; use Vista simply to submit your answers. Submit as many times as you want before the deadline and receive the highest score of all the submissions. This is an individual homework and so each student submits their own homework, although they are encouraged to cooperate with other students.

1. Applied Linear Statistical Models (Neter et al.) Questions.

   Chapter              Problem(s)    Hints
   3, pages 144–151     3.3           Grade point average data
                        3.6, 3.14     Plastic hardness data
                        3.15, 3.16    Solution concentration data

2. Applied Statistics and the SAS Programming Language (Cody and Smith) Questions.

   No questions.

(3.3) Grade point average, hw2-3-3-gpa-diagnos

Both SAS and the TI–83 calculator can be used here. Before starting the analysis using the TI–83, sort the data from smallest to largest, by X.

*HOMEWORK 2, 3-3, PAGES 144-151;
DATA GPAVSCORE;
  INPUT GPA SCORE X1 X2;
  INDEX = 1;
  IF SCORE < 4.85 THEN LEVENEGROUP = 'A';
  IF SCORE GE 4.85 THEN LEVENEGROUP = 'B';
  DATALINES;
3.1 5.5 105 2.9
2.3 4.8 113 2.8
3 4.7 118 3.1
1.9 3.9 107 2.4
2.5 4.5 110 3
3.7 6.2 125 2.4
3.4 6 115 3.5
2.6 5.2 121 3.1
2.8 4.7 117 3.1
1.6 4.3 111 2.9
2 4.9 123 3.2
2.9 5.4 114 3.3
2.3 5 120 3.4
3.2 6.3 132 2.6
1.8 4.6 122 3
1.4 4.3 110 2.8
2 5 119 3.3
3.8 5.9 109 3.4
2.2 4.1 116 2.6
1.5 4.7 108 2.7
;

*3.3(A) BOXPLOT OF TEST SCORES;
PROC BOXPLOT DATA=GPAVSCORE;
  TITLE '3.3(A) BOXPLOT OF TEST SCORES';
  PLOT SCORE*INDEX;
RUN;

*3.3(B) (ALMOST) DOTPLOT OF RESIDUALS;
PROC REG DATA=GPAVSCORE OUTEST=EST;
  MODEL GPA = SCORE;
  OUTPUT OUT=OUTPLOT PREDICTED=PRED RESIDUAL=RESID;
RUN;
PROC PRINT DATA=OUTPLOT;
  TITLE '3.3(B) RESIDUALS';
  VAR GPA SCORE PRED RESID;
RUN;
PROC CHART DATA=OUTPLOT;
  TITLE '3.3(B) (ALMOST) DOTPLOT OF RESIDUALS';
  VBAR RESID;
RUN;

*3.3(C) RESIDUALS VS PREDICTED;
SYMBOL1 V=STAR C=BLACK;
PROC GPLOT DATA=OUTPLOT;
  TITLE '3.3(C) RESIDUALS VS PREDICTED';
  PLOT RESID*PRED;
RUN;

*3.3(D) NORMAL PROBABILITY PLOT AND CORRELATION FOR RESIDUALS;
*RESIDUALS VS EXPECTED RESIDUALS;
PROC SORT DATA=OUTPLOT;
  BY RESID;
RUN;
DATA OUTPLOT;
  SET OUTPLOT NOBS=NOBS;
  QUANTILE = PROBIT( (_N_ - (3/8)) / (NOBS + (1/4)) );
RUN;
DATA OUTPLOT2;
  IF _N_ = 1 THEN SET EST;
  SET OUTPLOT;
  EXPRESIDUAL = _RMSE_*QUANTILE;
RUN;
PROC GPLOT DATA=OUTPLOT2;
  TITLE '3.3(D-1) NORMAL PROBABILITY PLOT';
  PLOT RESID*EXPRESIDUAL;
RUN;
PROC CORR DATA=OUTPLOT2;
  TITLE '3.3(D-2) CORRELATION OF NORMAL PROBABILITY PLOT';
  VAR RESID EXPRESIDUAL;
RUN;

*3.3(E) (ALMOST) MODIFIED LEVENE TEST;
* OF HOMOGENEITY OF VARIANCE OF RESIDUALS;
* BASED ON MEAN (LEVENE), RATHER THAN MEDIAN (MODIFIED);
PROC GLM DATA=OUTPLOT ALPHA=0.01;
  TITLE '3.3(E) (ALMOST) LEVENE TEST';
  TITLE2 'OF HOMOGENEITY OF VARIANCE OF RESIDUALS';
  CLASS LEVENEGROUP;
  MODEL RESID = LEVENEGROUP;
  MEANS LEVENEGROUP / HOVTEST = LEVENE (TYPE=ABS);
RUN;

*3.3(E-1) RESIDUALS VS X1;
PROC REG DATA=GPAVSCORE;
  MODEL GPA = X1;
  OUTPUT OUT=OUTPLOT PREDICTED=PRED RESIDUAL=RESID;
RUN;
*3.3(E-1) RESIDUALS VS X1;
SYMBOL1 V=STAR C=BLACK;
PROC GPLOT DATA=OUTPLOT;
  TITLE '3.3(E-1) RESIDUALS VS X1';
  PLOT RESID*X1;
RUN;

*3.3(E-2) RESIDUALS VS X2;
PROC REG DATA=GPAVSCORE;
  MODEL GPA = X2;
  OUTPUT OUT=OUTPLOT PREDICTED=PRED RESIDUAL=RESID;
RUN;
*3.3(E-2) RESIDUALS VS X2;
SYMBOL1 V=STAR C=BLACK;
PROC GPLOT DATA=OUTPLOT;
  TITLE '3.3(E-2) RESIDUALS VS X2';
  PLOT RESID*X2;
RUN;
QUIT;

(a) The box plot is (choose one)
    (i) skewed left, with a five–number summary of (3.9, 4.55, 4.85, 5.45, 6.3)
    (ii) skewed right, with a five–number summary of (3.9, 4.55, 4.85, 5.45, 6.3)
    (iii) more or less symmetric, with a five–number summary of (3.9, 4.55, 4.85, 5.45, 6.3)
    (Type X into L1, and Y into L2; 2nd STAT PLOT, ON, choose box plot, then ZoomStat.)

(b) The “dot plot” (choose one) confirms / contradicts that the X data is skewed right.
    (2nd STAT PLOT, ON, choose histogram, then ZoomStat; a histogram, rather than a dot plot, is given.)

(c) The residual plot of e versus Ŷ indicates (choose one) more or less constant / badly varying variance.
    (2nd STAT PLOT, ON, choose scatter plot of L3 and L4.)

(d) The normality assumption does appear to be reasonable for two reasons.

    First, the normal probability plot of the residuals indicates the data is normal because this plot is (choose one) more or less linear / badly curved.
    (Use 2nd STAT PLOT, choose sixth graph.)

    Second, the correlation test for the normal probability plot is given by,

    1. Statement.
       The statement of the test is (check none, one or more):
       (i) H0: ρ = 1 versus H1: ρ > 1.
       (ii) H0: ρ = 1 versus H1: ρ < 1.
       (iii) H0: ρ = 1 versus H1: ρ ≠ 1.
       Recall that if ρ = 1, this implies that the scatterplot is linear. Also recall that if the normal probability plot is linear, this indicates the residuals are normally distributed.

    2. Test.
       From SAS, the test statistic is (choose one) 0.786 / 0.911 / 0.989
       The critical value at α = 0.05 and n = 20 is (circle one) 0.322 / 0.793 / 0.951
       (Use Table B.6 of the Neter et al. text.)

    3. Conclusion.
       Since the test statistic is smaller / larger than the critical value we (circle one) accept / reject the null hypothesis that ρ = 1.

(e) Levene Test¹

    1. Statement.
       The statement of the test is (check none, one or more):
       (i) H0: error variance constant versus H1: ρ > 1.
       (ii) H0: error variance constant versus H1: not constant.
       (iii) H0: error variance constant versus H1: ρ ≠ 1.

    2. Test.
       From SAS, the p–value is (choose one) 0.0446 / 0.0911 / 0.0989
       The level of significance is (circle one) 0.01 / 0.05 / 0.10

¹ The SAS program only does an unmodified version of the Levene test. Although results from the modified and unmodified versions may differ from one another, the basic procedure of both is essentially the same.

(3.6) Plastic hardness, hw2-3-6-plastic-diagnos

*HOMEWORK 2, 3-6, DATA DIAGNOSTICS;
*PLASTIC HARDNESS DATA, PAGES 144-151;
DATA HARDVTIME;
  INPUT HARD TIME;
  INDEX = 1;
  IF TIME LE 24 THEN LEVENEGROUP = 'A';
  IF TIME > 24 THEN LEVENEGROUP = 'B';
  DATALINES;
199 16
205 16
196 16
200 16
218 24
220 24
215 24
223 24
237 32
234 32
235 32
230 32
250 40
248 40
253 40
246 40
;

*3.6(A) BOXPLOT AND DOTPLOT OF RESIDUALS;
PROC REG DATA=HARDVTIME OUTEST=EST;
  MODEL HARD = TIME;
  OUTPUT OUT=OUTPLOT PREDICTED=PRED RESIDUAL=RESID;
RUN;
PROC BOXPLOT DATA=OUTPLOT;
  TITLE '3.6(A) BOXPLOT OF RESIDUALS';
  PLOT RESID*INDEX;
RUN;

*3.6(B) RESIDUALS VS PREDICTED;
SYMBOL1 V=STAR C=BLACK;
PROC GPLOT DATA=OUTPLOT;
  TITLE '3.6(B) RESIDUALS VS PREDICTED';
  PLOT RESID*PRED;
RUN;

*3.6(C) NORMAL PROBABILITY PLOT AND CORRELATION FOR RESIDUALS;
*RESIDUALS VS EXPECTED RESIDUALS;
PROC SORT DATA=OUTPLOT;
  BY RESID;
RUN;
DATA OUTPLOT;
  SET OUTPLOT NOBS=NOBS;
  QUANTILE = PROBIT( (_N_ - (3/8)) / (NOBS + (1/4)) );
RUN;
DATA OUTPLOT2;
  IF _N_ = 1 THEN SET EST;
  SET OUTPLOT;
  EXPRESIDUAL = _RMSE_*QUANTILE;
RUN;
PROC GPLOT DATA=OUTPLOT2;
  TITLE '3.6(C-1) NORMAL PROBABILITY PLOT';
  PLOT RESID*EXPRESIDUAL;
RUN;
PROC CORR DATA=OUTPLOT2;
  TITLE '3.6(C-2) CORRELATION OF NORMAL PROBABILITY PLOT';
  VAR RESID EXPRESIDUAL;
RUN;

*3.6(D) RESIDUALS FOR PLOT OF RESIDUALS VS EXPECTED RESIDUALS;
PROC REG DATA=OUTPLOT2;
  TITLE '3.6(D) RESIDUALS VS EXPECTED RESIDUALS REGRESSION';
  MODEL RESID = EXPRESIDUAL;
  OUTPUT OUT=OUTPLOT3 PREDICTED=PRED2 RESIDUAL=RESID2;
RUN;
PROC SORT DATA=OUTPLOT3;
  BY RESID;
RUN;
DATA OUTPLOT4;
  IF _N_ = 1 THEN SET EST;
  SET OUTPLOT3;
  STUDRESIDUAL = RESID/_RMSE_;
RUN;
PROC PRINT DATA=OUTPLOT4;
  TITLE '3.6(D) STUDENTIZED RESIDUALS';
  VAR RESID EXPRESIDUAL STUDRESIDUAL;
RUN;

*3.6(E) (ALMOST) MODIFIED LEVENE TEST;
* OF HOMOGENEITY OF VARIANCE OF RESIDUALS;
* BASED ON MEAN (LEVENE), RATHER THAN MEDIAN (MODIFIED);
PROC GLM DATA=OUTPLOT ALPHA=0.01;
  TITLE '3.6(E) (ALMOST) LEVENE TEST';
  TITLE2 'OF HOMOGENEITY OF VARIANCE OF RESIDUALS';
  CLASS LEVENEGROUP;
  MODEL RESID = LEVENEGROUP;
  MEANS LEVENEGROUP / HOVTEST = LEVENE (TYPE=ABS);
RUN;
QUIT;

(a) The box plot is (choose one)
    (i) skewed left, with a five–number summary of (-5.15, -2.2875, 0.1625, 2.8, 5.575)
    (ii) skewed right, with a five–number summary of (3.9, 4.55, 4.85, 5.45, 6.3)
    (iii) more or less symmetric, with a five–number summary of (-5.15, -2.2875, 0.1625, 2.8, 5.575)
    (Type data into L1 and L2, 2nd STAT CALC LinReg ENTER, define L3 = 2.03x + 168.6 (LinReg), define L4 = L2 − L3 (residuals). Then, 2nd STAT PLOT, ON, choose box plot, then ZoomStat.)

(b) The residual plot of e versus Ŷ indicates (choose one) more or less constant / badly varying variance.

    elapsed time, X        16       16       16      · · ·   40       40
    hardness, Y            199      205      196     · · ·   253      246
    predicted, Ŷ           201.15   201.15   201.15  · · ·   249.98   249.98
    residual, e = Y − Ŷ    -2.15    3.85     -5.15   · · ·   3.025    -3.975

    (2nd STAT PLOT, ON, choose scatter plot of L3 and L4.)

(c) From SAS,

    ordered residual, e                                          -5.150   -3.975   · · ·   3.850   5.575
    expected residual, E[e] = √MSE · z((k − 0.375)/(n + 0.25))   -5.720   -4.145   · · ·   4.145   5.720

    First, the normal probability plot of the residuals versus the expected residuals is more or less linear and so indicates the residuals are (choose one) normal / not normal.

    Second, the correlation test for the normal probability plot is given by,

    1. Statement.
       The statement of the test is (check none, one or more):
       (i) H0: ρ = 1 versus H1: ρ > 1.
       (ii) H0: ρ = 1 versus H1: ρ < 1.
       (iii) H0: ρ = 1 versus H1: ρ ≠ 1.
       Recall that if ρ = 1, this implies that the scatterplot is linear. Also recall that if the normal probability plot is linear, this indicates the residuals are normally distributed.

    2. Test.
       From SAS, the test statistic is (choose one) 0.786 / 0.901 / 0.992
       The critical value at α = 0.05 and n = 16 is (circle one) 0.322 / 0.941 / 0.951
       (Use Table B.6 of the Neter et al. text.)

    3. Conclusion.
       Since the test statistic is smaller / larger than the critical value we (circle one) accept / reject the null hypothesis that ρ = 1.

(d) From the SAS output, the semistudentized residuals are given by,

    ordered residual, e                        -5.150   -3.975   · · ·   3.850   5.575
    semistudentized residual, e* = e / √MSE    -1.592   -1.229   · · ·   1.190   1.724

    The 25th, 50th and 75th t percentiles and corresponding cumulative frequencies from the e* above are:

    t percentiles           t(0.25, 14)   t(0.50, 14)   t(0.75, 14)
                            −0.692        0             0.692
    approx. freq. of e*     4/16          7/16          11/16

    Notice that the t percentiles are given at n − 2 = 16 − 2 = 14 degrees of freedom. Also notice that, for example, for t(0.25, 14) = −0.692, four (4) of the sixteen (16) (or 25% of the) e* values are at or below −0.692, including -1.592, -1.229, -1.144 and -0.750, and that seven (7) (or a little less than 50%) are between t(0.25, 14) and t(0.75, 14). In other words, the (roughly) correct percentage of residuals are found between the various percentiles to indicate that the residuals (choose one) are / are not normally distributed.

(e) Levene Test

    1. Statement.
       The statement of the test is (check none, one or more):
       (i) H0: error variance constant versus H1: ρ > 1.
       (ii) H0: error variance constant versus H1: not constant.
       (iii) H0: error variance constant versus H1: ρ ≠ 1.

    2. Test.
       From SAS, the p–value is (choose one) 0.446 / 0.71 / 0.414
       The level of significance is (circle one) 0.01 / 0.05 / 0.10

    3. Conclusion.
       Since the p–value is smaller / larger than the level of significance we (circle one) accept / reject the null hypothesis that the error variance is constant.
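The expected residuals in part (c) and the semistudentized residuals in part (d) of problem 3.6 can be checked outside SAS. The following is a minimal sketch in Python, not part of the original SAS program. It assumes an ordinary least squares fit of hardness on time and uses the standard library's NormalDist for the normal quantile z; all variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Plastic hardness data from problem 3.6: elapsed time X and hardness Y.
time = [16] * 4 + [24] * 4 + [32] * 4 + [40] * 4
hard = [199, 205, 196, 200, 218, 220, 215, 223,
        237, 234, 235, 230, 250, 248, 253, 246]

n = len(time)
x_bar = sum(time) / n
y_bar = sum(hard) / n

# Least squares slope and intercept for the simple linear regression.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(time, hard))
      / sum((x - x_bar) ** 2 for x in time))
b0 = y_bar - b1 * x_bar

# Residuals e = Y - Yhat, sorted from smallest to largest.
residuals = sorted(y - (b0 + b1 * x) for x, y in zip(time, hard))
root_mse = sqrt(sum(e * e for e in residuals) / (n - 2))

# Expected residual under normality: sqrt(MSE) * z((k - 0.375) / (n + 0.25)).
z = NormalDist().inv_cdf
expected = [root_mse * z((k - 0.375) / (n + 0.25)) for k in range(1, n + 1)]

# Semistudentized residual: e* = e / sqrt(MSE).
semistudentized = [e / root_mse for e in residuals]

for e, exp_e, e_star in zip(residuals, expected, semistudentized):
    print(f"{e:8.3f} {exp_e:8.3f} {e_star:8.3f}")
```

The first and last rows printed should agree, up to rounding, with the ordered, expected and semistudentized residuals tabulated in parts (c) and (d).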

(3.14) More plastic hardness, lack of fit test, hw2-3-14-plas-lackofit

(a) Lack of fit test

*HOMEWORK 2, 3-14, LACK OF FIT;
*PLASTIC HARDNESS DATA, PAGES 144-151;
DATA HARDVTIME;
  INPUT HARD TIME;
  DATALINES;
199 16
205 16
196 16
200 16
218 24
220 24
215 24
223 24
237 32
234 32
235 32
230 32
250 40
248 40
253 40
246 40
;

*3.14(A) LACK OF FIT OF LINEAR REGRESSION;
PROC RSREG DATA=HARDVTIME;
  MODEL HARD = TIME / COVAR=1 LACKFIT;
RUN;
QUIT;

    1. Statement.
       The statement of the test is (check none, one or more):
       (i) H0: µ_j = β0 + β1 X_j versus H1: µ_j > β0 + β1 X_j.
       (ii) H0: µ_j = β0 + β1 X_j versus H1: µ_j < β0 + β1 X_j.
       (iii) H0: µ_j = β0 + β1 X_j versus H1: µ_j ≠ β0 + β1 X_j.

    2. Test.
       The test statistic is

       \[
       F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div \frac{SSE(F)}{df_F}
           = \frac{SSE - SSPE}{(n - 2) - (n - c)} \div \frac{SSPE}{n - c}
           = \frac{SSLF}{c - 2} \div \frac{SSPE}{n - c}
           = \frac{17.675}{2} \div \frac{128.750}{12} =
       \]

       (circle one) 0.82 / 1.82 / 2.88.
       The critical value at α = 0.01, with 2 and 12 degrees of freedom, is (circle one) 5.32 / 6.93 / 7.32
       (Use PRGM INVF ENTER 2 ENTER 12 ENTER 0.99 ENTER)

    3. Conclusion.
       Since the test statistic, 0.82, is smaller than the critical value, 6.93, we (circle one) accept / reject the null hypothesis that the regression function is linear, µ_j = β0 + β1 X_j.

(b) The formulas used to calculate the lack of fit (choose one) are / are not simplified if there is an equal number of replications at each of the X levels.

(c) True / False
    Suppose the lack of fit test indicates that the regression function is not linear (although, in fact, the test in this case does indicate a linear regression function). This means that the regression function could be anything but linear, including quadratic or logarithmic, say.
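The arithmetic behind F* in problem 3.14 can be sketched as follows. This is a small illustrative Python helper, not part of the SAS program; the function name lack_of_fit_f and its argument names are assumptions made for this example, and the inputs are the sums of squares quoted in the test statistic above.

```python
def lack_of_fit_f(sslf, sspe, n, c):
    """Return F* = [SSLF / (c - 2)] / [SSPE / (n - c)].

    sslf: lack of fit sum of squares, SSE - SSPE
    sspe: pure error sum of squares
    n:    number of observations
    c:    number of distinct X levels
    """
    ms_lack_of_fit = sslf / (c - 2)
    ms_pure_error = sspe / (n - c)
    return ms_lack_of_fit / ms_pure_error


# Plastic hardness data: n = 16 observations at c = 4 time levels.
f_star = lack_of_fit_f(sslf=17.675, sspe=128.750, n=16, c=4)
print(round(f_star, 2))  # 0.82
```

The numerator has c − 2 = 2 degrees of freedom and the denominator n − c = 12, which are the two values entered into the INVF program for the critical value.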

(3.15) Solution concentration, lack of fit test, hw2-3-15-soln-lackofit

*HOMEWORK 2, 3-15, LACK OF FIT;
*SOLUTION CONCENTRATION DATA, PAGES 144-151;
DATA CONCVTIME;
  INPUT CONC TIME;
  DATALINES;
0.07 9
0.09 9
0.08 9
0.16 7
0.17 7
0.21 7
0.49 5
0.58 5
0.53 5
1.22 3
1.15 3
1.07 3
2.84 1
2.57 1
3.1 1
;

*3.15(A) LINEAR REGRESSION;
PROC REG DATA=CONCVTIME;
  TITLE '3.15(A) LINEAR REGRESSION';
  MODEL CONC = TIME;
  OUTPUT OUT=OUTPLOT PREDICTED=PRED RESIDUAL=RESID;
RUN;

*3.15(B),(C) LACK OF FIT OF LINEAR REGRESSION;
PROC RSREG DATA=CONCVTIME;
  TITLE '3.15(B) LINEAR REGRESSION';
  MODEL CONC = TIME / COVAR=1 LACKFIT;
RUN;
QUIT;

(a) From SAS, the linear regression is given by
    (i) Ŷ = 1.57533 − 0.32400X
    (ii) Ŷ = 2.57533 − 0.02400X
    (iii) Ŷ = 2.57533 − 0.32400X

(b) Lack of fit test

    1. Statement.
       The statement of the test is (check none, one or more):
       (i) H0: µ_j = β0 + β1 X_j versus H1: µ_j > β0 + β1 X_j.
       (ii) H0: µ_j = β0 + β1 X_j versus H1: µ_j < β0 + β1 X_j.
       (iii) H0: µ_j = β0 + β1 X_j versus H1: µ_j ≠ β0 + β1 X_j.

    2. Test.
       The test statistic is

       \[
       F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div \frac{SSE(F)}{df_F}
           = \frac{SSE - SSPE}{(n - 2) - (n - c)} \div \frac{SSPE}{n - c}
           = \frac{SSLF}{c - 2} \div \frac{SSPE}{n - c}
           = \frac{2.7675}{3} \div \frac{0.1575}{10} =
       \]

       (circle one) 13.82 / 45.82 / 58.57.
       The critical value at α = 0.025, with 3 and 10 degrees of freedom, is (circle one) 4.83 / 6.93 / 7.32
       (Use PRGM INVF ENTER 3 ENTER 10 ENTER 0.975 ENTER)

    3. Conclusion.
       Since the test statistic, 58.5714, is larger than the critical value, 4.83, we (circle one) accept / reject the null hypothesis that the regression function is linear, µ_j = β0 + β1 X_j.

(c) The lack of fit test, in this case, indicates that the regression function (choose one) is / is not linear. This means that the regression function could be anything but linear, including quadratic or logarithmic, say.
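To see where SSPE and SSLF come from, the decomposition can be computed directly from the raw data. The sketch below is an illustrative Python version, not the PROC RSREG computation itself; the function name lack_of_fit_from_data is an assumption for this example. It fits the least squares line, groups the observations by X level, and accumulates the pure error and lack of fit sums of squares.

```python
from collections import defaultdict


def lack_of_fit_from_data(x, y):
    """Return (SSLF, SSPE, F*) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar

    # Group the replicate responses by X level.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    c = len(groups)

    sspe = 0.0  # variation of replicates around their own level mean
    sslf = 0.0  # variation of level means around the fitted line
    for level, ys in groups.items():
        level_mean = sum(ys) / len(ys)
        sspe += sum((v - level_mean) ** 2 for v in ys)
        sslf += len(ys) * (level_mean - (b0 + b1 * level)) ** 2

    f_star = (sslf / (c - 2)) / (sspe / (n - c))
    return sslf, sspe, f_star


# Solution concentration data from problem 3.15.
time = [9, 9, 9, 7, 7, 7, 5, 5, 5, 3, 3, 3, 1, 1, 1]
conc = [0.07, 0.09, 0.08, 0.16, 0.17, 0.21, 0.49, 0.58, 0.53,
        1.22, 1.15, 1.07, 2.84, 2.57, 3.10]

sslf, sspe, f_star = lack_of_fit_from_data(time, conc)
print(round(sslf, 4), round(sspe, 4), round(f_star, 1))
```

Computed this way, the sums of squares come out at roughly 2.7673 and 0.1574, giving F* of about 58.6. The small difference from the 58.57 quoted above presumably reflects rounding in the quoted sums of squares; the conclusion of the test is unchanged.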

(3.16) Solution concentration, transformations, hw2-3-16-soln-transform

*HOMEWORK 2, 3-16, TRANSFORMATION;
*SOLUTION CONCENTRATION DATA, PAGES 144-151;
DATA CONCVTIME;
  INPUT CONC TIME;
  CONCLOG10 = LOG10(CONC);
  DATALINES;
0.07 9
0.09 9
0.08 9
0.16 7
0.17 7
0.21 7
0.49 5
0.58 5
0.53 5
1.22 3
1.15 3
1.07 3
2.84 1
2.57 1
3.1 1
;

*3.16(A) SCATTER PLOT;
SYMBOL1 V=STAR C=BLACK;
PROC GPLOT DATA=CONCVTIME;
  TITLE '3.16(A) SCATTER PLOT';
  PLOT CONC*TIME;
RUN;

*3.16(B) BOX-COX TRANSFORMATIONS;
DATA CONCVTIME2;
  ARRAY OBS1[15] OBS101-OBS115;
  ARRAY OBS2[15] OBS201-OBS215;
  ARRAY T1[15] T101-T115;
  ARRAY T2[15] T201-T215;
  ARRAY T3[15] T301-T315;
  ARRAY T4[15] T401-T415;
  ARRAY T5[15] T501-T515;
  DO I = 1 TO 15;
    INPUT OBS1[I] OBS2[I];
  END;
  PROD = OBS1[1];
  DO I = 1 TO 14;
    PROD = PROD*OBS1[I+1];
  END;
  K2 = PROD**(1/15);
  DO I = 1 TO 15;
    K1 = 1/(-0.2*K2**(-0.2-1));
    T1[I] = K1*(OBS1[I]**(-0.2)-1);
    K1 = 1/(-0.1*K2**(-0.1-1));
    T2[I] = K1*(OBS1[I]**(-0.1)-1);
    T3[I] = K2*LOG(OBS1[I]);
    K1 = 1/(0.1*K2**(0.1-1));
    T4[I] = K1*(OBS1[I]**(0.1)-1);
    K1 = 1/(0.2*K2**(0.2-1));
    T5[I] = K1*(OBS1[I]**(0.2)-1);
  END;
  DATALINES;
0.07 9
0.09 9
0.08 9
0.16 7
0.17 7
0.21 7
0.49 5
0.58 5
0.53 5
1.22 3
1.15 3
1.07 3
2.84 1
2.57 1
3.1 1
;

DATA CONCVTIME3;
  SET CONCVTIME2;
  ARRAY OBS1[15] OBS101-OBS115;
  ARRAY OBS2[15] OBS201-OBS215;
  ARRAY T1[15] T101-T115;
  ARRAY T2[15] T201-T215;
  ARRAY T3[15] T301-T315;
  ARRAY T4[15] T401-T415;
  ARRAY T5[15] T501-T515;
  DO I = 1 TO 15;
    CONCT1 = T1[I];
    CONCT2 = T2[I];
    CONCT3 = T3[I];
    CONCT4 = T4[I];
    CONCT5 = T5[I];
    TIME = OBS2[I];
    OUTPUT;
  END;
  KEEP CONCT1 CONCT2 CONCT3 CONCT4 CONCT5 TIME;
RUN;
PROC PRINT DATA=CONCVTIME3;
  VAR CONCT1 CONCT2 CONCT3 CONCT4 CONCT5 TIME;
RUN;
PROC REG DATA=CONCVTIME3;
  TITLE '3.16(B-1) BOX-COX';
  TITLE2 'Y TRANSFORM LAMBDA -0.2';
  MODEL CONCT1 = TIME;
RUN;
PROC REG DATA=CONCVTIME3;
  TITLE '3.16(B-2) BOX-COX';
  TITLE2 'Y TRANSFORM LAMBDA -0.1';
  MODEL CONCT2 = TIME;
RUN;
PROC REG DATA=CONCVTIME3;
  TITLE '3.16(B-3) BOX-COX';
  TITLE2 'Y TRANSFORM LAMBDA 0';
  MODEL CONCT3 = TIME;
RUN;
PROC REG DATA=CONCVTIME3;
  TITLE '3.16(B-4) BOX-COX';
  TITLE2 'Y TRANSFORM LAMBDA 0.1';
  MODEL CONCT4 = TIME;
RUN;
PROC REG DATA=CONCVTIME3;
  TITLE '3.16(B-5) BOX-COX';
  TITLE2 'Y TRANSFORM LAMBDA 0.2';
  MODEL CONCT5 = TIME;
RUN;

*3.16(C) LOG-10 TRANSFORMATION;
PROC REG DATA=CONCVTIME OUTEST=EST;
  TITLE '3.16(C) Y TRANSFORM LOG 10';
  MODEL CONCLOG10 = TIME;
  OUTPUT OUT=OUTPLOT PREDICTED=PRED RESIDUAL=RESID;
RUN;

*3.16(D) LOG-10 TRANSFORM DATA AND REG PLOT;
SYMBOL1 V=STAR C=BLACK;
SYMBOL2 V=DOT C=BLACK I=R;
PROC GPLOT DATA=OUTPLOT;
  TITLE '3.16(D) LOG-10 TRANSFORM';
  TITLE2 'SCATTER PLOT AND REGRESSION LINE';
  PLOT CONCLOG10*TIME PRED*TIME / OVERLAY;
RUN;

*3.16(E) LOG-10 TRANSFORM;
* RESIDUALS AND NORMAL PROBABILITY PLOT;
SYMBOL1 V=STAR C=BLACK;
PROC GPLOT DATA=OUTPLOT;
  TITLE '3.16(E-1) Y LOG-10 TRANSFORM';
  TITLE2 'RESIDUALS VERSUS PREDICTED';
  PLOT RESID*PRED;
RUN;
PROC SORT DATA=OUTPLOT;
  BY RESID;
RUN;
DATA OUTPLOT;
  SET OUTPLOT NOBS=NOBS;
  QUANTILE = PROBIT( (_N_ - (3/8)) / (NOBS + (1/4)) );
RUN;
DATA OUTPLOT2;
  IF _N_ = 1 THEN SET EST;
  SET OUTPLOT;
  EXPRESIDUAL = _RMSE_*QUANTILE;
RUN;
PROC GPLOT DATA=OUTPLOT2;
  TITLE '3.16(E-2) Y LOG-10 TRANSFORM';
  TITLE2 'RESIDUALS VS EXPECTED RESIDUALS';
  PLOT RESID*EXPRESIDUAL;
RUN;
QUIT;
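The standardized Box–Cox computation carried out by the array-based SAS DATA steps above can be sketched more compactly. The following Python version is an illustration only, under the assumption that it mirrors the SAS formulas: K2 is the geometric mean of Y, K1 = 1/(λ·K2^(λ−1)), W = K1(Y^λ − 1) for λ ≠ 0, and W = K2·ln Y for λ = 0. The function names are hypothetical.

```python
from math import exp, log

# Solution concentration data from problem 3.16.
time = [9, 9, 9, 7, 7, 7, 5, 5, 5, 3, 3, 3, 1, 1, 1]
conc = [0.07, 0.09, 0.08, 0.16, 0.17, 0.21, 0.49, 0.58, 0.53,
        1.22, 1.15, 1.07, 2.84, 2.57, 3.10]


def sse_of_line(x, y):
    """Error sum of squares from the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


def box_cox_sse(x, y, lam):
    """SSE after the standardized Box-Cox transformation with power lam."""
    k2 = exp(sum(log(v) for v in y) / len(y))  # geometric mean of Y
    if lam == 0:
        w = [k2 * log(v) for v in y]
    else:
        k1 = 1 / (lam * k2 ** (lam - 1))
        w = [k1 * (v ** lam - 1) for v in y]
    return sse_of_line(x, w)


sse_by_lambda = {lam: box_cox_sse(time, conc, lam)
                 for lam in (-0.2, -0.1, 0, 0.1, 0.2)}
for lam, sse in sse_by_lambda.items():
    print(f"lambda = {lam:5.1f}   SSE = {sse:.4f}")
```

Because every transformed response is rescaled by K1 (or K2), the SSE values are comparable across λ, so the smallest one identifies the preferred power.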

(a) The scatter plot suggests which of the following transformations on Y is most appropriate? Choose one.
    (i) Y′ = √Y
    (ii) Y′ = log10 Y
    (iii) Y′ = 1/Y

(b) From the SAS output, the SSE values for the given Box–Cox λ transformations are given below.

    λ      −0.2     −0.1     0        0.1      0.2
    SSE    0.1235   0.0651   0.0390   0.440    0.0813

    In this case, the best Box–Cox transformation of the data is given by
    (i) λ = −0.2, corresponding to the Y′ = Y^(−0.2) transformation
    (ii) λ = −0.1, corresponding to the Y′ = Y^(−0.1) transformation
    (iii) λ = 0, corresponding to the Y′ = ln Y transformation
    (iv) λ = 0.1, corresponding to the Y′ = Y^(0.1) transformation
    (v) λ = 0.2, corresponding to the Y′ = Y^(0.2) transformation
    which, by the way, is not the Y′ = log10 Y transformation suggested above.

(c) From SAS, the estimated linear regression is given by (choose one)
    (i) Ŷ′ = 0.65488 − 0.19540X
    (ii) Ŷ′ = 1.65488 − 0.19540X
    (iii) Ŷ′ = 0.65488 + 0.19540X

(d) Looking at the SAS output, it appears the estimated regression line Ŷ′ = 0.65488 − 0.19540X fits the transformed data for Y′ = log10 Y (choose one) poorly / very well.

(e) True / False
    The residual plot shows fairly constant variability and the normal probability plot shows normality since it is fairly linear.

(f) Since
        Ŷ′ = log10 Y = 0.65488 − 0.19540X,
    then (choose one)
    (i) Y = 4.51731(0.73767)^X
    (ii) Y = 4.51731(0.63767)^X
    (iii) Y = 5.51731(0.63767)^X
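The back-transformation in part (f) follows from raising 10 to both sides of the fitted equation: if log10 Y = b0 + b1·X, then Y = 10^b0 · (10^b1)^X. A minimal Python sketch of that arithmetic, using the coefficients quoted in parts (c) and (d); the variable and function names are illustrative only:

```python
# Fitted coefficients on the log10 scale, as quoted in parts (c) and (d).
b0_log10 = 0.65488
b1_log10 = -0.19540

# Back-transform: Y = 10**b0 * (10**b1)**X.
multiplier = 10 ** b0_log10
base = 10 ** b1_log10


def predict_conc(x):
    """Predicted concentration on the original scale at time x."""
    return multiplier * base ** x


print(round(multiplier, 4), round(base, 4))
```

The two printed constants are the multiplier and the base of the exponential curve in the original units, which can be compared against the three choices offered in part (f).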
