23.05.2015 Views

SELF-TEST: SIMPLE REGRESSION - MFC home page


ECO 22000

McRAE

SELF-TEST: SIMPLE REGRESSION

Note: Those questions indicated with an (N) are unlikely to appear in this form on an in-class examination, but you should be able to describe the procedures used to get an answer and be able to interpret the answers.

1. What are the assumptions involved in simple linear regression?

2. What line does the method of least squares actually find?

3. What information might we get from a scatter plot of y against x?

4. Describe how to use Excel to create a scatter diagram.

5. Describe how to use Excel to calculate a regression line.

6. The regression equation of starting salary on GPA for a sample of recent graduates of RCCC is salary = 8000 + 3500 * GPA. Randy just graduated with a GPA of 2.6; what starting salary would the regression equation predict for him?
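The point prediction in question 6 is a direct substitution into the fitted line. A minimal Python sketch follows; the function name `predict_salary` is hypothetical, and the coefficients are simply those given in the question.

```python
def predict_salary(gpa, intercept=8000.0, slope=3500.0):
    """Point prediction from the fitted line salary = intercept + slope * GPA."""
    return intercept + slope * gpa


# Randy's GPA from question 6
randy = predict_salary(2.6)
print(f"Predicted starting salary: ${randy:,.0f}")
```

The same function gives a prediction for any other GPA, although predictions outside the range of GPAs in the sample should be treated with caution.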

7. For a cross-section of companies, a marketing analyst regressed sales on advertising expenditures, resulting in the following Excel output:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9
R Square             0.81
Adjusted R Square    0.80
Standard Error       100
Observations         45

ANOVA
              df    SS        MS       F     Significance F
Regression     1    48000     48000    16    0.000245081
Residual      43    129000    3000
Total         44    173000

               Coefficients    Standard Error    t Stat    P-value
Intercept      400             75                5.33      3.41301E-06
Advertising    0.5             0.125             4         0.000245081

   a) Write out the regression equation, showing sales as a function of advertising expenditures.
   b) Give a point prediction for sales for a company whose advertising expenditures equal $7,000.
   c) Give a 95% confidence interval for the average sales level for a company spending $2,000 on advertising. Assume the mean advertising expenditure = $4,000.
   d) Give a 95% confidence interval for a specific value of y for a company spending $2,000 on advertising with x̄ = $4,000.
   e) Explain why the intervals in c) and d) are not the same.
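The two intervals in parts c) and d) of question 7 can be sketched in Python using only the figures in the Excel output. The sketch assumes the usual simple-regression formulas, recovers the sum of squared deviations of x from the relation s_b = s / sqrt(Sxx), and takes the critical t value of 2.0167 (two-tailed 5%, 43 d.f.) as an assumption read from a t table rather than something stated in the handout.

```python
import math

s = 100.0          # standard error of the estimate (Excel "Standard Error")
s_b = 0.125        # standard error of the advertising coefficient
n = 45             # observations
b0, b1 = 400.0, 0.5
x0, x_bar = 2000.0, 4000.0
t_crit = 2.0167    # ASSUMPTION: two-tailed 5% critical t for n - 2 = 43 d.f.

# s_b = s / sqrt(Sxx), so Sxx can be backed out of the output
sxx = (s / s_b) ** 2

y_hat = b0 + b1 * x0
leverage = 1.0 / n + (x0 - x_bar) ** 2 / sxx

# part c: interval for the average sales level at x0
half_width_mean = t_crit * s * math.sqrt(leverage)
# part d: interval for one specific company at x0 (extra "1 +" term)
half_width_single = t_crit * s * math.sqrt(1.0 + leverage)

print(f"mean response: {y_hat:.0f} +/- {half_width_mean:.2f}")
print(f"single value:  {y_hat:.0f} +/- {half_width_single:.2f}")
```

The only difference between the two half-widths is the extra 1 under the square root, which is the variance of an individual observation around the regression line.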


Stats II, Regression, page 2

8. What shape do confidence intervals for y values at given x values have? What does this imply about predicted values far from the mean value of x?

9. Say whether the following statement is true or false and explain your answer: If a regression equation has a high r², statisticians see no problem with making extrapolations well beyond the observed range of x and y values.

10. What does the coefficient of correlation measure? How is it related to a regression line?

11. Find the coefficient of correlation between x and y:

    x    y
    2    5
    1    7
    6    3
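For question 11, the correlation coefficient can be computed from the deviations about the two means. The following is a minimal sketch in plain Python; the function name `correlation` is hypothetical.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = correlation([2, 1, 6], [5, 7, 3])
print(f"r = {r:.3f}")
```

With the three data points above, Sxy = −10, Sxx = 14 and Syy = 8, so r = −10 / sqrt(112).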

12. To test whether a correlation between x and y is significant, we should test the null hypothesis _________ with alternative hypothesis ____________; the test statistic is a __________ with ______ d.f.

13. Describe three different ways to find the correlation coefficient using Excel.

14. Comment on the following: Among the industrial nations, there is a negative correlation between average medical expenditures and life expectancy; this proves that medical care causes people to live shorter lives.

15. r² is called the _______ ; it is interpreted as giving the ____ __ _____ in y which is ________ by variation in x.

16. Generally speaking, what does r-squared tell us about a regression equation?

17. ART's engineers regressed production costs on output and found the regression equation: cost = 4000 + 2 * output. In the regression results, s_y.x = 1800 and s_b = 0.6; the regression was based on a sample of 40 days’ output and costs. Give a 98% confidence interval for β₁.

18. Using the data of the preceding question, formulate and conduct an appropriate test for the significance of the regression coefficient.
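Questions 17 and 18 use the same two numbers: the estimated slope and its standard error. A short Python sketch is given below. The critical value 2.429 (two-tailed 2%, 38 d.f.) is an assumption taken from a t table, not a figure stated in the question.

```python
b1 = 2.0        # estimated slope from cost = 4000 + 2 * output
s_b = 0.6       # standard error of the slope
n = 40          # days in the sample
df = n - 2      # degrees of freedom in simple regression
t_crit = 2.429  # ASSUMPTION: two-tailed 2% critical t for 38 d.f.

# Question 17: 98% confidence interval for the slope
half_width = t_crit * s_b
ci = (b1 - half_width, b1 + half_width)

# Question 18: t statistic for H0: beta_1 = 0
t_stat = (b1 - 0.0) / s_b

print(f"98% CI for the slope: {b1} +/- {half_width:.4f}")
print(f"t = {t_stat:.2f} with {df} d.f.")
```

Because the calculated t exceeds the critical value, the interval excludes zero, and the two questions lead to the same conclusion about the slope.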

19. The following Excel output was generated by regressing percentage rates of inflation on percentage rates of increase in the money supply:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.7
R Square             0.49
Adjusted R Square    0.46
Standard Error       1
Observations         62

ANOVA
              df    SS      MS    F    Significance F
Regression     1    900
Residual      60    6000
Total         61    6900

                Coefficients    Standard Error    t Stat    P-value
Intercept       -1              0.2               -5        0.00032
X Variable 1    1.2             0.4



   a) What is the simple correlation coefficient between prices and money?
   b) In a t test of H₀: ρ = 0, what is the calculated value of t?
   c) In a t test of H₀: β₁ = 0, what is the calculated value of t? At α = 0.01, what should we do with the null hypothesis?
   d) In an ANOVA test of this regression equation, what is the critical value of F for α = 0.025? (Use FINV to find the critical value.)
   e) What is the calculated value of F in an ANOVA test? Should we accept or reject the null hypothesis of no linear relation between money and inflation?
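The calculated statistics asked for in parts b), c) and e) of question 19 can be filled in from the partial Excel output. The sketch below uses the standard formulas and only the numbers printed in the table; critical values still have to come from a table or from Excel's FINV, so none are computed here.

```python
import math

n = 62
r = 0.7                     # Multiple R
b1, s_b = 1.2, 0.4          # slope and its standard error
ssr, sse = 900.0, 6000.0    # regression and residual sums of squares
df_reg, df_res = 1, n - 2

# part b: t statistic for H0: rho = 0
t_rho = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# part c: t statistic for H0: beta_1 = 0
t_beta = b1 / s_b

# part e: F statistic = mean square regression / mean square residual
f_stat = (ssr / df_reg) / (sse / df_res)

print(f"t (rho)  = {t_rho:.2f}")
print(f"t (beta) = {t_beta:.2f}")
print(f"F        = {f_stat:.2f}")
```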

20. In a regression ANOVA table, how are the following terms defined? Regression sum of squares; residual sum of squares; total sum of squares. What does each represent?

21. In a regression of managers' salaries on firm size, researchers estimated the equation salary = 20000 + 5000 * sales, where sales were measured in millions of dollars. The manager in observation number 42 works at a firm with annual sales of 8 million dollars and makes $53,000 a year. What is the residual for observation 42?
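A residual is the observed value minus the value the regression line predicts. For question 21 this can be sketched as follows; the function name `predicted_salary` is hypothetical.

```python
def predicted_salary(sales_millions):
    """Fitted line from question 21: salary = 20000 + 5000 * sales."""
    return 20000.0 + 5000.0 * sales_millions


observed = 53000.0
fitted = predicted_salary(8)
residual = observed - fitted   # negative: the manager earns less than predicted
print(f"fitted = {fitted:,.0f}, residual = {residual:,.0f}")
```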

22. How could a graph of the residuals from a regression equation help in determining whether ε is normally distributed?

23. How might you use a histogram of the residuals from a regression equation?

A CPA has gathered the following data for a sample of twelve corporations:

Observation #    Long-Term Assets    Long-Term Debt
 1               54                  28
 2               47                  26
 3               60                  39
 4               56                  43
 5               64                  24
 6               26                  16
 7               47                  30
 8               69                  38
 9               62                  43
10               45                  24
11               48                  36
12               39                  20

24. (N) Suppose that we wish to know whether acquiring long-term assets is done primarily by acquiring long-term debt.

   a) Designating assets as y and debt as x, use your spreadsheet to find the regression equation of assets on debt; state this equation in algebraic notation.
   b) What does the x coefficient tell you about the relation between assets and debt?
   c) What is the correlation between assets and debt? Use a t test to find whether we can consider this significant.
   d) Use an appropriate t test to test whether the slope of the regression line can be considered different from 0; set your significance level at 5%.
   e) At 1% significance, use ANOVA to test H₀: there is no significant linear relation between assets and debt.
   f) Make a point prediction of assets for a corporation which has 25 million dollars of long term debt.
   g) Give a 95% prediction interval for the assets of a corporation with 25 million dollars of debt.
   h) Give a 95% confidence interval for the average of all corporations that have 25 million dollars of debt.
   i) Compute and interpret the residual for observation #9.
   j) Give a 90% confidence interval for the value of β.
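Question 24 is meant for a spreadsheet, but the same quantities can be checked with a few lines of plain Python. The sketch below applies the usual least-squares formulas to the twelve observations in the table. The critical value 2.228 (two-tailed 5%, 10 d.f.) is the one quoted in the selected answers; the variable names are illustrative only.

```python
import math

debt = [28, 26, 39, 43, 24, 16, 30, 38, 43, 24, 36, 20]     # x
assets = [54, 47, 60, 56, 64, 26, 47, 69, 62, 45, 48, 39]   # y
n = len(debt)

x_bar = sum(debt) / n
y_bar = sum(assets) / n
sxx = sum((x - x_bar) ** 2 for x in debt)
syy = sum((y - y_bar) ** 2 for y in assets)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(debt, assets))

# parts a and c: fitted line and correlation
slope = sxy / sxx
intercept = y_bar - slope * x_bar
r = sxy / math.sqrt(sxx * syy)

# parts d and e: standard error of estimate, t for the slope, F = t squared
sse = syy - slope * sxy
s = math.sqrt(sse / (n - 2))
t_slope = slope / (s / math.sqrt(sxx))
f_stat = t_slope ** 2

# parts f, g, h: prediction at 25 (million dollars of debt) with both intervals
x0 = 25
y_hat = intercept + slope * x0
t_crit = 2.228   # two-tailed 5% critical t, 10 d.f., as quoted in the answers
leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
half_mean = t_crit * s * math.sqrt(leverage)
half_pred = t_crit * s * math.sqrt(1 + leverage)

# part i: residual for observation #9 (list index 8)
residual_9 = assets[8] - (intercept + slope * debt[8])

print(f"y-hat = {intercept:.2f} + {slope:.2f} x, r = {r:.2f}")
print(f"t = {t_slope:.3f}, F = {f_stat:.3f}")
print(f"at x = 25: {y_hat:.2f} +/- {half_pred:.2f} (single), +/- {half_mean:.2f} (mean)")
print(f"residual #9 = {residual_9:.3f}")
```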

25. What would you look for in a residual plot that would be a clue to the presence of each of the following conditions?

   a) non-normality of the residuals
   b) heteroscedasticity
   c) non-linearity of the relation between x and y
   d) autocorrelation

26. In the ANOVA table, the regression sum of squares is defined as $SSR = \sum (\hat{y} - \bar{y})^2$; explain why that represents the variation in y which is “explained” by variation in x.

27. The residual sum of squares, or error sum of squares, is defined as $SSE = \sum (y - \hat{y})^2$; explain why this term represents the variation in y which is NOT “explained” by variation in x.

28. $r^2$ is defined as $r^2 = 1 - \dfrac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}$. Explain how this definition leads to the interpretation usually given of $r^2$.
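One way to connect the definitions in questions 26 to 28 is the standard sum-of-squares decomposition for a least-squares line fitted with an intercept. The labels SST, SSR and SSE follow the ANOVA table; this is a sketch of the usual argument, not necessarily the wording expected in class.

```latex
\begin{aligned}
\underbrace{\sum (y - \bar{y})^2}_{SST}
  &= \underbrace{\sum (\hat{y} - \bar{y})^2}_{SSR}
   + \underbrace{\sum (y - \hat{y})^2}_{SSE} \\[4pt]
r^2 &= 1 - \frac{SSE}{SST}
     = \frac{SST - SSE}{SST}
     = \frac{SSR}{SST}
\end{aligned}
```

Read this way, r² is the share of the total variation in y that is accounted for by the fitted line.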

29. What condition is indicated by each of the following residual plots?

[Residual plots A, B, C and D are not reproduced in this copy.]


SELF-TEST: MULTIPLE REGRESSION

1. Marketing researchers at ART, Inc., have regressed their sales on Gross Domestic Product and their own advertising expenditures with the following result:

Sales = 400,000 + 4,000 × GDP + 7,000 × A

   a) What could we predict ART's sales to be if GDP = 6.5 trillion and advertising expenditures = 20 million?
   b) If GDP rose to 6.8 trillion, by how much would we expect sales to change?
   c) ART wishes to increase its unit sales by 21,000; by how much will they need to increase their advertising budget?
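The three parts of question 1 are all substitutions into the fitted equation. The sketch below assumes GDP is entered in trillions of dollars and advertising in millions, which is the reading implied by the numbers in the question; the function name `predict_sales` is hypothetical.

```python
def predict_sales(gdp_trillions, adv_millions):
    """Fitted equation: Sales = 400,000 + 4,000 * GDP + 7,000 * A."""
    return 400_000 + 4_000 * gdp_trillions + 7_000 * adv_millions


# part a: prediction at GDP = 6.5, advertising = 20
base = predict_sales(6.5, 20)

# part b: change in predicted sales when GDP rises to 6.8, advertising unchanged
change = predict_sales(6.8, 20) - base

# part c: extra advertising (in millions) needed for 21,000 more unit sales
extra_adv = 21_000 / 7_000

print(f"a) {base:,.0f}  b) {change:+,.0f}  c) {extra_adv:.0f} million")
```

Part b) depends only on the GDP coefficient and part c) only on the advertising coefficient, because each coefficient measures the effect of its own variable with the other held constant.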

2. Why is the use of adjusted R² preferred to the use of plain R² in multiple regression? What is it we're adjusting for?

3. When is it important to use adjusted R²? When is it not important?

4. R² can be thought of as the proportion of ____________ in y which is ____________ by _____________ in the x's. State the definition of R² and explain why that definition leads to this interpretation.

5. In performing a t test on a coefficient from multiple regression, what null and alternative hypotheses are we testing?

The following Excel output is for questions 6 to 12:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.774597
R Square             0.6
Adjusted R Square    0.52
Standard Error       10.00
Observations         16

ANOVA
              df    SS      MS    F    Significance F
Regression     3    1800
Residual      12    1200
Total         15    3000

                Coefficients    Standard Error    t Stat    P-value    Lower 95%    Upper 95%
Intercept       20.00           10                2         0.06866
X Variable 1    10.00           5
X Variable 2    5.00            1.5
X Variable 3    3.00            0.5

6. What is the regression equation?

7. How many degrees of freedom are there in the t Stats?

8. According to the t ratios, which of the regression coefficients would be significant at the 5% level? Which at the 10% level?

9. What is the F ratio? What null hypothesis would be tested with this value? At α = 0.01, can we reject the null hypothesis? Can we reject at α = 0.05?



10. Suppose x₁ = 6, x₂ = 0, and x₃ = 2; what is ŷ?

11. Observation #8 had x₁ = 8, x₂ = 2, and x₃ = 4; for that observation, y = 108. What is the residual for this observation?

12. Find a 95% confidence interval for β₁, the coefficient on variable X₁.
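The blanks in the Excel output for questions 6 to 12 can be filled in from the numbers that are printed. The following Python sketch does so with the standard formulas. The critical value 2.179 (two-tailed 5%, 12 d.f.) is an assumption taken from a t table, and the helper name `predict` is hypothetical.

```python
coef = [20.0, 10.0, 5.0, 3.0]   # intercept, X1, X2, X3
se = [10.0, 5.0, 1.5, 0.5]      # matching standard errors
n, k = 16, 3                    # observations, explanatory variables
df_resid = n - k - 1            # question 7


def predict(x1, x2, x3):
    """Fitted equation from the coefficient column (question 6)."""
    return coef[0] + coef[1] * x1 + coef[2] * x2 + coef[3] * x3


# question 8: t ratio for each coefficient
t_ratios = [b / s for b, s in zip(coef, se)]

# question 9: F = (SSR / k) / (SSE / residual d.f.)
f_stat = (1800 / k) / (1200 / df_resid)

# question 10: fitted value
y_hat_q10 = predict(6, 0, 2)

# question 11: residual = observed - fitted
residual_q11 = 108 - predict(8, 2, 4)

# question 12: 95% confidence interval for the X1 coefficient
t_crit = 2.179                  # ASSUMPTION: two-tailed 5% critical t, 12 d.f.
half_width_b1 = t_crit * se[1]

print(f"d.f. = {df_resid}, t ratios = {[round(t, 2) for t in t_ratios]}")
print(f"F = {f_stat:.1f}, y-hat = {y_hat_q10:.0f}, residual = {residual_q11:.0f}")
print(f"beta_1: {coef[1]:.0f} +/- {half_width_b1:.2f}")
```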

13. What is a dummy variable?

14. A marketing researcher has created a dummy variable for "Owns own home." John lives in an apartment; what value will this dummy have for him? Mary is paying off the mortgage on her condominium; what value will this dummy have for her?

15. In a regression of monthly entertainment expenditures on several variables, the coefficient on the dummy variable of question 14 was −$21. Explain the meaning of this number.

16. What is multicollinearity? How can we detect it?

17. What are the effects of multicollinearity in regression analyses?

18. Suppose the relation between x and y is not linear: how could you detect this nonlinearity?

19. (N) A researcher wishes to be able to predict the number of movies attended in a year's time on the basis of four explanatory variables: age, education, income, and sex. A sample of ten people yields the following data:

No. of Movies    Age    Education    Income    Sex Dummy (Male = 1)
25               18     11           35        1
12               35     13           38        0
21               21     14           35        1
 9               35     16           50        0
18               25     14           36        0
27               21     13           39        1
 4               39     13           37        0
17               31     12           34        0
17               20     14           41        1
 7               40     12           29        0

   a) Using your spreadsheet, find the regression equation and write it out in algebraic notation.
   b) Explain what each of the regression X coefficients means.
   c) Using an appropriate t test, at 5% significance test H₀: βᵢ = 0 for i = 1 to 4.
   d) What is the adjusted R²? How would we interpret that number? Why is there so much difference in this case between R² and adjusted R²?
   e) Using ANOVA, state and test the appropriate null hypothesis to test whether there is a significant linear relation among these variables.
   f) Predict how many movies will be seen by a 37 year-old female high-school graduate whose family income is $43,000 a year.
   g) State the 95% confidence interval for each X coefficient.
   h) Calculate a 98% confidence interval for β₂.
   i) Find the residual for the first observation (25 movies, age 18 and so on).
   j) In examining the residual plots generated by Excel, do you detect any problems or violations of the regression assumptions?
   k) Does there appear to be significant multicollinearity among the X variables? How do you know that?
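Question 19 is intended for a spreadsheet, but the fit itself can be reproduced without one. The sketch below solves the least-squares normal equations (X'X)b = X'y in plain Python with a small Gaussian-elimination helper. The helper `solve` and all variable names are illustrative. The prediction for part f) assumes that a high-school graduate has 12 years of education and that income is recorded in thousands of dollars; neither is stated in the table.

```python
rows = [
    # movies, age, education, income, male
    (25, 18, 11, 35, 1), (12, 35, 13, 38, 0), (21, 21, 14, 35, 1),
    (9, 35, 16, 50, 0), (18, 25, 14, 36, 0), (27, 21, 13, 39, 1),
    (4, 39, 13, 37, 0), (17, 31, 12, 34, 0), (17, 20, 14, 41, 1),
    (7, 40, 12, 29, 0),
]
y = [float(r[0]) for r in rows]
X = [[1.0] + [float(v) for v in r[1:]] for r in rows]   # leading 1 = intercept


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]       # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


k = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
beta = solve(xtx, xty)          # intercept, age, education, income, male

fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# part f, under the assumptions stated above: age 37, 12 years, income 43, female
prediction = sum(b * v for b, v in zip(beta, [1.0, 37.0, 12.0, 43.0, 0.0]))

print("coefficients:", [round(b, 3) for b in beta])
print("residual for observation 1:", round(residuals[0], 2))
print("prediction for part f:", round(prediction, 2))
```

Standard errors, t statistics and the ANOVA table are left to the spreadsheet output; this sketch covers only the coefficients, the residuals and a point prediction.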



Selected Answers:

Simple Regression:

6. 17,100

7. a. sales = 400 + 0.5 × adv
   b. 3900
   c. 1400 ± 505.07
   d. 1400 ± 543.84

11. −0.945

17. 2 ± 1.4574

18. H₀: β = 0; t = 3.33; p-value = 0.0019

19. a. 0.7
    b. 7.59
    c. 3 ⇒ reject
    d. 5.29
    e. 9, reject

21. −$7,000

24. a. y-hat = 22.62 + 0.94 X
    b. for each one-dollar increase in debt, assets increase 94 cents
    c. 0.71; since p value = 0.0092, we can reject at 1% significance the hypothesis that population correlation = 0.
    d. for α = 0.05, critical t = 2.228 < calculated 3.219, so reject the null that β = 0. (Alternatively, since p < 0.05, reject.)
    e. Critical F = 10.04 < 10.359, so reject null and conclude there is a significant relation. (Alternatively, in ANOVA table p < 0.01, so reject null.)
    f. 46.16
    g. 46.16 ± 20.71
    h. 46.16 ± 6.72
    i. −1.107
    j. 0.41 ≤ β₁ ≤ 1.47

29. a. nothing in particular
    b. autocorrelation
    c. non-linearity
    d. heteroscedasticity

Multiple Regression:

1. 566,000; +1,200; $3 million

6. ŷ = 20 + 10x₁ + 5x₂ + 3x₃

7. 12

8. β₂ and β₃ at 5%; all at 10%

9. F = 6; with 3, 12 d.f. F.01 = 5.95, so reject H₀ at 1% and 5%

10. 86

11. −14

12. 10 ± 10.89

14. 0; 1

15. homeowners typically spend $21 a month less on entertainment

19. a. movies = 56.71 − 0.93 × age − 1.30 × educ + 0.096 × inc − 2.28 × male
    b. movies attended falls by .93 for each year age increases, falls by 1.3 for each extra year of education, and increases by about 0.1 for each extra thousand dollars of family income; other things being equal males attend 2.28 fewer movies a year than females
    c. reject H₀ for β₁ since p = 0.024; fail to reject for i = 2 - 4 since all p values > 0.05
    d. Adj. R² = 0.77; these four variables explain 77% of the observed variation in movie attendance.
    e. H₀: β₁ = β₂ = β₃ = β₄ = 0 vs. H₁: at least one equality not true
       F = 8.549 with p value = 0.018, so at 2% significance we reject null and conclude there is a significant linear relation with at least one of the x variables.
    f. y-hat = 10.82.
    g. see output Lower 95% Upper 95%
    h. 3.37 ± 4.86
    i. since y-hat = 26.72, residual = −1.72
    j. no
    k. yes; education is highly correlated with income and sex with age; use Data Analysis Correlation tool
