SELF-TEST: SIMPLE REGRESSION - MFC home page
ECO 22000<br />
McRAE<br />
SELF-TEST: SIMPLE REGRESSION<br />
Note: Questions marked (N) are unlikely to appear in this form on an in-class examination, but you<br />
should be able to describe the procedures used to get an answer and to interpret the answers.<br />
1. What are the assumptions involved in simple linear regression?<br />
2. What line does the method of least squares actually find?<br />
3. What information might we get from a scatter plot of y against x?<br />
4. Describe how to use Excel to create a scatter diagram.<br />
5. Describe how to use Excel to calculate a regression line.<br />
6. The regression equation of starting salary on GPA for a sample of recent graduates of RCCC is salary = 8000 +<br />
3500 * GPA. Randy just graduated with a GPA of 2.6; what starting salary would the regression equation<br />
predict for him?<br />
7. For a cross-section of companies, a marketing analyst regressed sales on advertising expenditures, resulting in<br />
the following Excel output:<br />
SUMMARY OUTPUT<br />
Regression Statistics<br />
Multiple R          0.9<br />
R Square            0.81<br />
Adjusted R Square   0.80<br />
Standard Error      100<br />
Observations        45<br />
ANOVA<br />
             df   SS       MS      F    Significance F<br />
Regression   1    48000    48000   16   0.000245081<br />
Residual     43   129000   3000<br />
Total        44   173000<br />
              Coefficients   Standard Error   t Stat   P-value<br />
Intercept     400            75               5.33     3.41301E-06<br />
Advertising   0.5            0.125            4        0.000245081<br />
a) Write out the regression equation, showing sales as a function of advertising expenditures.<br />
b) Give a point prediction for sales for a company whose advertising expenditures equal $7,000.<br />
c) Give a 95% confidence interval for the average sales level for a company spending $2,000 on<br />
advertising. Assume the mean advertising expenditure = $4,000.<br />
d) Give a 95% confidence interval for a specific value of y for a company spending $2,000 on<br />
advertising with x̄ = $4,000.<br />
e) Explain why the intervals in (c) and (d) are not the same.
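One way to check the intervals asked for in 7(c) and 7(d) is sketched below. It assumes the usual textbook formulas for the two intervals and backs the sum of squared deviations of x out of the slope's standard error. The critical t value is an assumption taken from a t table (43 d.f., two-tailed 5%), since the Python standard library has no t distribution.

```python
import math

# Values read from the Excel output in question 7.
intercept, slope = 400.0, 0.5
s = 100.0        # standard error of the regression
s_b = 0.125      # standard error of the slope
n = 45
x0, x_bar = 2000.0, 4000.0

# Assumption: two-tailed 5% critical t for n - 2 = 43 d.f., from a t table.
t_crit = 2.0167

# Because s_b = s / sqrt(Sxx), Sxx can be recovered from the output.
sxx = (s / s_b) ** 2

y_hat = intercept + slope * x0
leverage = 1.0 / n + (x0 - x_bar) ** 2 / sxx

# Interval for the average y at x0 (part c).
half_width_mean = t_crit * s * math.sqrt(leverage)
# Interval for one specific y at x0 (part d); note the extra 1 under the root.
half_width_pred = t_crit * s * math.sqrt(1.0 + leverage)

print(f"point estimate: {y_hat:.0f}")
print(f"average y:      +/- {half_width_mean:.2f}")
print(f"specific y:     +/- {half_width_pred:.2f}")
```

The only difference between the two half-widths is the extra 1 under the square root, which carries the variability of an individual observation around the regression line.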
Stats II, Regression, <strong>page</strong> 2<br />
8. What shape do confidence intervals for y values at given x values have? What does this imply about predicted<br />
values far from the mean value of x?<br />
9. Say whether the following statement is true or false and explain your answer: If a regression equation has a high<br />
r², statisticians see no problem with making extrapolations well beyond the observed range of x and y values.<br />
10. What does the coefficient of correlation measure? How is it related to a regression line?<br />
11. Find the coefficient of correlation between x and y:<br />
x y<br />
2 5<br />
1 7<br />
6 3<br />
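A minimal sketch of the hand calculation for question 11, using the definition r = Sxy / √(Sxx · Syy). The variable names are illustrative only.

```python
import math

x = [2, 1, 6]
y = [5, 7, 3]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squared deviations and cross-products about the means.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)
print(round(r, 3))  # -0.945
```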
12. To test whether a correlation between x and y is significant, we should test the null hypothesis _________ with<br />
alternative hypothesis ____________; the test statistic is a __________ with ______ d.f.<br />
13. Describe three different ways to find the correlation coefficient using Excel.<br />
14. Comment on the following: Among the industrial nations, there is a negative correlation between average<br />
medical expenditures and life expectancy; this proves that medical care causes people to live shorter lives.<br />
15. r² is called the _______; it is interpreted as giving the ____ __ _____ in y which is ________ by variation in x.<br />
16. Generally speaking, what does r-squared tell us about a regression equation?<br />
17. ART's engineers regressed production costs on output and found the regression equation: cost = 4000 + 2 *<br />
output. In the regression results, s_y.x = 1800 and s_b = 0.6; the regression was based on a sample of 40 days’<br />
output and costs. Give a 98% confidence interval for β₁.<br />
18. Using the data of the preceding question, formulate and conduct an appropriate test for the significance of the<br />
regression coefficient.<br />
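Questions 17 and 18 use the same two numbers, the slope and its standard error. The sketch below assumes a critical t of about 2.429 for 38 d.f. with 1% in each tail, as read from a t table. The value is hard-coded because the standard library provides no t distribution.

```python
slope = 2.0   # estimated coefficient on output
s_b = 0.6     # standard error of the slope
n = 40
df = n - 2    # simple regression loses two degrees of freedom

# Assumption: critical t for 38 d.f. with 1% in each tail, from a t table.
t_crit = 2.429

# 98% confidence interval for the slope (question 17).
half_width = t_crit * s_b
lower, upper = slope - half_width, slope + half_width

# t statistic for H0: slope = 0 (question 18).
t_stat = (slope - 0.0) / s_b

print(f"98% CI: {lower:.4f} to {upper:.4f}")
print(f"t = {t_stat:.2f} on {df} d.f.")
```

The calculated t exceeds the 98% critical value, so the same conclusion follows from either the interval (it excludes zero) or the test.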
19. The following Excel output was generated by regressing percentage rates of inflation on percentage rates of<br />
increase in the money supply:<br />
SUMMARY OUTPUT<br />
Regression Statistics<br />
Multiple R          0.7<br />
R Square            0.49<br />
Adjusted R Square   0.46<br />
Standard Error      1<br />
Observations        62<br />
ANOVA<br />
             df   SS     MS   F   Significance F<br />
Regression   1    900<br />
Residual     60   6000<br />
Total        61   6900<br />
               Coefficients   Standard Error   t Stat   P-value<br />
Intercept      -1             0.2              -5       0.00032<br />
X Variable 1   1.2            0.4<br />
a) What is the simple correlation coefficient between prices and money?<br />
b) In a t test of H₀: ρ = 0, what is the calculated value of t?<br />
c) In a t test of H₀: β₁ = 0, what is the calculated value of t? At α = 0.01, what should we do with the<br />
null hypothesis?<br />
d) In an ANOVA test of this regression equation, what is the critical value of F for α = 0.025? (Use<br />
FINV to find the critical value.)<br />
e) What is the calculated value of F in an ANOVA test? Should we accept or reject the null hypothesis<br />
of no linear relation between money and inflation?<br />
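The missing cells in the output for question 19 can be filled in from the numbers that are given. This is a sketch using the standard formulas for the correlation t test, the slope t test, and the ANOVA F ratio. The critical F is not computed here; the comparison value 5.29 is the one reported in the selected answers for α = 0.025.

```python
import math

# Values read from the Excel output in question 19.
r = 0.7
n = 62
slope, s_b = 1.2, 0.4
ss_regression, df_regression = 900.0, 1
ss_residual, df_residual = 6000.0, 60

# (b) t statistic for H0: rho = 0, with n - 2 d.f.
t_rho = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# (c) t statistic for H0: beta_1 = 0.
t_beta = slope / s_b

# (e) F ratio = mean square regression / mean square residual.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual

# Critical F(1, 60) at alpha = 0.025, as reported in the selected answers.
f_crit = 5.29

print(f"t for rho:  {t_rho:.2f}")
print(f"t for beta: {t_beta:.2f}")
print(f"F ratio:    {f_ratio:.2f}  reject H0: {f_ratio > f_crit}")
```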
20. In a regression ANOVA table, how are the following terms defined? Regression sum of squares; residual sum of<br />
squares; total sum of squares. What does each represent?<br />
21. In a regression of managers' salaries on firm size, researchers estimated the equation salary = 20000 + 5000 *<br />
sales, where sales were measured in millions of dollars. Observation number 42 works at a firm with annual<br />
sales of 8 million dollars, and he makes $53,000 a year. What is the residual for observation 42?<br />
22. How could a graph of the residuals from a regression equation help in determining whether ε is normally<br />
distributed?<br />
23. How might you use a histogram of the residuals from a regression equation?<br />
A CPA has gathered the following data for a sample of twelve corporations:<br />
Observation #   Long-Term Assets   Long-Term Debt<br />
1               54                 28<br />
2               47                 26<br />
3               60                 39<br />
4               56                 43<br />
5               64                 24<br />
6               26                 16<br />
7               47                 30<br />
8               69                 38<br />
9               62                 43<br />
10              45                 24<br />
11              48                 36<br />
12              39                 20<br />
24. (N) Suppose that we wish to know whether long-term assets are acquired primarily by taking on long-term<br />
debt.<br />
a) Designating assets as y and debt as x, use your spreadsheet to find the regression equation of assets on debt;<br />
state this equation in algebraic notation.<br />
b) What does the x coefficient tell you about the relation between assets and debt?<br />
c) What is the correlation between assets and debt? Use a t test to find whether we can consider this significant.<br />
d) Use an appropriate t test to test whether the slope of the regression line can be considered different from 0;<br />
set your significance level at 5%.<br />
e) At 1% significance, use ANOVA to test H₀: there is no significant linear relation between assets and debt.<br />
f) Make a point prediction of assets for a corporation which has 25 million dollars of long-term debt.<br />
g) Give a 95% prediction interval for the assets of a corporation with 25 million dollars of debt.<br />
h) Give a 95% confidence interval for the average of all corporations that have 25 million dollars of debt.<br />
i) Compute and interpret the residual for observation #9.<br />
j) Give a 90% confidence interval for the value of β.<br />
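Question 24 is meant for a spreadsheet, but the same quantities can be cross-checked with a short script. The sketch below uses only the least-squares formulas for simple regression. The critical t of 2.228 (10 d.f., two-tailed 5%) is the value quoted in the selected answers, hard-coded here rather than computed.

```python
import math

debt = [28, 26, 39, 43, 24, 16, 30, 38, 43, 24, 36, 20]    # x
assets = [54, 47, 60, 56, 64, 26, 47, 69, 62, 45, 48, 39]  # y

n = len(debt)
x_bar = sum(debt) / n
y_bar = sum(assets) / n

sxx = sum((x - x_bar) ** 2 for x in debt)
syy = sum((y - y_bar) ** 2 for y in assets)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(debt, assets))

# (a) least-squares line of assets on debt
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# (c), (d) correlation and its t statistic; in simple regression this
# equals the t statistic for the slope.
r = sxy / math.sqrt(sxx * syy)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def predict(x):
    """Point prediction of assets for a given level of debt."""
    return intercept + slope * x


# (f) point prediction and (i) residual for observation 9
y_hat_25 = predict(25)
residual_9 = assets[8] - predict(debt[8])

# (g), (h) intervals at x = 25, using t = 2.228 from the selected answers
sse = syy - slope * sxy
s = math.sqrt(sse / (n - 2))
t_crit = 2.228
leverage = 1 / n + (25 - x_bar) ** 2 / sxx
half_width_pred = t_crit * s * math.sqrt(1 + leverage)
half_width_mean = t_crit * s * math.sqrt(leverage)

print(f"y-hat = {intercept:.2f} + {slope:.2f} x, r = {r:.2f}, t = {t_stat:.3f}")
print(f"prediction at 25: {y_hat_25:.2f}")
print(f"  prediction interval +/- {half_width_pred:.2f}")
print(f"  mean interval       +/- {half_width_mean:.2f}")
print(f"residual for observation 9: {residual_9:.3f}")
```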
25. What would you look for in a residual plot that would be a clue to the presence of each of the following<br />
conditions?<br />
a) non-normality of the residuals<br />
b) heteroscedasticity<br />
c) non-linearity of the relation between x and y
d) autocorrelation<br />
26. In the ANOVA table, the regression sum of squares is defined as SSR = Σ(ŷ − ȳ)²; explain why that represents<br />
the variation in y which is “explained” by variation in x.<br />
27. The residual sum of squares, or error sum of squares, is defined as SSE = Σ(y − ŷ)²; explain why this term<br />
represents the variation in y which is NOT “explained” by variation in x.<br />
28. r² is defined as r² = 1 − Σ(y − ŷ)² / Σ(y − ȳ)². Explain how this definition leads to the interpretation usually given of<br />
r².<br />
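The three sums of squares in questions 26 to 28 can be seen working together on a small data set. The sketch below reuses the three points from question 11, fits the least-squares line, and checks numerically that the total variation splits into an explained and an unexplained part, so that 1 − SSE/SST is the explained share.

```python
x = [2, 1, 6]
y = [5, 7, 3]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares line through the three points.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

# Total, regression ("explained"), and residual ("unexplained") sums of squares.
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((fi - y_bar) ** 2 for fi in fitted)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

r_squared = 1 - sse / sst

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
print(f"r squared = {r_squared:.4f}, SSR / SST = {ssr / sst:.4f}")
```

Because SST = SSR + SSE for a least-squares line with an intercept, 1 − SSE/SST and SSR/SST are the same number, which is why r² reads as the proportion of variation in y explained by x.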
29. What condition is indicated by each of the following residual plots?<br />
[Residual plots A, B, C, and D are not reproduced here.]
SELF-TEST: MULTIPLE REGRESSION<br />
1. Marketing researchers at ART, Inc., have regressed their sales on Gross Domestic Product and their own<br />
advertising expenditures with the following result:<br />
Sales = 400,000 + 4,000 × GDP + 7,000 × A<br />
a) What could we predict ART's sales to be if GDP = 6.5 trillion and advertising expenditures = 20<br />
million?<br />
b) If GDP rose to 6.8 trillion, by how much would we expect sales to change?<br />
c) ART wishes to increase its unit sales by 21,000; by how much will they need to increase their<br />
advertising budget?<br />
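The three parts of question 1 each use the fitted equation in a different direction. In the sketch below, the units are an assumption inferred from the question and the selected answers: GDP is entered in trillions and advertising in millions of dollars. The function name is illustrative.

```python
def predicted_sales(gdp_trillions, advertising_millions):
    """Fitted equation from question 1 (units are assumed, see the text above)."""
    return 400_000 + 4_000 * gdp_trillions + 7_000 * advertising_millions


# (a) prediction at GDP = 6.5 and advertising = 20
base = predicted_sales(6.5, 20)

# (b) change in predicted sales when GDP rises to 6.8, advertising held fixed
change_from_gdp = predicted_sales(6.8, 20) - base

# (c) extra advertising needed for 21,000 more units, GDP held fixed:
# divide the desired change by the advertising coefficient
extra_advertising = 21_000 / 7_000

print(f"(a) {base:,.0f}")
print(f"(b) {change_from_gdp:+,.0f}")
print(f"(c) {extra_advertising:.0f} million")
```

Parts (b) and (c) depend only on the coefficients, not on the starting levels, because the equation is linear.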
2. Why is the use of adjusted R² preferred to the use of plain R² in multiple regression? What is it we're adjusting<br />
for?<br />
3. When is it important to use adjusted R²? When is it not important?<br />
4. R² can be thought of as the proportion of ____________ in y which is ____________ by _____________ in the<br />
x's. State the definition of R² and explain why that definition leads to this interpretation.<br />
5. In performing a t test on a coefficient from multiple regression, what null and alternative hypotheses are we<br />
testing?<br />
The following Excel output is for questions 6 to 12:<br />
SUMMARY OUTPUT<br />
Regression Statistics<br />
Multiple R          0.774597<br />
R Square            0.6<br />
Adjusted R Square   0.52<br />
Standard Error      10.00<br />
Observations        16<br />
ANOVA<br />
             df   SS     MS   F   Significance F<br />
Regression   3    1800<br />
Residual     12   1200<br />
Total        15   3000<br />
               Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%<br />
Intercept      20.00          10               2        0.06866<br />
X Variable 1   10.00          5<br />
X Variable 2   5.00           1.5<br />
X Variable 3   3.00           0.5<br />
6. What is the regression equation?<br />
7. How many degrees of freedom are there in the t Stats?<br />
8. According to the t ratios, which of the regression coefficients would be significant at the 5% level? Which at the<br />
10% level?<br />
9. What is the F ratio? What null hypothesis would be tested with this value? At α = 0.01, can we reject the null<br />
hypothesis? Can we reject at α = 0.05?
10. Suppose x₁ = 6, x₂ = 0, and x₃ = 2; what is ŷ?<br />
11. Observation #8 had x₁ = 8, x₂ = 2, and x₃ = 4; for that observation, y = 108. What is the residual for this<br />
observation?<br />
12. Find a 95% confidence interval for β₁, the coefficient on variable X₁.<br />
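Most of questions 6 to 12 amount to filling in blank cells of the Excel output. The sketch below does that arithmetic with the coefficients, standard errors, and sums of squares shown in the table. The critical t of about 2.1788 (12 d.f., two-tailed 5%) is an assumption taken from a t table, and the dictionary keys are illustrative names.

```python
coefficients = {"intercept": 20.0, "x1": 10.0, "x2": 5.0, "x3": 3.0}
std_errors = {"intercept": 10.0, "x1": 5.0, "x2": 1.5, "x3": 0.5}

# t ratio for each coefficient = coefficient / standard error (question 8).
t_ratios = {name: coefficients[name] / std_errors[name] for name in coefficients}

# F ratio from the ANOVA table (question 9).
ss_regression, df_regression = 1800.0, 3
ss_residual, df_residual = 1200.0, 12
f_ratio = (ss_regression / df_regression) / (ss_residual / df_residual)


def y_hat(x1, x2, x3):
    """Fitted value from the regression equation (question 6)."""
    c = coefficients
    return c["intercept"] + c["x1"] * x1 + c["x2"] * x2 + c["x3"] * x3


prediction = y_hat(6, 0, 2)         # question 10
residual_8 = 108 - y_hat(8, 2, 4)   # question 11: actual minus fitted

# Question 12. Assumption: two-tailed 5% critical t for 12 d.f., from a t table.
t_crit = 2.1788
half_width = t_crit * std_errors["x1"]

for name, t in t_ratios.items():
    print(f"t({name}) = {t:.2f}")
print(f"F = {f_ratio:.2f}")
print(f"y-hat = {prediction:.0f}, residual = {residual_8:.0f}")
print(f"95% CI for beta_1: {coefficients['x1']:.0f} +/- {half_width:.2f}")
```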
13. What is a dummy variable?<br />
14. A marketing researcher has created a dummy variable for "Owns own home." John lives in an apartment; what<br />
value will this dummy have for him? Mary is paying off the mortgage on her condominium; what value will this<br />
dummy have for her?<br />
15. In a regression of monthly entertainment expenditures on several variables, the coefficient on the dummy from question 14 was −$21.<br />
Explain the meaning of this number.<br />
16. What is multicollinearity? How can we detect it?<br />
17. What are the effects of multicollinearity in regression analysis?<br />
18. Suppose the relation between x and y is not linear: how could you detect this nonlinearity?<br />
19. (N) A researcher wishes to be able to predict the number of movies attended in a year's time on the basis of four<br />
explanatory variables: age, education, income, and sex. A sample of ten people yields the following data:<br />
No. of Movies   Age   Education   Income   Sex Dummy (Male = 1)<br />
25              18    11          35       1<br />
12              35    13          38       0<br />
21              21    14          35       1<br />
9               35    16          50       0<br />
18              25    14          36       0<br />
27              21    13          39       1<br />
4               39    13          37       0<br />
17              31    12          34       0<br />
17              20    14          41       1<br />
7               40    12          29       0<br />
a) Using your spreadsheet, find the regression equation and write it out in algebraic notation.<br />
b) Explain what each of the regression X coefficients means.<br />
c) Using an appropriate t test, at 5% significance test H₀: βᵢ = 0 for i = 1 to 4.<br />
d) What is the adjusted R²? How would we interpret that number? Why is there so much difference<br />
in this case between R² and adjusted R²?<br />
e) Using ANOVA, state and test the appropriate null hypothesis to test whether there is a significant<br />
linear relation among these variables.<br />
f) Predict how many movies will be seen by a 37 year-old female high-school graduate whose family<br />
income is $43,000 a year.<br />
g) State the 95% confidence interval for each X coefficient.<br />
h) Calculate a 98% confidence interval for β₂.<br />
i) Find the residual for the first observation (25 movies, age 18, and so on).<br />
j) In examining the residual plots generated by Excel, do you detect any problems or violations of<br />
the regression assumptions?<br />
k) Does there appear to be significant multicollinearity among the X variables? How do you know<br />
that?
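Question 19 is intended for a spreadsheet, but the coefficients can also be cross-checked by solving the normal equations (X'X)b = X'y directly. The sketch below is one way to do that with the standard library only. The `solve` helper is a hypothetical name for a plain Gaussian elimination routine, not a library function, and income is assumed to be in thousands of dollars as the data table suggests.

```python
# Columns: movies, age, education, income (assumed thousands), male dummy.
data = [
    (25, 18, 11, 35, 1),
    (12, 35, 13, 38, 0),
    (21, 21, 14, 35, 1),
    (9, 35, 16, 50, 0),
    (18, 25, 14, 36, 0),
    (27, 21, 13, 39, 1),
    (4, 39, 13, 37, 0),
    (17, 31, 12, 34, 0),
    (17, 20, 14, 41, 1),
    (7, 40, 12, 29, 0),
]

y = [row[0] for row in data]
X = [[1.0, *row[1:]] for row in data]  # leading 1.0 is the intercept column
k = len(X[0])

# Build the normal equations (X'X) b = X'y.
xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    size = len(m)
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * size
    for r in reversed(range(size)):
        known = sum(m[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (m[r][size] - known) / m[r][r]
    return x


coefs = solve(xtx, xty)
fitted = [sum(b * v for b, v in zip(coefs, row)) for row in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

names = ["intercept", "age", "education", "income", "male"]
for name, b in zip(names, coefs):
    print(f"{name:>9}: {b:8.3f}")
print(f"residual for first observation: {residuals[0]:.2f}")
```

The printed coefficients can be compared with the spreadsheet output. Standard errors, t statistics, and the ANOVA table still need the regression tool or further calculation.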
Selected Answers:<br />
Simple Regression:<br />
6. 17,100<br />
7. a. sales = 400 + 0.5 × adv<br />
b. 3900<br />
c. 1400 ± 505.07<br />
d. 1400 ± 543.84<br />
11. −0.945<br />
17. 2 ± 1.4574<br />
18. H₀: β = 0; t = 3.33; p-value = 0.0019<br />
19. a. 0.7<br />
b. 7.59<br />
c. 3 ⇒ reject<br />
d. 5.29<br />
e. 9, reject<br />
21. −$7,000<br />
24. a. ŷ = 22.62 + 0.94x<br />
b. for each one-dollar increase in debt, assets increase by 94 cents<br />
c. 0.71; since p-value = 0.0092, we can reject at 1% significance<br />
the hypothesis that population correlation = 0.<br />
d. for α = 0.05, critical t = 2.228 < calculated 3.219, so reject<br />
the null that β = 0. (Alternatively, since p < 0.05, reject.)<br />
e. Critical F = 10.04 < 10.359, so reject null and conclude there is<br />
a significant relation. (Alternatively, in ANOVA table p < 0.01, so reject null.)<br />
f. 46.16<br />
g. 46.16 ± 20.71<br />
h. 46.16 ± 6.72<br />
i. −1.107<br />
j. 0.41 ≤ β₁ ≤ 1.47<br />
29. a. nothing in particular<br />
b. autocorrelation<br />
c. non-linearity<br />
d. heteroscedasticity<br />
Multiple Regression:<br />
1. 566,000; +1,200; $3 million<br />
6. ŷ = 20 + 10x₁ + 5x₂ + 3x₃<br />
7. 12<br />
8. β₂ and β₃ at 5%; all at 10%<br />
9. F = 6; with 3 and 12 d.f., critical F at α = 0.01 is 5.95, so reject H₀ at 1% and 5%<br />
10. 86<br />
11. −14<br />
12. 10 ± 10.89<br />
14. 0; 1<br />
15. homeowners typically spend $21 a month less on entertainment<br />
19. a. movies = 56.71 − 0.93 × age − 1.30 × educ + 0.096 × inc − 2.28 × male<br />
b. movies attended falls by 0.93 for each year age increases, falls<br />
by 1.3 for each extra year of education, and increases by about<br />
0.1 for each extra thousand dollars of family income; other<br />
things being equal, males attend 2.28 fewer movies a year than females<br />
c. reject H₀ for β₁ since p = 0.024; fail to reject for i = 2 to 4 since all p-values > 0.05<br />
d. Adj. R² = 0.77; these four variables explain 77% of the observed<br />
variation in movie attendance.<br />
e. H₀: β₁ = β₂ = β₃ = β₄ = 0 vs. H₁: at least one equality not true<br />
F = 8.549 with p-value = 0.018, so at 2% significance we reject the<br />
null and conclude there is a significant linear relation with at<br />
least one of the x variables.<br />
f. ŷ = 10.82<br />
g. see the Lower 95% and Upper 95% columns of the Excel output<br />
h. 3.37 ± 4.86<br />
i. since ŷ = 26.72, residual = −1.72<br />
j. no<br />
k. yes; education is highly correlated with income, and sex with age; use the Data Analysis<br />
Correlation tool