ECO 22000 

McRAE 

SELF-TEST: SIMPLE REGRESSION 

Note: Those questions indicated with an (N) are unlikely to appear in this form on an in-class examination, but you 

should be able to describe the procedures used to get an answer and be able to interpret the answers. 

1. What are the assumptions involved in simple linear regression? 

2. What line does the method of least squares actually find? 

3. What information might we get from a scatter plot of y against x? 

4. Describe how to use Excel to create a scatter diagram. 

5. Describe how to use Excel to calculate a regression line. 

6. The regression equation of starting salary on GPA for a sample of recent graduates of RCCC is salary = 8000 + 

3500 * GPA. Randy just graduated with a GPA of 2.6; what starting salary would the regression equation 

predict for him? 

7. For a cross-section of companies, a marketing analyst regressed sales on advertising expenditures, resulting in 

the following Excel output: 

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9
R Square             0.81
Adjusted R Square    0.80
Standard Error       100
Observations         45

ANOVA
               df    SS        MS       F     Significance F
Regression     1     48000     48000    16    0.000245081
Residual       43    129000    3000
Total          44    173000

               Coefficients    Standard Error    t Stat    P-value
Intercept      400             75                5.33      3.41301E-06
Advertising    0.5             0.125             4         0.000245081

a) Write out the regression equation, showing sales as a function of advertising expenditures. 

b) Give a point prediction for sales for a company whose advertising expenditures equal $7,000. 

c) Give a 95% confidence interval for the average sales level for a company spending $2,000 on 

advertising. Assume the mean advertising expenditure = $4,000. 

d) Give a 95% confidence interval for a specific value of y for a company spending $2,000 on 

advertising with x̄ = $4,000. 

e) Explain why the intervals in c. and d. are not the same.

8. What shape do confidence intervals for y values at given x values have? What does this imply about predicted 

values far from the mean value of x? 

9. Say whether the following statement is true or false and explain your answer: If a regression equation has a high 

r², statisticians see no problem with making extrapolations well beyond the observed range of x and y values. 

10. What does the coefficient of correlation measure? How is it related to a regression line? 

11. Find the coefficient of correlation between x and y: 

x y 

2 5 

1 7 

6 3 

12. To test whether a correlation between x and y is significant, we should test the null hypothesis _________ with 

alternative hypothesis____________; the test statistic is a __________ with ______ d.f. 

13. Describe three different ways to find the correlation coefficient using Excel. 

14. Comment on the following: Among the industrial nations, there is a negative correlation between average 

medical expenditures and life expectancy; this proves that medical care causes people to live shorter lives. 

15. r² is called the _______; it is interpreted as giving the ____ __ _____ in y which is ________ by variation in x. 

16. Generally speaking, what does r-squared tell us about a regression equation? 

17. ART's engineers regressed production costs on output and found the regression equation: cost = 4000 + 2 * 

output. In the regression results, s_y.x = 1800 and s_b = 0.6; the regression was based on a sample of 40 days’ 

output and costs. Give a 98% confidence interval for β₁. 

18. Using the data of the preceding question, formulate and conduct an appropriate test for the significance of the 

regression coefficient. 

19. The following Excel output was generated by regressing percentage rates of inflation on percentage rates of 

increase in the money supply: 




R Square          0.49
Standard Error    1

ANOVA
                df    SS
Regression      1     900
Residual        60    6000
Total           61    6900

                Coefficients    Standard Error    t Stat    P-value
Intercept       -1              0.2               -5        0.00032
X Variable 1    1.2             0.4


a) What is the simple correlation coefficient between prices and money? 

b) In a t test of H₀: ρ = 0, what is the calculated value of t? 

c) In a t test of H₀: β₁ = 0, what is the calculated value of t? At α = 0.01, what should we do with the 

null hypothesis? 

d) In an ANOVA test of this regression equation, what is the critical value of F for α = 0.025? (Use 

FINV to find the critical value.) 

e) What is the calculated value of F in an ANOVA test? Should we accept or reject the null hypothesis 

of no linear relation between money and inflation? 

20. In a regression ANOVA table, how are the following terms defined? Regression sum of squares; residual sum of 

squares; total sum of squares. What does each represent? 

21. In a regression of managers' salaries on firm size, researchers estimated the equation salary = 20000 + 5000 * 

sales, where sales were measured in millions of dollars. Observation number 42 works at a firm with annual 

sales of 8 million dollars, and he makes $53,000 a year. What is the residual for observation 42? 

22. How could a graph of the residuals from a regression equation help in determining whether ε is normally 

distributed? 

23. How might you use a histogram of the residuals from a regression equation? 

A CPA has gathered the following data for a sample of twelve corporations: 

Observation # Long-Term Assets Long-Term Debt 

1 54 28 

2 47 26 

3 60 39 

4 56 43 

5 64 24 

6 26 16 

7 47 30 

8 69 38 

9 62 43 

10 45 24 

11 48 36 

12 39 20 

24. (N) Suppose that we wish to know whether acquiring long-term assets is done primarily by acquiring long-term 

debt. 

a) Designating assets as y and debt as x, use your spreadsheet to find the regression equation of assets on debt; 

state this equation in algebraic notation. 

b) What does the x coefficient tell you about the relation between assets and debt? 

c) What is the correlation between assets and debt? Use a t test to find whether we can consider this significant. 

d) Use an appropriate t test to test whether the slope of the regression line can be considered different from 0; 

set your significance level at 5%. 

e) At 1% significance, use ANOVA to test H₀: there is no significant linear relation between assets and debt. 

f) Make a point prediction of assets for a corporation which has 25 million dollars of long term debt. 

g) Give a 95% prediction interval for the assets of a corporation with 25 million dollars of debt. 

h) Give a 95% confidence interval for the average of all corporations that have 25 million dollars of debt. 

i) Compute and interpret the residual for observation #9. 

j) Give a 90% confidence interval for the value of β₁. 

25. What would you look for in a residual plot that would be a clue to the presence of each of the following 

conditions? 

a) non-normality of the residuals 

b) heteroscedasticity 

c) non-linearity of the relation between x and y


d) autocorrelation 

26. In the ANOVA table, the regression sum of squares is defined as SSR = Σ(ŷ − ȳ)²; explain why that represents 

the variation in y which is “explained” by variation in x. 

27. The residual sum of squares, or error sum of squares, is defined as SSE = Σ(y − ŷ)²; explain why this term 

represents the variation in y which is NOT “explained” by variation in x. 

28. r² is defined as r² = 1 − Σ(y − ŷ)² / Σ(y − ȳ)². Explain how this definition leads to the interpretation usually 

given of r². 

29. What condition is indicated by each of the following residual plots? 

[Residual plots A, B, C, and D are not reproduced here.]

SELF-TEST: MULTIPLE REGRESSION 

1. Marketing researchers at ART, Inc., have regressed their sales on Gross Domestic Product and their own 

advertising expenditures with the following result: 

Sales = 400,000 + 4,000 × GDP + 7,000 × A 

a) What could we predict ART's sales to be if GDP = 6.5 trillion and advertising expenditures = 20 

million? 

b) If GDP rose to 6.8 trillion, by how much would we expect sales to change? 

c) ART wishes to increase its unit sales by 21,000; by how much will they need to increase their 

advertising budget? 

2. Why is the use of adjusted R² preferred to the use of plain R² in multiple regression? What is it we're adjusting 

for? 

3. When is it important to use adjusted R²? When is it not important? 

4. R² can be thought of as the proportion of ____________ in y which is ____________ by _____________ in the 

x's. State the definition of R² and explain why that definition leads to this interpretation. 

5. In performing a t test on a coefficient from multiple regression, what null and alternative hypotheses are we 

testing? 

The following Excel output is for questions 6 to 12: 




R Square          0.6
Standard Error    10.00

ANOVA
                df    SS
Regression      3     1800
Residual        12    1200
Total           15    3000

                Coefficients    Standard Error    t Stat    P-value    Lower 95%    Upper 95%
Intercept       20.00           10                2         0.06866
X Variable 1    10.00           5
X Variable 2    5.00            1.5
X Variable 3    3.00            0.5

6. What is the regression equation? 

7. How many degrees of freedom are there in the t Stats? 

8. According to the t ratios, which of the regression coefficients would be significant at the 5% level? Which at the 

10% level? 

9. What is the F ratio? What null hypothesis would be tested with this value? At α = 0.01, can we reject the null 

hypothesis? Can we reject at α = 0.05?


10. Suppose x₁ = 6, x₂ = 0, and x₃ = 2; what is ŷ? 

11. Observation #8 had x₁ = 8, x₂ = 2, and x₃ = 4; for that observation, y = 108. What is the residual for this 

observation? 

12. Find a 95% confidence interval for β₁, the coefficient on variable X₁. 

13. What is a dummy variable? 

14. A marketing researcher has created a dummy variable for "Owns own home." John lives in an apartment; what 

value will this dummy have for him? Mary is paying off the mortgage on her condominium; what value will this 

dummy have for her? 

15. In a regression of monthly entertainment expenditures on several variables, the coefficient on the dummy of question 14 was −$21. 

Explain the meaning of this number. 

16. What is multicollinearity? How can we detect it? 

17. What are the effects in regression analyses of multicollinearity? 

18. Suppose the relation between x and y is not linear: how could you detect this nonlinearity? 

19. (N) A researcher wishes to be able to predict the number of movies attended in a year's time on the basis of four 

explanatory variables: age, education, income, and sex. A sample of ten people yields the following data: 

No. of Movies Age Education Income Sex Dummy (Male = 1) 

25 18 11 35 1 

12 35 13 38 0 

21 21 14 35 1 

9 35 16 50 0 

18 25 14 36 0 

27 21 13 39 1 

4 39 13 37 0 

17 31 12 34 0 

17 20 14 41 1 

7 40 12 29 0 

a) Using your spreadsheet, find the regression equation and write it out in algebraic notation. 

b) Explain what each of the regression X coefficients means. 

c) Using an appropriate t test, at 5% significance test H₀: βᵢ = 0 for i = 1 to 4. 

d) What is the adjusted R²? How would we interpret that number? Why is there so much difference 

in this case between R² and adjusted R²? 

e) Using ANOVA state and test the appropriate null hypothesis to test whether there is a significant 

linear relation among these variables. 

f) Predict how many movies will be seen by a 37 year-old female high-school graduate whose family 

income is $43,000 a year. 

g) State the 95% confidence interval for each X coefficient. 

h) Calculate a 98% confidence interval for β₂. 

i) Find the residual for the first observation (25 movies, age 18 and so on). 

j) In examining the residual plots generated by Excel, do you detect any problems or violations of 

the regression assumptions? 

k) Does there appear to be significant multicollinearity among the X variables? How do you know 

that?


Selected Answers: 

Simple Regression: 

6. 17,100 

7. a. sales = 400 + 0.5 × adv 
b. 3900 
c. 1400 ± 505.07 
d. 1400 ± 543.84 

11. −0.945 

17. 2 ± 1.4574 

18. H₀: β = 0; t = 3.33; p-value = 0.0019 

19. a. 0.7 
b. 7.59 
c. 3 ⇒ reject 
d. 5.29 
e. 9, reject 

21. −$7,000 

29. a. nothing in particular 
b. autocorrelation 
c. non-linearity 
d. heteroscedasticity 
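
The half-widths in answers 7c and 7d can be reproduced from the printed Excel output alone. The sketch below is one way to do that check in Python, not necessarily the method used in class. It relies on the identity se(b₁) = s / √Sxx to back out Sxx, and it assumes a two-tailed 5% critical t of 2.0167 for 43 d.f., a table value that does not appear in the output. All variable names are illustrative.

```python
import math

# Values read from the Excel output in question 7.
n = 45                # observations
s = 100.0             # standard error of the regression
b0, b1 = 400.0, 0.5   # intercept and slope
se_b1 = 0.125         # standard error of the slope
x_bar = 4000.0        # mean advertising, given in parts c and d
x0 = 2000.0           # advertising level of interest

# ASSUMPTION: two-tailed 5% critical t for n - 2 = 43 d.f., from a t table
# (in Excel, =TINV(0.05, 43)); it is not printed in the output.
t_crit = 2.0167

# se(b1) = s / sqrt(Sxx), so Sxx can be recovered from the output.
sxx = (s / se_b1) ** 2

y_hat = b0 + b1 * x0
leverage = 1.0 / n + (x0 - x_bar) ** 2 / sxx

# Part c: interval for the AVERAGE sales of all companies spending x0.
half_mean = t_crit * s * math.sqrt(leverage)
# Part d: interval for ONE company's sales; the extra 1 adds the
# variance of an individual observation around the line.
half_pred = t_crit * s * math.sqrt(1.0 + leverage)

print(f"mean response: {y_hat:.0f} +/- {half_mean:.2f}")
print(f"single value:  {y_hat:.0f} +/- {half_pred:.2f}")
```

The only difference between the two half-widths is the extra 1 under the square root, which is the point of question 7e.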

24. a. ŷ = 22.62 + 0.94x 

b. For each one-dollar increase in debt, assets increase 94 cents. 

c. 0.71; since p-value = 0.0092, we can reject at 1% significance the hypothesis that population correlation = 0. 

d. For α = 0.05, critical t = 2.228 < calculated 3.219, so reject the null that β = 0. (Alternatively, since p < 0.05, reject.) 

e. Critical F = 10.04 < 10.359, so reject null and conclude there is a significant relation. (Alternatively, in ANOVA table p < 0.01, so reject null.) 

f. 46.16 

g. 46.16 ± 20.71 

h. 46.16 ± 6.72 

i. −1.107 

j. 0.41 ≤ β₁ ≤ 1.47 
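
For anyone who wants to check the question 24 answers without a spreadsheet, the following is a minimal sketch of the least-squares arithmetic in plain Python. It uses the twelve observations from the CPA's table, with debt as x and assets as y as the question specifies. The variable and function names are illustrative, and the handout itself expects Excel to be used.

```python
import math

# Data from the CPA's table: x = long-term debt, y = long-term assets.
debt = [28, 26, 39, 43, 24, 16, 30, 38, 43, 24, 36, 20]
assets = [54, 47, 60, 56, 64, 26, 47, 69, 62, 45, 48, 39]

n = len(debt)
x_bar = sum(debt) / n
y_bar = sum(assets) / n

# Sums of squared deviations and cross-products around the means.
sxx = sum((x - x_bar) ** 2 for x in debt)
syy = sum((y - y_bar) ** 2 for y in assets)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(debt, assets))

# Least-squares slope and intercept (part a).
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Correlation and its t statistic with n - 2 d.f. (parts c and d).
r = sxy / math.sqrt(sxx * syy)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# In simple regression the ANOVA F equals t squared (part e).
f_stat = t_stat ** 2


def predict(x):
    """Point prediction of assets for a given level of debt."""
    return intercept + slope * x


# Residual for observation #9 (list index 8): actual minus predicted.
residual_9 = assets[8] - predict(debt[8])

print(f"y-hat = {intercept:.2f} + {slope:.2f} x")
print(f"r = {r:.2f}, t = {t_stat:.3f}, F = {f_stat:.3f}")
print(f"prediction at x = 25: {predict(25):.2f}")
print(f"residual for observation 9: {residual_9:.3f}")
```

The printed values can be compared with answers 24a, c, d, e, f, and i; the interval answers (g, h, j) additionally need a critical t from a table.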

Multiple Regression: 

1. 566,000; +1,200; $3 million 

6. ŷ = 20 + 10x₁ + 5x₂ + 3x₃ 

7. 12 

8. β₂ and β₃ at 5%; all at 10% 

9. F = 6; with 3, 12 d.f., F.01 = 5.95, so reject H₀ at 1% and 5% 

10. 86 

11. −14 

12. 10 ± 10.89 

14. 0; 1 

15. Homeowners typically spend $21 a month less on entertainment. 

19. a. movies = 56.71 − 0.93 × age − 1.30 × educ + 0.096 × inc − 2.28 × male 

b. Movies attended falls by 0.93 for each year age increases, falls by 1.3 for each extra year of education, and increases by about 0.1 for each extra thousand dollars of family income; other things being equal, males attend 2.28 fewer movies a year than females. 

c. Reject H₀ for β₁ since p = 0.024; fail to reject for i = 2 to 4 since all p-values > 0.05. 

d. Adj. R² = 0.77; these four variables explain 77% of the observed variation in movie attendance. 

e. H₀: β₁ = β₂ = β₃ = β₄ = 0 vs. H₁: at least one equality not true. F = 8.549 with p-value = 0.018, so at 2% significance we reject the null and conclude there is a significant linear relation with at least one of the x variables. 

f. ŷ = 10.82 

g. See the Lower 95% and Upper 95% columns of the output. 

h. 3.37 ± 4.86 

i. Since ŷ = 26.72, residual = −1.72. 

j. No. 

k. Yes; education is highly correlated with income, and sex with age; use the Data Analysis Correlation tool. 
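
The answers to multiple regression questions 6 to 12 follow from simple arithmetic on the partial Excel output. The sketch below works through that arithmetic in Python as a self-check. The two critical t values for 12 d.f. (2.179 and 1.782) are taken from a standard t table and are assumptions in the sense that they do not appear in the output; all names are illustrative.

```python
# Coefficients and standard errors from the output, in the order
# intercept, X1, X2, X3.
coef = [20.0, 10.0, 5.0, 3.0]
se = [10.0, 5.0, 1.5, 0.5]

# ANOVA sums of squares and degrees of freedom from the output.
ss_reg, df_reg = 1800.0, 3
ss_res, df_res = 1200.0, 12

# ASSUMPTION: two-tailed critical t values for 12 d.f. from a t table.
t_crit_05 = 2.179   # 5% significance, also used for the 95% interval
t_crit_10 = 1.782   # 10% significance

# Question 8: t ratio = coefficient / standard error.
t_ratios = [b / s for b, s in zip(coef, se)]
sig_05 = [abs(t) > t_crit_05 for t in t_ratios]
sig_10 = [abs(t) > t_crit_10 for t in t_ratios]

# Question 9: F = (SSR / df_reg) / (SSE / df_res).
f_ratio = (ss_reg / df_reg) / (ss_res / df_res)


def predict(x1, x2, x3):
    """Fitted value from the estimated regression equation (question 6)."""
    return coef[0] + coef[1] * x1 + coef[2] * x2 + coef[3] * x3


# Question 10: prediction at x1 = 6, x2 = 0, x3 = 2.
y_hat_10 = predict(6, 0, 2)

# Question 11: residual = actual y minus fitted y for observation #8.
residual_8 = 108 - predict(8, 2, 4)

# Question 12: 95% interval half-width for the coefficient on X1.
half_width_b1 = t_crit_05 * se[1]

print("t ratios:", [round(t, 2) for t in t_ratios])
print("F ratio:", f_ratio)
print("y-hat (q. 10):", y_hat_10)
print("residual (q. 11):", residual_8)
print(f"b1 interval: {coef[1]:.0f} +/- {half_width_b1:.2f}")
```

The first entry of each significance list refers to the intercept, so only the last three entries bear on answer 8.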
