552 CHAPTER THIRTEEN Simple Linear Regression
FIGURE 13.22 Scatter plots for four data sets (Panels A, B, C, and D; Y versus X)
FIGURE 13.23 Residual plots for four data sets (Panels A, B, C, and D; residuals versus X)
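A classic illustration of why scatter plots and residual plots matter is Anscombe's well-known 1973 quartet: four small data sets that produce nearly the same least-squares line and r² but look very different when plotted. The minimal sketch below assumes those published values for illustration only (they are not taken from the data behind Figures 13.22 and 13.23). It fits each set in plain Python, computes the residuals that a residual plot would display, and compares the summary numbers. The helper names `least_squares` and `r_squared` are hypothetical.

```python
# Minimal sketch (assumption): Anscombe's quartet, used only to illustrate why
# plots matter; these values are NOT the data behind Figures 13.22 and 13.23.

def least_squares(x, y):
    """Return (b0, b1) for the least-squares line Y-hat = b0 + b1 * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ssxy / ssx
    return y_bar - b1 * x_bar, b1


def r_squared(y, residuals):
    """Coefficient of determination r^2 = 1 - SSE / SST."""
    y_bar = sum(y) / len(y)
    sse = sum(e ** 2 for e in residuals)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared X for sets I to III
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {}
for label, (x, y) in QUARTET.items():
    b0, b1 = least_squares(x, y)
    # Residual e_i = Y_i - Y-hat_i; these are the values a residual plot shows.
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    r2 = r_squared(y, residuals)
    results[label] = {"b0": b0, "b1": b1, "r2": r2, "residuals": residuals}
    worst = max(residuals, key=abs)  # residual with the largest magnitude
    print(f"Set {label:>3}: Y-hat = {b0:.2f} + {b1:.3f}X, "
          f"r^2 = {r2:.3f}, largest residual = {worst:+.2f}")
```

All four sets print an intercept near 3.00, a slope near 0.500, and an r² near 0.67, so the numerical output alone cannot tell them apart. Only the plots (or the residuals themselves, such as the single residual above +3 in set III) reveal how differently the four data sets behave.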
13.9: Pitfalls in Regression and Ethical Issues 553
In summary, scatter plots and residual plots are vital to a complete regression analysis. The information they provide is so basic to a credible analysis that you should always include these graphs as part of a regression analysis. The following strategy can help you avoid the pitfalls of regression:
1. Start with a scatter plot to observe the possible relationship between X and Y.
2. Check the assumptions of regression before using the results of the model.
3. Plot the residuals versus the independent variable to determine whether the linear model is appropriate and to check the equal-variance assumption.
4. Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to check the normality assumption.
5. If you collected the data over time, plot the residuals versus time and use the Durbin-Watson test to check the independence assumption.
6. If the assumptions are violated, use methods other than least-squares regression, or use alternative least-squares models.
7. If the assumptions are not violated, carry out tests for the significance of the regression coefficients and develop confidence and prediction intervals.
8. Avoid making predictions and forecasts outside the relevant range of the independent variable.
9. Keep in mind that the relationships identified in observational studies may or may not be due to cause-and-effect relationships. Remember that correlation does not imply causation.
From the Author’s Desktop
America’s Top Models
Perhaps you are familiar with the TV competition organized by model Tyra Banks to find “America’s top model.” You may be less familiar with another set of top models that is emerging from the business world. In a Business Week article from its January 23, 2006, edition (S.
Baker, “Why Math Will Rock Your World: More Math Geeks Are Calling the Shots in Business. Is Your Industry Next?” Business Week, pp. 54–62), Stephen Baker describes how “quants” turned finance upside down and are now moving into other business fields. The name quants comes from the “quantitative methods” that these “math geeks” use to develop models and forecasts. These methods are built on the principles of regression analysis covered in this chapter, although the actual models are much more complicated than the simple linear models presented here. Regression-based models have become the top models for many types of business analyses. Some examples include:
■ Advertising and marketing: Managers use econometric models (in other words, regression models) to determine the effect of an advertisement on sales, based on a set of factors. Managers also use data mining to predict what customers will buy in the future, based on historical information about those customers.
■ Finance: Any time you read about a financial “model,” you should understand that some type of regression model is being used. For example, a New York Times article on June 18, 2006, titled “An Old Formula That Points to New Worry” by Mark Hulbert (p. BU8) discusses a market timing model that predicts stock returns over the next three to five years, based on the dividend yield of the stock market and the interest rate on 90-day Treasury bills.
■ Food and beverage: Believe it or not, Enologix, a California consulting company, has developed a “formula” (a regression model) that predicts a wine’s quality index, based on a set of chemical compounds found in the wine (see D. Darlington, “The Chemistry of a 90+ Wine,” The New York Times Magazine, August 7, 2005, pp. 36–39).
■ Publishing: A study of the effect of price changes at Amazon.com and BN.com on sales (again, regression analysis) found that a 1% price increase at BN.com pushed sales down 4%, but pushed sales down only 0.5% at Amazon.com. (You can download the paper at http://gsbadg.uchicago.edu/vitae.htm.)
■ Transportation: Farecast.com uses data mining and predictive technologies to objectively predict airfare pricing (see D. Darlin, “Airfares Made Easy (Or Easier),” The New York Times, July 1, 2006, pp. C1, C6).
■ Real estate: Zillow.com uses information about the features of a home and its location to estimate the home’s market value, using a “formula” built with a proprietary algorithm.
In the article, Baker stated that statistics and probability will become core skills for businesspeople and consumers. Those who are successful will know how to use statistics, whether they are building financial models or making marketing plans. He also strongly endorsed the need for everyone in business to know Microsoft Excel well enough to produce statistical analyses and reports.