REGRESSION ANALYSIS II - icpsr - University of Michigan

REGRESSION ANALYSIS II 

June/July, 2007 

Tim McDaniel 

Buena Vista University 

Storm Lake, Iowa USA 

mcdaniel@bvu.edu 

Introduction 

The course is intended for social scientists who are comfortable with algebra and very basic statistics, and 

now want to learn applied ordinary least squares (OLS) regression analysis for their own research and/or 

to understand the work of others. 

As social scientists, it is important that we know how to use multiple regression. But it is also important 

for us to know why and how multiple regression works (and fails) under varying conditions. Given this, 

we will discuss much of the mathematical and statistical theory behind multiple regression and also some 

potential drawbacks and circumstantial limitations. So while the presentations will not be purely 

theoretical, neither will this course be “cookbook” in nature. 

Please note that the primary goal of the course is to develop an applied and intuitive (as opposed to purely 

theoretical or purely mathematical) understanding of the topics. Whenever possible presentations will be 

made in “Words,” “Pictures,” and “Math” languages in order to appeal to a variety of learning styles. 

Course Topics 

We will begin with a quick review of basic univariate statistics and hypothesis testing. I am assuming 

that most of this material will indeed be a “review” for (i.e., not new to) you. 

Then we will cover topics in bivariate and then multivariate regression, including 

• Model specification and interpretation 

• Diagnostic tests and plots 

• Analysis of residuals 

• Influence and outliers 

• Transformations to induce linearity 

• Multicollinearity 

• Multiplicative interaction terms 

• Dummy (dichotomous) independent variables 

• Categorical (e.g., Likert scale) independent variables 

The last few classes will more quickly consider topics including 

• Applications involving matrix algebra 

• Heteroscedasticity 

• Autocorrelation 

• Generalized Least Squares (GLS) 

• Weighted Least Squares (WLS) 

• Logit models and analysis 

• Probit models

Lecture Transcripts 

This course will utilize approximately 850 pages of Lecture Transcripts which will serve as the 

“textbook” for this course and an information resource for you after the course ends. These Lecture 

Transcripts will also significantly reduce the amount of notes you have to write during class, which means 

you can concentrate much more on learning and understanding the material itself. 

A detailed outline of the Lecture Transcripts is included in this syllabus. This outline also serves as a 

“Table of Contents” and an “Index” for these Lecture Transcripts. 

Packets #1, #2, and #3 constitute “Bundle #1” of the Lecture Transcripts; this Bundle is furnished to all 

participants free of charge on the first day of class and will be used for the first two class days. The 

remainder of the twenty Lecture Transcript packets (Numbers 4 through 20, contained in Bundles #2, #3, 

#4, and #5) are available for purchase. We will begin using Bundle #2 on the third class day. Details 

regarding the purchase of these last four Bundles (Packets 4 through 20) will be discussed the first day of 

class. 

Although these Lecture Notes/Transcripts are detailed, comprehensive, and self-contained, it is still 

important that you study the assigned readings before each class, ask questions during class, and talk to 

either me or one of the Teaching Assistants outside of class if you are to maximize your benefit from this 

course. These notes contain several algebraic derivations and proofs that we will not be taking class time 

to go over or work through, but instead are provided for your information and inspection outside of class. 

Textbooks and Other Readings 

There is no required textbook. Later in this syllabus I provide you with information about some optional 

supplemental readings from various textbooks and journals, including which readings are appropriate for 

each of the twenty packets of Lecture Transcripts. See the beginning of the “Some Suggested Readings” 

section for a discussion of four types of textbooks included in these optional readings. 

I do not think that any one of these textbooks is significantly better than the others; instead, the one you 

might think is best will depend upon a number of personal factors, including your particular learning 

style. Therefore, instead of just picking one textbook, I have designed the course so that you can 

experiment and pick-and-choose which style of learning and (therefore) textbook you prefer. Of course, 

you may decide to read and study (and maybe purchase) none, one, two, three, or even all four of these 

textbooks; again, that is entirely up to you. In the alphabetized list of “Some Suggested Readings” near 

the end of this syllabus, you might want to consider the textbook readings the primary assigned readings 

for each packet. Other textbooks are also appropriate for this course; see me if you have any questions 

about this. 

I have included a bibliography for the textbooks and all of the other readings, including quite a few of the 

little green Sage University Paper Series on Quantitative Applications in the Social Sciences monographs. 

You may notice that some (though certainly not all) of the other readings tend to be from political science 

books and journals; however, this is NOT a cause for concern since they deal with methodological topics 

that are easily and broadly generalizable across other social science disciplines (I would not have selected 

them otherwise...). All materials listed in the bibliography are available at the Summer Program’s 

Library; all textbooks and monographs should be available for purchase. Before you actually buy 

anything for this course, I strongly suggest you first check it out of Summer Program’s Library.

Classes, Assignments, Software, and Matrix Algebra 

I will hold an Optional Session the evening of our first class day (Tuesday); the time and room will be 

announced in that first day’s class. During this session I will continue and finish a review of basic 

statistics that we will have started earlier that afternoon. The notes and Lecture Transcripts for this 

review are in the Packets #1 and #2 (in Bundle #1). Some of you will find a continuation and completion 

of this review useful, while others may decide not to attend this first evening’s session (remember it is 

optional!); in either case, you at least will have the lecture transcripts for this review of basic statistics. 

We will start Packet #3 at the beginning of the second (Wednesday’s) class. 

The first Assignment involves some mathematical and summation notation manipulation; its purpose is to 

re-familiarize you with those basic tools. For the last three Assignments you will use SPSS as a tool to 

generate output; then your substantive task will be to interpret and evaluate that output. 

It is important to note that this is a course on Regression Analysis, NOT on computer or software usage. 

So do not worry if you are unfamiliar with SPSS; it is very easy to learn and to use (which is why we use 

it in this course) and we will examine output, and discussed how it was generated, in class. Similarly, do 

not be concerned if you will use some software other than SPSS in your own work; never forget that our 

goal here to learn “Regression Analysis,” not “software usage.” 

There are computer counselors and Teaching Assistants available to aid you in using the computers and 

SPSS to generate the output you will analyze in your assignments. The data files for the homework, as 

well as for most of the in-class examples, are available on the Summer Program’s computer server. I will 

distribute an Answer Key for each assignment; that provides you with yet another excellent “learning 

opportunity.” IMPORTANT: You will definitely want to attend some of the ICPSR Summer Program 

Lecture Series “Introduction to Computing” if you are not already extremely comfortable with basic 

computer operations, SPSS, or Windows. 

I have designed the course so that matrix algebra will not be used during the first three weeks. I feel it is 

better and more efficient to learn as much of the regression material as possible in a more familiar 

environment (scalar algebra). I will present most of the multiple regression material again on Monday of 

the fourth week, this time in matrix algebra format and under the assumption that you now understand 

and comprehend its substance. Most of the more advanced topics we will cover in the fourth week will 

also be presented using matrices. Understanding matrix algebra is not only necessary for a thorough 

comprehension of some of the topics we will discuss in the later part of this course, but is also vital 

should you decide to take more advanced courses in quantitative methods. IMPORTANT: You will 

definitely want to attend the ICPSR Summer Program Lecture Series “Mathematics for Social Scientists 

II” if you are at all weak in this area. (Matrix algebra will be presented during the first two weeks of this 

lecture series.) 

Teaching Assistants 

Remember, the primary purpose of the class meetings, Lecture Transcripts, and Assignments is to help 

you learn this material. You are not in this alone. Studying and working with your fellow participants is 

probably a good idea for many of you, as is taking advantage of “Office Hours” opportunities. 

The Teaching Assistants and I make every effort to be accessible to you. I encourage you to attend our 

office hours, or to make an appointment if those hours do not work well within your schedule. Early in 

the course I will give you more information regarding these matters.

“Practical Matters” Involving Assignments and Grading 

Quantity. There will be four Assignments, worth 10, 20, 40, and 30 points respectively. 

Purpose. The primary purpose of the Assignments is to further enhance your learning of the material. 

They also serve as the sole evaluation vehicle (i.e., the only thing that is graded) for those of you taking 

the course for a grade. 

When They Are Due. The “Due Date” for each Assignment will be announced when that assignment is 

distributed in class. You will submit each Assignment at the beginning of the class on that due date. 

When You Will Get It Back. The Teaching Assistants for this course will try to get your assignments 

graded in a timely fashion so they can be returned to you as soon as possible. 

Answer Keys. Everyone will receive an Answer Key for each assignment, whether you submitted the 

assignment or not. 

Calculator. You might need a basic calculator for some of the Assignment tasks. 

Details.... Your work will be graded on the quality, clarity, completeness, and accuracy of your 

presentation, so: 

• Make it easy to read, follow, and understand your work (i.e., be organized and print neatly). 

• Show all of your work, not just your bottom-line answer. 

• Print your name, the course name, and the Assignment Number at the top of the first page. 

• Also print your name at the top of the each page. 

• Staple the sheets of paper together, in the upper-left-hand corner. 

Do Not Be Late. Unless we have made specific arrangements beforehand, no Assignment submitted late 

will be accepted. 

Not “For Credit”? No Problem.... If you are not taking the course “for credit.” then you can still submit 

none, some, or all of the Assignments for grading; it is completely up to you. 

Grade Appeals. If you ever have any questions about any of your grades (e.g., partial credit decisions) 

then you can see me (not a Teaching Assistant!) to discuss the situation and possibly appeal that grading 

decision. However, here are a couple of relevant course policies: 

• You must wait at least 24 hours after receiving your grade. This “24 Hour Rule” gives you 

time to study and contemplate your work-product, your notes, and the Answer Key and to think 

about the grading decision. 

• The maximum amount of time you have to appeal a grading decision to me is three class days 

after the assignment is returned; after then no grading appeals will be considered. 

Conclusion 

Learning the material in this course will require a substantial amount of effort on your part... but that is 

why you are here, and the payoff will be worth that effort. Let me know if I can be of any additional 

assistance to you in this endeavor. 

It is an honor to be your instructor for this course.

Outline of the Course 

(Organized by Packets of Lecture Notes/Transcripts) 

Packet I. Basic Statistics Review 

#1 A. Summations and Sigma Notation 

B. Basic Statistics 

1. Mean 

2. Variance and Standard Deviation 

3. Probability 

4. Random Variables 

a. Continuous versus Discrete 

b. Nominal 

c. Ordinal 

d. Interval 

5. Standardized Variables 

6. Expected Value 

7. Covariance 

8. Correlation (and Causality) 

9. Independence 

10. Normal Distribution 

11. Central Limit Theorem 

12. Student’s T Distribution 

13. Hypothesis Testing 

14. Confidence Intervals 

15. Prob-Values (a.k.a. P-Values) 

16. Properties of Estimators 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 

Packet II. Supplement to Basic Statistics Review 

#2 A. A Closer Look at Population and Sample Variances 

B. Hypothesis Testing: Summary, Flowchart, Protocol, and p-Values 

C. Some Abuses and Misuses of Probability and Statistics 

D. Symbol Glossary 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 

Packet III. Bivariate Regression 

#3 A. Notation 

B. Fitting a Line 

C. Ordinary Least Squares Assumptions 

D. Estimating the Intercept and the Slope 

E. Centered Variables 

F. The Estimated Slope Coefficient, “b” 

Assignment 

1. Variance of “b” 

#1 2. Confidence Interval for “b” 

Given 

3. Hypothesis Testing 

G. The Gauss-Markov Theorem 

H. Appendix: Deriving the Formulas for “a” and “b” Using Calculus 

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Packet I. Residuals 

#4 1. Definition and Estimation 

2. True, Population “Error” as a Disturbance, or Stochastic Element 

J. Explained, Unexplained, and Total Deviations and Sums of Squares 

K. Goodness of Fit 

1. Coefficient of Determination (R-Squared) 

a. Correlations 

b. Why the R-Squared Can Be Inappropriate and Misleading 

c. Perils of Maximizing R-Squared: A Monte Carlo Simulation 

2. Standard Error of Regression (SER) 

L. Standardized Variables and Standardized Variable Coefficients 

M. Reporting OLS Regression Results 

N. Regression Forced Through the Origin 

1. Definition 

2. Illustrations and Examples 

3. The Importance of Theoretical and Substantive Justifications 

O. Comparison of Centering, Standardizing, and Forcing Through the Origin 

P. Another Note on the Meaning and Interpretation of “a,” “b,” and Y-Hat 

Q. An Analogy Involving Means, Slopes, Standardization, Samples and Populations 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 

Packet R. Functional Transformations of Independent Variables 

#5 1. The Need for Transformations 

2. The Regression is Still Linear 

3. The Natural Log and the Square Root Transformations 

4. The Square Transformation 

5. Other Transformations and Functional Forms 

6. Presenting Findings With Transformed Independent Variables 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 

Packet S. Interpolation 

#6 T. Predictive Intervals 

U. Extrapolation 

V. Some Simple Diagnostic Plots 

Assignment 

1. Y vs. X (and an Introduction to Outliers) 

#2 2. Y vs. Y-Hat 

Given 

3. Residual vs. X 

4. Residual vs. Case Number 

5. Residual vs. Lagged Residual 

6. Residual vs. Y-Hat 

W. Simpson’s Paradox (Aggregation Bias) 

X. Generation and Interpretation of Computer Output Using Real Data 

1. Setting Up the Example 

2. Producing and Analyzing the Output 

Y. The Usefulness of Simple Scatter Plots: An Illustration 

Z. The Relationship Between R-Squared, “b,” and SER: 

A Replication of, and Additional Analysis Using, a Monte Carlo Simulation 

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Packet IV. Introduction to Multiple Regression 

#7 A. Limits of Bivariate Regression 

B. Trivariate Regression 

1. Visualization 

2. The Residual Term 

a. Definition 

b. The SER and Degrees of Freedom 

3. The (Two) Estimated Slope Coefficients 

a. Meaning 

b. Partial Effects Equations 

c. Computing the Two Slope Coefficients 

d. Variance and Confidence Intervals 

4. The Impact on OLS Assumptions 

5. Holding One Variable “Constant”: What’s That All About? 

6. Representing Partial Effects Using Algebra and Models 

7. Representing Partial Effects Using Venn Diagrams 

C. Multivariate Regression: The General OLS Model 

1. The Slope Coefficient 

a. Meaning 

b. Partial Effects Equations 

c. Computing the Slope Coefficients 

d. Variance and Confidence Intervals 

e. OLS Assumptions and the Gauss-Markov Theorem 

2. The Residual Term 

a. Definition 

b. The SER and Degrees of Freedom: The General Case 

3. Comparisons to Trivariate Regression 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 

Packet 

4. Summary and Review of Partial Effects and the Interpretation of “b” 

#8 5. Summary and Review of T-Stats, Prob-Values, and Hypothesis Tests 

6. Units of the SER and Comparing SER’s Across Equations 

7. R-Squared and Adjusted R-Squared 

a. Review of the R-Squared Statistic 

b. Inappropriateness of Comparing R-Squareds Across Models 

c. R-Squared and Functional Transformations 

d. Multiple Independent Variables and the Effect on R-Squared 

e. Computing the Adjusted R-Squared Value 

f. Interpreting Adjusted R-Squared 

8. Interactions 

a. Description 

b. Model Specification (Including “Can I Exclude Some Terms?”) 

c. Analogy to Functional Transformations 

d. Explaining Models with Interaction Terms: Examples 

e. Interpreting Models with Interactions 

i. Algebraic Techniques 

ii. Graphical Techniques 

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Packet 

9. Multicollinearity and Multicollinearity Diagnostics 

#9 a. Perfect Multicollinearity: An Example 

b. Discussing Multicollinearity Using Venn Diagrams 

c. Consequences, Including Possible “Backdoor Bias” 

d. When to Suspect Multicollinearity Problems 

e. Diagnosing Multicollinearity (And How Not To Do It) 

f. Possible Remedies 

g. Narrowing Down the Source of the Multicollinearity 

h. A Statistics, Estimation, and Information (Not a Theory) Problem 

10. Dummy and Categorical Independent Variables 

a. Definition and Interpretation 

b. The Importance of “Intervalness” for Independent Variables 

c. Replacing a Categorical Independent Variable with Dummies 

i. Excluding one Dummy from the Model 

ii. Implementation and Interpretation 

d. Graphing Models with Dummy Variables (and Interactions) 

e. Brief Diversion: Comparing “Regression Forced Through Origin” to 

“Having X in an Interaction but Not as a Stand-Alone Term” 

f. Algebraic Interpretation of Category Dummies and “Jumps” 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 

Packet 

11. Functional Transformations 

#10 a. Things to Consider Regarding Transforming Y 

b. Log Transformations and “Constant Elasticity” Models 

c. Conditional Impacts and Slopes in Models with Interaction Terms 

d. Models with an “X Squared” Term (Including Threshold Models) 

e. The Bend Rule 

12. Model Specification 

a. Review of Types of Specification Error 

b. Omitting Relevant Variables: Derivation and Consequences 

c. Including Irrelevant Variables: Consequences 

d. Variable Selection 

e. Perils of Stepwise Regression 

f. An Alternative to Standardization for Interval-Level Xs 

13. Missing Data 

a. Review of the Cov(X, e) = E[Xe] = 0 Assumption 

b. Data Missing at Random 

i. Dependent Variable 

ii. Independent Variables 

c. “Solutions” and Their Potential Problems 

i. Casewise (Listwise) Deletion 

ii. Pairwise Deletion 

iii. Mean and Conditional Mean Substitution 

14. Measurement Errors 

a. In the Dependent Variable (Y) 

b. In the Independent Variable(s) (X) 

15. Partial Effects and Linearity in Multivariate Regression Models 

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Packet 

16. Summary: The Effects of Multicollinearity and Specification Errors on 

#11 Slope Coefficient Estimation and Hypothesis Testing 

a. Type I and Type II Errors 

b. Flowcharts: Review of the Logic Behind Hypothesis Tests 

Involving Specification Errors and Multicollinearity 

c. Essay: Hypothesis Testing and American Criminal Trials 

Assignment 

17. Use of Partial Effects Plots to Detect Non-Linearity 

#3 18. Another Look at the “Residual vs. Case Number” Diagnostic Plot 

Given 

19. Another Look at Outliers 

a. What They Are 

b. Why They Matter 

c. A Flowchart Guide to Dealing With Outliers 

20. Multivariate Regression: A Computer Example Using ANES Data 

a. Setting-Up the Example (American National Election Study) 

b. Generation and Interpretation of Computer Output 

i. Effects of Adding Independent Variables to the Model 

ii. Categorical Independent Variables and “Intervalness” 

c. Analysis and Wrap-Up of Computer Example 

21. “Split-Samples” Versus “Big Models with Dummy Interaction Terms” 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 

Packet V. Regression Diagnostics and Graphical Techniques: A Closer Look 

#12 A. Outliers: Detection, Classification, and Possible Effects 

1. Influence 

a. Definition and Description 

b. Detection Using Statistics 

i. DFBETA and DFBETAS 

[1] Definition 

[2] Interpretation and Use 

ii. Cook’s Distance (D) 



iii. DFFITS 



2. Leverage 


b. Detection Using Statistics 

i. Hat Values 



ii. Studentized Residuals 

[1] Definition and Description 

[2] The Need for Standardized Residuals 


[4] The Bonferroni Adjustment 

[5] An Easy Method of Calculation Using Regression

B. Diagnostic Plots and Graphical Techniques 

1. Proportional Leverage Plots 


b. Different Types 

2. Jointly Influential Data 


b. Examples 

3. Conditional Effects Plots 


b. Presentation and Interpretation 

c. Models with Interaction Terms (Including Confidence Bounds) 

d. The Importance of “Conditional” 

4. Plots of Categorical Independent Variables: Jittered Data Plots 

5. Scatterplot Matrices 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 

Packet VI. Analysis of Variance and the F-Test 

#13 A. The F Distribution 

B. The ANOVA Table 

C. Similarities to Regression 

D. The F-Statistic and the F-Test 

1. A Test Involving All of the Regression Coefficients 

2. The Special Case When the F-Test and T-Test Are Identical 

E. Analysis and Demonstrations Using Computer Output 

1. Interpreting the F-Test: A Randomly Generated Variables Illustration 

2. Another Look at R-Squared, Adjusted R-Squared, and SER 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 

Packet VII. Categorical Independent Variables (and F-Tests): A Closer Look 

#14 A. Problems using Categorical and Non-Interval Independent Variables 

B. Review: Replacing a Categorical X With Dummy Variables in OLS 

C. The F-Test Revisited 

1. A Review of the Whole-Model F-Test in a Regression Environment 

Assignment 

2. Nested F-Tests: Testing Groups (Subsets) of Regression Coefficients 

#4 a. Definition of Hypothesis 

Given 

b. Procedure 

c. Similarity and Generalization to the Whole-Model F-Test 

3. The Chow Test 

a. Definition and Purpose 

b. Structural Shifts 

c. Procedure 

d. And Aggregation Bias 

4. Overview of Comparing “Safe” versus “Risky” Models 

5. Comparison of SER’s With No Prior Knowledge of Relative Model 

Performance (I.e., No “Safe” or “Risky” Models) 

6. Relative Statistical Contribution of Groups of Variables in a Model 

D. Collapsing Categories of a Categorical Variable 

1. Set-Up and Notation 

2. Statistical Hypothesis Test for Collapsing Categories

E. Treating a Categorical Variable as Interval-Level 

1. “Contextually” Interval 

2. Statistical Hypothesis Test for Intervalness 

F. Monte Carlo Simulation Results 

1. Consequences of Categorical (Non-Interval) Independent Variables 

2. Which Dummy Category Variable to Exclude 

G. Generalizations of the F-Test 

H. Optional: Discussion and Analysis of a Substantive Example 

1. Description of the Study 

2. Statistical Tests Performed 

3. Potential Effects of Multicollinearity 

4. Different Specifications, Interpretations, and Conclusions 

5. Interpretation of Manipulated Coefficients 

a. Changing the Sign of a Coefficient Value 

b. Subtracting Coefficients 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 

Packet VIII. Matrices and Matrix Algebra 

#15 A. General Terms and Definitions 

B. Addition, Subtraction, and Multiplication: How to Do It 

C. An Example: Matrix Multiplication and Lawyers 

D. Inverse Matrices 

E. Ranks and Singularity 

F. Properties of Matrix Algebra 

G. Expressing Linear Equations in Matrix Form 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 

Packet IX. Multiple Regression Using Matrices 

#16 A. The General Linear Model 

B. OLS Using Matrices 

C. The Matrix of Residuals 

D. OLS Assumptions in Matrix Form 

E. Advantages of Using the Matrix Notation 

F. Variance-Covariance Matrices 

G. Deriving “b” and Var(b) 

H. R-Squared, Adjusted R-Squared, and SER in Matrix Form 

I. Ordinal and Dummy Variables: An Example Using Matrices 

J. Multicollinearity and Singular X’X Matrix 

K. Transformations and Interactions in Matrix Form: OLS is Still Linear… 

L. Appendix: A Proof that the Matrix Derivation of “B” Minimizes the SSE 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 

Packet X. Heteroscedasticity and Generalized Least Squares (GLS) 

#17 A. Definition 

B. Omega Matrices 

C. Variance-Covariance Matrices 

D. Types of Heteroscedasticity 

E. When to Expect Heteroscedasticity

F. Consequences of Using OLS With Heteroscedastic Errors 

G. Detection of Heteroscedasticity 

1. Diagnostic Plots 

a. What to Look For 

b. Transforming the Parameters 

2. Goldfeld-Quandt Test 

3. Glejser Test 

4. Likelihood Ratio Test 

H. Correcting for Heteroscedasticity Using Generalized Least Squares (GLS) 

1. Introduction to GLS 

2. Deriving “b” using GLS 

3. Advantages of Using GLS With Heteroscedastic Errors 

4. Conditions Under Which GLS is Equivalent to OLS 

5. GLS as Weighted Least Squares (WLS) 

6. The Underlying Logic of WLS: An Example Using State-Level Data 

7. Using GLS Given a Model for Var(e): The General Case 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 

Packet XI. Autocorrelation 

#18 A. Definition of First-Order Autocorrelation 

B. When to Expect Autocorrelation 

C. Algebraic Interpretation of First-Order Autocorrelation 

D. Examples of Positive and Negative First-Order Autocorrelation 

E. Four Common Types of Time Series Models 

F. Consequences of Using OLS With Autocorrelation 

G. Detection of Autocorrelation 

1. Diagnostic Plots 

2. The Durbin-Watson Statistic (First-Order Autocorrelation) 

a. Procedure, Derivation, and Explanation 

b. Things to Keep in Mind 

3. The Wallis Test (Fourth-Order Autocorrelation) 

H. Correcting for First-Order Autocorrelation 

1. The Prais-Winsten GLS Estimator 

a. Procedure 

b. Proof 

2. Other GLS Methods 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 

Packet XII. Dichotomous Dependent Variables 

#19 A. Examples 

B. The Binary Choice Model 

1. Description and Illustration 

2. Problems Using OLS With a Dichotomous Dependent Variable 

a. Over- or Under-Estimating Y 

b. Var(e), Heteroscedasticity, and the Goldberger Procedure 

c. Probable Non-Linearity 

3. Using an S-Shaped Curve Instead of a Line 

a. Description and Theory 

b. Fit and Bias

C. The Logit Model 

1. The Logistic Function and Log-Odds 

2. Interpreting Logit Coefficients 

3. Presenting the Results of Logit Analysis: An Example 

4. Estimating Coefficients: Maximum Likelihood Estimation 

5. Significance of Coefficients: The Likelihood Ratio Test 

6. Analogies to OLS 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 

Packet 

D. The Probit Model 

#20 1. N(0,1), the Normal (Gaussian) CDF, and the Probit Function 

2. Interpreting and Presenting Probit Coefficients 

3. Maximum Likelihood Estimation of Probit Models 

E. Deciding Between Using Logit and Probit 

XIII. Chart: Some Potential Problems When Using OLS Regression Models 

XIV. Final Remarks 

A. Multiple Equations: Simultaneous Equations Models 

B. Additional Topics for Further Study 

C. The Importance of Parsimony and Presentation 

D. The Dangers of Over-Reliance on Statistical Procedures 

=============================================================================================== 

Some Suggested Readings 

As mentioned earlier, the “Lecture Transcripts” that I wrote and we use in each class serve as the 

functional equivalence of a textbook for this course, and the preceding outline serves as its “Table of 

Contents” and “Index.” The additional readings in this section of the syllabus (details of which can be 

found in the “Bibliography” section that immediately follows) are 100% optional for you. 

Several of these readings are from the following four “traditional” textbooks, each of which I like a lot. 

While there are great similarities between them, I think that each takes a somewhat different (though at 

times only subtly different...) pedagogical approach. 

• Gujarati takes more of a “Math Language” (though certainly not purely so...) approach. 

• Hamilton takes more of a “Picture Language” (i.e., lots of graphics, etc.) approach. 

• Kennedy takes more of an “English Language” (i.e., explanations using verbiage) approach. 

• Wooldridge takes more of a “Combined” (and somewhat more introductory) approach. 

You will also notice several “little green books” from the Sage Series on Quantitative Applications in the 

Social Sciences. 

Finally, I have included several articles from a number of journals across several academic disciplines — 

some that are older “classics” and some others that are newer and more recently published.

Lecture Packet 

Packets #1 and #2 

Some Suggested OPTIONAL Readings (In Alphabetical Order) 

Gujarati: Appendix A 

Hamilton: 289-296 

Mohr 

Wooldridge: Appendix A.1-A.4, B, C 

Packet #3 Gujarati: Introduction; Ch. 1-2; Sec. 5.1-5.3, 5.5-5.8 

Hamilton: 29-34, 42-49, 296-297 

Kennedy: Sec. 1.1; Appendix A 

Lewis-Beck: 9-20, 26-38 

Schroeder, Sjoquist, and Stephan: 11-23, 81-82 

Wooldridge: Ch. 1-2 

Packet #4 Achen (1982): 73-77 

Achen (1991) 

Gujarati: Ch. 3-4; Sec. 5.9, 6.1-6.3 

Hamilton: 37-41, 49-51, 124-125 

Kennedy: Sec. 1.2 

King (1986) 

King (1990) 

Lewis-Beck: 20-25 

Lewis-Beck and Skalaban 


Packet #5 Hamilton: 53-58, 148 

Packet #6 Goertzel (2004) 

Gujarati: Sec. 5.10-5.13 

Hamilton: 34-37, 41-42, 51-53, 58-59 

Kennedy: Sec. 2.10 


Packet #7 Achen (1982): 7-51 

Asher: 237-248 

Berry: 1-24, 81-83 

Berry and Feldman: 9-15 

Fox: 3-9 

Gujarati: Sec. 7.1-7.4, 8.1-8.4 

Hamilton: 65-72, 109-113 

Kennedy: Sec. 1.3-1.4, 2.1-2.3, 4.1-4.2; Ch. 3 

Lewis-Beck: 47-52, 53-54 

Schroeder, Sjoquist, and Stephan: 29-31 

Wooldridge: Ch. 3, 5; Sec. 4.1-4.3



Packet #8 Berry: 24-27 


Brambor, Clark, and Golder 

Braumoeller 

Davenport, Armstrong, and Lichbach 

Friedrich 

Gujarati: Sec. 7.5, 7.6, 7.8 

Hamilton: 77-80, 84-85 

Kennedy: Sec. 2.4-2.8, 2.11 

Lewis-Beck: 52-53, 54-56 

Schroeder, Sjoquist, and Stephan: 32-51, 54-56, 58-59 

Wooldridge: Sec. 6.3 

Packet #9 Asher: 248-250 

Berry: 45-47 

Berry and Feldman: 37-50, 64-70 

Fox: 10-21 

Gujarati: Sec. 9.1-9.4, 9.6, 9.12; Ch. 10 

Hamilton: 82, 85-92, 133-136 

Hardy: 1-21, 29-48 

Kennedy: Ch. 11, 14 

Lewis-Beck: 58-63, 66-71 


Wooldridge: Sec. 7.1-7.3; 244-249 

Packet #10 Achen (1982): 51-73; 77-79 

Asher: 250-255 

Berry: 27-41, 60-66 

Berry and Feldman: 18-37, 71-72 

Fox: 53-61 

Gujarati: Sec. 7.7, 7.10, 7.12, 13.1-13.5, 13.12, 13.13; 536-537 

Halvorsen and Palmquist 

Hamilton: 72-77, 82-84, 148-153, 163-167, 173-174 

Kennedy: Sec. 9.3; Ch. 5, 6, 7, 12 

Lewis-Beck: 56-58, 63-66 

Schroeder, Sjoquist, and Stephan: 53, 59-61, 65-71 

Wooldridge: Sec. 6.1, 6.2, 9.3; 304-307, 325-328 

Packet #11 Achen (2002) 

Achen (2005) 

Berry: 41-45, 49-60 

Eisenberg and Wells 




Wooldridge: Sec. 4.6, 6.4


Packet #12 


Bollen and Jackman 

Chatterjee and Wiseman 

Fox: 21-40 

Gujarati: 540-542 

Hamilton: Ch. 1; 113-117, 125-133, 141 (note #4), 154-158 

Kennedy: Sec. 20.1-20.2 

Wooldridge: 328-333 

Packet #13 Berry and Feldman: 17-18 

Gujarati: Sec. 8.5 



Mock and Weisberg 


Wooldridge: 160-161 

Packet #14 Fox: 61-65 

Gujarati: Sec. 8.6-8.8, 8.12, 9.5, 9.8, 13.7 

Hamilton: 80-82, 101 

Hardy: 21-29, 48-53, 78-82, 84-85 

McDaniel 

Wooldridge: Ch. 19; Sec. 4.4; 150-160, 161-162, 249-252 

Packet #15 

Packet #16 

Gujarati: Appendix B 

Hamilton: 333-337, 344-345 

Namboodiri: 7-55 

Wooldridge: Appendix D 

Achen (1982): Appendix 

Fox: 80-82, 83-85 

Gujarati: Sec. 8.9; Appendix C.1-C.10, C.12 


Wooldridge: Appendix E 

Packet #17 Berry: 67, 72-78 


Fox: 49-53, 87-89 

Gujarati: Ch. 11; Appendix C.11 


Hardy: 53-61 



Wooldridge: Sec. 8.1-8.4



Packet #18 Berry: 67-72, 78-81 

Gujarati: Ch. 12 


Hardy: 82-84 

Kennedy: Sec. 8.4, 9.4 


Wooldridge: Ch. 10, 12; Sec. 11.5 

Packet #19 

Ai and Norton 

Berry: 47-49 

Gujarati: Sec. 15.1-15.8 

Hamilton: Ch. 7 

Hardy: 75-78 

Kennedy: Sec. 2.9, 15.1 

Markus: 555-556 

Pampel: Ch. 1-3; Appendix 


Wooldridge: Sec. 7.5, 8.5, 17.1 

Packet #20 Asher: 255-256, 260-261 


Fox: 75-80 

Goertzel (2002) 

Granato 

Gujarati: Sec. 15.9, 15.10, 15.13, 15.14; Ch. 18 

Hamilton: 246 (note #2), 249-252, 281-282 

Kennedy: Ch. 10, 21 


Pampel: Ch. 4-5 

Schroeder, Sjoquist, and Stephan: 77-79, 80 

Wooldridge: Sec. 7.6, 9.2, 15.1-15.5, 16.1-16.4 

Bibliography 

Achen, C. Interpreting and Using Regression. Sage University Paper Series on Quantitative 

Applications in the Social Sciences, #029, 1982. 

Achen, C. “What Does ‘Explained Variance’ Explain?: Reply.” In Political Analysis, Vol. 2 (1990). The 

University of Michigan Press, 1991. Pages 173-184. 

Achen, C. “Toward a New Political Methodology: Microfoundations and ART.” In Annual Review of 

Political Science, Vol. 5 (2002). Pages 423-450. 

Achen, C. “Let’s Put Garbage-Can Regressions and Garbage-Can Probits Where They Belong.” Conflict 

Management and Peace Science 22:4 (January 2005), pages 327-339.

Ai, C. and Norton, E. C. “Interaction Terms in Logit and Probit Models.” Economics Letters 80 (2003), 

pages 123-129. 

Asher, H. B. “Regression Analysis.” In Asher, H. B., Weisberg, H. F., Kessel, J. H., and Shively, W. P., 

Theory-Building and Data Analysis in the Social Sciences. University of Tennessee Press, 1984. 

Pages 237-261. 

Berry, W. Understanding Regression Assumptions. Sage University Paper Series on Quantitative 


Berry, W. and Feldman, S. Multiple Regression in Practice. Sage University Paper Series on 

Quantitative Applications in the Social Sciences, #050, 1985. 

Bollen, K. and Jackman, R. “Regression Diagnostics: An Expository Treatment of Outliers and 

Influential Cases.” In Fox, J. and Long, J. S., eds., Modern Methods of Data Analysis. Sage 

Publications, 1990, pages 257-291. 

Brambor, T., Clark, W. R., and Golder, M. “Understanding Interaction Models: Improving Empirical 

Analysis.” Political Analysis 14(1) (2006), pages 63-82. 

Braumoeller, B. F. “Hypothesis Testing and Multiplicative Interaction Terms.” International 

Organization 58:4 (Fall 2004), pages 807-820. 

Chatterjee, S. and Wiseman, F. “Use of Regression Diagnostics in Political Science Research.” 

American Journal of Political Science 27 (August 1983), pages 601-613. 

Davenport, C., Armstrong, D., and Lichbach, M. “Conflict Escalation and the Origins of Civil War.” 

Paper presented at the annual meeting of the Midwest Political Science Association, Chicago, 

Illinois, USA. April 7, 2005. 

Eisenberg, T. and Wells, M. “The Significant Association Between Punitive and Compensatory Damages 

in Blockbuster Cases: A Methodological Primer.” Journal of Empirical Legal Studies 3:1 (March 

2006), pages 175-195. 

Fox, J. Regression Diagnostics. Sage University Paper Series on Quantitative Applications in the Social 

Sciences, #079, 1991. 

Friedrich, R. F. “In Defense of Multiplicative Terms in Multiple Regression Equations.” American 

Journal of Political Science 26 (November 1982), pages 797-837. 

Goertzel, T. “Myths of Murder and Multiple Regression.” Skeptical Inquirer 26:1 (January/February 

2002), pages 19-23. 

Goertzel, T. “Capital Punishment and Homicide: Sociological Realities and Econometric Illusions.” 

Skeptical Inquirer 28:4 (July/August 2004), pages 23-27. 

Granato, J. “An Agenda for Econometric Model Building.” In Political Analysis, Vol. 3 (1991). The 

University of Michigan Press, 1992. Pages 123-154. 

Gujarati, D. Basic Econometrics, Fourth Edition. McGraw-Hill, 2003.

Halvorsen, R., and Palmquist, R. “The Interpretation of Dummy Variables in Semilogarithmic 

Equations.” The American Economic Review 70:3 (June, 1980), pages 474-475. 

Hamilton, L. Regression with Graphics: A Second Course in Applied Statistics. Duxbury, 1992. 

(Note: This is the same as the edition published by Brooks Cole in 1999.) 

Hardy, M. A. Regression With Dummy Variables. Sage University Paper Series on Quantitative 


Kennedy, P. A Guide to Econometrics, Fifth Edition. MIT Press, 2003. 

King, G. “How Not to Lie With Statistics: Avoiding Common Mistakes in Quantitative Political 

Science.” American Journal of Political Science 30 (August 1986), pages 666-687. 

King, G. “Stochastic Variation: A Comment on Lewis-Beck and Skalaban’s ‘The R-Squared’.” In 

Political Analysis, Vol. 2 (1990). The Univ. of Michigan Press, 1991. Pages 185-200. 

Lewis-Beck, M. Applied Regression: An Introduction. Sage University Paper Series on Quantitative 


Lewis-Beck, M. and Skalaban, A. “The R-Squared: Some Straight Talk.” In Political Analysis, Vol. 2 

(1990). Ann Arbor: The University of Michigan Press, 1991. Pages 153-171. 

Markus, G. B. “Political Attitudes during an Election Year: A Report on the 1980 NES Panel Study.” 

American Political Science Review 76 (September 1982), pages 538-560. 

McDaniel, T. “Categorical Independent Variables in Ordinary Least Squares Regression.” 1996. 

(Note: A copy of this paper is included in the class notes.) 

Mock, C. and Weisberg, H. “Political Innumeracy: Encounters With Coincidence, Improbability, and 

Chance.” American Journal of Political Science 36 (Nov. 1992), pages 1023-1046. 

Mohr, L. Understanding Significance Testing. Sage University Paper Series on Quantitative 


Namboodiri, K. Matrix Algebra: An Introduction. Sage University Paper Series on Quantitative 


Pampel, F. C. Logistic Regression: A Primer. Sage University Paper Series on Quantitative Applications 

in the Social Sciences, #132, 2000. 

Schroeder, L. D., Sjoquist, D. L., and Stephan, P. E. Understanding Regression Analysis. Sage 

University Paper Series on Quantitative Applications in the Social Sciences, #057, 1986. 

Wooldridge, J. M. Introductory Econometrics: A Modern Approach, Third Edition. Thompson/South- 

Western, 2006.

REGRESSION ANALYSIS II - icpsr - University of Michigan

Create successful ePaper yourself

Delete template?

Save as template?