
Web Extension 10

Chapter 20
Generalized Method of Moments Estimators and Identification

“A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines.”¹

—RALPH WALDO EMERSON (1803–1882)

Despite Emerson’s contempt for foolish consistency, consistency is not a foolish property for an estimator. Indeed, its absence would seem troubling. Who would want to admit that his or her estimator wouldn’t get the right answer with “all the data in the world”? Consequently, a widely applicable method for devising consistent estimators is a useful tool. This section develops two such tools, the method of moments and the generalized method of moments; these are the most recently popularized estimation techniques presented in this book. In addition to its usefulness in applications, method of moments estimation provides a natural bridge to a fundamental concept in econometrics, identification, which we’ll study in the next section.

20.1 Method of Moments Estimators

The ungeneralized method of moments harkens back to Chapter 2, where we sought estimators for a straight line through the origin. How were we to construct estimators from a sample of data? The method of moments offers a straightforward strategy for devising estimators that prove consistent under very general assumptions. Just as our intuition led to several estimators in Chapter 2, the method of moments may also lead to multiple estimators. In such cases, we turn to the generalized method of moments to settle on a single consistent estimator.


Moments

Statisticians call the expected values of variables or of products of variables moments. $E(X_i)$, $E(e_i)$, $E(e_i^2)$, $E(X_i e_i)$, and $E(X_i e_i^2)$ are all examples of moments. We commonly make assumptions about moments in our data-generating processes. For example, we might assume

$$Y_i = \beta_0 + \beta_1 X_i + e_i,$$

$$E(e_i) = 0,$$

and

$$E(X_i e_i) = 0,$$

as is true under the Gauss–Markov Assumptions for a straight line with unknown slope and intercept. (Notice that in this example, the explanators need not be fixed across samples. The assumption that $E(X_i e_i) = 0$ is weaker than the assumption that the $X_i$ are fixed across samples.)
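
To see the sample analogs of these population moments, here is a minimal simulation sketch in Python (not part of the original text; the parameter values, distributions, and sample size are illustrative assumptions, not data from this book):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative DGP: Y_i = beta0 + beta1*X_i + e_i with E(e_i) = 0 and E(X_i e_i) = 0.
n = 10_000
beta0, beta1 = 2.0, 0.5           # assumed "true" parameters, for illustration only
X = rng.normal(10, 3, size=n)     # random explanators, drawn independently of e
e = rng.normal(0, 1, size=n)      # disturbances with population mean zero
Y = beta0 + beta1 * X + e

# Sample analogs of the population moments E(e_i) and E(X_i e_i):
print("mean of e_i:    ", e.mean())        # near 0, up to sampling noise
print("mean of X_i*e_i:", (X * e).mean())  # near 0, up to sampling noise
```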

The Ungeneralized Method of Moments

Method of moments estimation devises an estimator by insisting that a moment expectation that is true in the population holds true exactly for the corresponding moment within a given sample. That is, the method of moments insists that something true on average of the disturbances in the population be true on average about the residuals in any one sample. In our example, the mean residual in the sample is forced to be zero, or the covariance of the X’s and the residuals in the sample is forced to be zero, analogously to the expected value of the disturbances in the population or the covariance of X and the disturbances in the population both being zero. The method of moments estimators for $\beta_0$ and $\beta_1$, $\tilde{\beta}_0$ and $\tilde{\beta}_1$, in our example are, therefore, found by solving

$$\frac{1}{n}\sum \tilde{e}_i = \frac{1}{n}\sum (Y_i - \tilde{\beta}_0 - \tilde{\beta}_1 X_i) = 0 \tag{20.1}$$

and

$$\frac{1}{n}\sum X_i \tilde{e}_i = \frac{1}{n}\sum X_i (Y_i - \tilde{\beta}_0 - \tilde{\beta}_1 X_i) = 0, \tag{20.2}$$

the within-sample versions of the moment conditions assumed in the DGP. The $\tilde{e}_i$ are the residuals obtained using the estimators $\tilde{\beta}_0$ and $\tilde{\beta}_1$. In this example, these relationships happen to be the same relationships required to minimize the sum of squared residuals, as we learned in Chapter 5. Solving these equations for $\tilde{\beta}_0$ and $\tilde{\beta}_1$, therefore, leads to $\hat{\beta}_0$ and $\hat{\beta}_1$. The method of moments estimators in this example coincide with the ordinary least squares (OLS) estimators, $\hat{\beta}_0$ and $\hat{\beta}_1$.

The method of moments estimators are consistent. An intuition for their consistency is that the Law of Large Numbers says that under suitable conditions Equations 20.1 and 20.2 are almost surely very close to correct if $\tilde{\beta}_0$ and $\tilde{\beta}_1$ are replaced by $\beta_0$ and $\beta_1$. Thus, if the DGP satisfies the conditions for the Law of Large Numbers, the method of moments estimators tend to coincide with the true parameter values as the sample size grows without bound. Appendix 20.A shows the consistency of the method of moments estimators in this model more formally. In general, method of moments estimators are consistent whenever the Law of Large Numbers ensures that the sample moments converge in probability to the corresponding population moments of the DGP.
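
The following sketch (an illustration added here, using simulated data with assumed parameter values) solves the sample moment conditions in Equations 20.1 and 20.2 as two linear equations in $\tilde{\beta}_0$ and $\tilde{\beta}_1$ and confirms that the solution matches the OLS estimates:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data from an assumed DGP (illustrative values).
n = 500
beta0, beta1 = 1.0, 2.0
X = rng.uniform(0, 10, size=n)
e = rng.normal(0, 1, size=n)
Y = beta0 + beta1 * X + e

# Equations 20.1 and 20.2 in matrix form:
#   (1/n) sum (Y_i - b0 - b1 X_i)     = 0
#   (1/n) sum X_i (Y_i - b0 - b1 X_i) = 0
# i.e.  [[1, mean(X)], [mean(X), mean(X^2)]] @ [b0, b1] = [mean(Y), mean(X*Y)]
A = np.array([[1.0, X.mean()], [X.mean(), (X**2).mean()]])
b = np.array([Y.mean(), (X * Y).mean()])
b0_mm, b1_mm = np.linalg.solve(A, b)

# The OLS estimates solve the same equations (the normal equations).
b1_ols, b0_ols = np.polyfit(X, Y, deg=1)
print(b0_mm, b1_mm)     # method of moments
print(b0_ols, b1_ols)   # OLS: identical up to rounding
```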

The Generalized Method of Moments

Our method of moments estimation of a line with unknown intercept and slope exploited two moment conditions embedded in the Gauss–Markov Assumptions. But what would we do to estimate the slope of a line through the origin? The two moment restrictions are still $E(e_i) = 0$ and $E(X_i e_i) = 0$, but because there is only one parameter to estimate, the slope, $\beta$, the method of moments provides a surfeit of riches. It tells us to devise an estimate of $\beta$ from

$$\frac{1}{n}\sum \tilde{e}_i = \frac{1}{n}\sum (Y_i - \tilde{\beta} X_i) = 0$$

and

$$\frac{1}{n}\sum X_i \tilde{e}_i = \frac{1}{n}\sum X_i (Y_i - \tilde{\beta} X_i) = 0.$$

But with two equations and only one unknown, solving both of these equations for $\tilde{\beta}$ yields two estimators, not one. The first equation yields

$$\tilde{\beta} = \frac{\frac{1}{n}\sum Y_i}{\frac{1}{n}\sum X_i} = \frac{\sum Y_i}{\sum X_i} = \beta_{g2},$$

whereas the second equation yields

$$\tilde{\beta} = \frac{\frac{1}{n}\sum X_i Y_i}{\frac{1}{n}\sum X_i^2} = \frac{\sum X_i Y_i}{\sum X_i^2} = \beta_{g4}.$$



Both $\beta_{g2}$ and $\beta_{g4}$ are method of moments estimators of the slope of a line through the origin under the Gauss–Markov Assumptions (or in any DGP in which $E(e_i) = 0$ and $E(X_i e_i) = 0$, including some in which the explanators are not fixed). Notice that although each satisfies one of the two moment restrictions, neither $\beta_{g2}$ nor $\beta_{g4}$ satisfies both. When $\beta_{g2}$ is the estimator, the mean residual is zero, so $\frac{1}{n}\sum \tilde{e}_i$ matches its corresponding population moment. But when $\beta_{g2}$ is the estimator, $\frac{1}{n}\sum X_i \tilde{e}_i$ does not equal zero (except by occasional accident), so this second sample moment does not equal its population counterpart. When $\beta_{g4}$ is the estimator, $\frac{1}{n}\sum X_i \tilde{e}_i$ equals zero, but $\frac{1}{n}\sum \tilde{e}_i$ does not (except by occasional accident). Except in the accidental case in which $\beta_{g2}$ and $\beta_{g4}$ are equal, neither method of moments estimator makes both sample moments equal to their population counterparts.
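
A short simulated illustration (not from the text; the slope, distributions, and sample size are assumptions) computes both $\beta_{g2}$ and $\beta_{g4}$ and shows that each zeroes its own sample moment while leaving the other nonzero:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated line-through-the-origin DGP (illustrative parameter and sample size).
n = 200
beta = 1.5
X = rng.uniform(1, 5, size=n)
e = rng.normal(0, 1, size=n)
Y = beta * X + e

beta_g2 = Y.sum() / X.sum()              # zeroes (1/n) sum of residuals
beta_g4 = (X * Y).sum() / (X**2).sum()   # zeroes (1/n) sum of X_i * residuals

for name, b in [("beta_g2", beta_g2), ("beta_g4", beta_g4)]:
    resid = Y - b * X
    print(name, round(b, 4),
          "mean resid:", resid.mean(),
          "mean X*resid:", (X * resid).mean())
# beta_g2 makes the first sample moment zero but not the second;
# beta_g4 does the reverse.
```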

The generalized method of moments provides an estimation strategy when the number of restricted moments in the DGP exceeds the number of parameters to be estimated. Rather than satisfy one moment condition and violate another, the generalized method of moments (GMM) strategy chooses an estimator that balances each population moment condition against the others, seeking residuals that trade off violations of one moment restriction against violations of the other moment restrictions. A GMM estimator may satisfy no one moment condition, but it may come close to satisfying them all.

In the example of a line through the origin, the BLUE property of $\beta_{g4}$ makes it clear which is the most preferred method of moments estimator. We should choose our estimator so that

$$\frac{1}{n}\sum X_i \tilde{e}_i = \frac{1}{n}\sum X_i (Y_i - \tilde{\beta} X_i) = 0$$

and ignore the restriction that

$$\frac{1}{n}\sum \tilde{e}_i = \frac{1}{n}\sum (Y_i - \tilde{\beta} X_i) = 0.$$

This strategy yields $\beta_{g4}$, which we know to be BLUE. With more complex DGPs, the optimal choice is not so obvious. When the choice among method of moments estimators is not clear, GMM offers a strategy for devising a single estimator.

GMM does not require that the sample moments equal the population moments. For example, in the simple case of a line through the origin with two moment restrictions, the GMM estimator does not insist that

$$\frac{1}{n}\sum \tilde{e}_i = \frac{1}{n}\sum (Y_i - \tilde{\beta} X_i) = 0$$

and

$$\frac{1}{n}\sum X_i \tilde{e}_i = \frac{1}{n}\sum X_i (Y_i - \tilde{\beta} X_i) = 0.$$

Such insistence would be futile; generally, no estimator can satisfy both restrictions. Nor does GMM insist that one sample moment or the other equal zero. Instead, GMM looks at how much an estimator makes the sample moments differ from their population counterparts, as in

$$\frac{1}{n}\sum e_i^* = \frac{1}{n}\sum (Y_i - \beta^* X_i) = \nu_1$$

and

$$\frac{1}{n}\sum X_i e_i^* = \frac{1}{n}\sum X_i (Y_i - \beta^* X_i) = \nu_2,$$

where $\nu_1$ and $\nu_2$ are the amounts by which the first and second moment restrictions are violated by the residuals implied by $\beta^*$. One estimation strategy that would yield consistent estimators would be to minimize $(\nu_1^2 + \nu_2^2)$. This strategy would aim to make as small as possible the squared deviations of the sample moments from their population analogs. It shares an intuitive foundation with ordinary least squares, but here the goal is to make small not the squared residuals, but the squared deviations from the population moment restrictions. This reasonable approach is not the one followed by GMM.

GMM modifies the strategy of minimizing $(\nu_1^2 + \nu_2^2)$ much as generalized least squares modifies ordinary least squares. Instead of minimizing the unweighted sum of squared deviations, $(\nu_1^2 + \nu_2^2)$, GMM minimizes a weighted sum of the squared deviations, in which the weights reflect the variances and covariances of the $\nu_i$. GMM does not guarantee an efficient estimator, but it does provide a consistent estimator, and its weighting scheme is more efficient than the simpler unweighted scheme. GMM provides a powerful tool for finding consistent estimators in models that are otherwise mathematically quite cumbersome.
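
As a concrete sketch of this weighting idea, the code below applies a common two-step implementation to the line-through-the-origin example: first minimize the unweighted sum $\nu_1^2 + \nu_2^2$, then reweight using the inverse of the estimated covariance matrix of the two sample moments. The data are simulated and the two-step recipe is a standard GMM device rather than a procedure spelled out in this extension, so treat the details as illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated line-through-the-origin data (illustrative values).
n = 400
beta = 1.5
X = rng.uniform(1, 5, size=n)
Y = beta * X + rng.normal(0, 1, size=n)

def moments(b):
    """Sample moments (nu_1, nu_2): the averages of e_i and X_i*e_i at slope b."""
    resid = Y - b * X
    return np.array([resid.mean(), (X * resid).mean()])

def gmm_slope(W):
    """Minimize moments(b)' W moments(b); the moments are linear in b, so the
    minimizer has the closed form (d' W a) / (d' W d)."""
    a = np.array([Y.mean(), (X * Y).mean()])   # sample moments evaluated at b = 0
    d = np.array([X.mean(), (X**2).mean()])    # minus the derivative of the moments in b
    return (d @ W @ a) / (d @ W @ d)

# Step 1: equal weights, i.e., minimize nu_1^2 + nu_2^2.
b_step1 = gmm_slope(np.eye(2))

# Step 2: weight by the inverse of the estimated covariance of the sample moments.
resid = Y - b_step1 * X
g = np.column_stack([resid, X * resid])        # per-observation moment contributions
S = (g.T @ g) / n                              # estimated covariance of the moments
b_gmm = gmm_slope(np.linalg.inv(S))

print(b_step1, b_gmm)
print(moments(b_gmm))                          # neither sample moment is exactly zero
```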

20.2 Identification

WHAT IS THE DGP?

We have just seen that when the DGP has more moment conditions than parameters to be estimated, we have a surfeit of riches. The problem isn’t finding a consistent estimator, but choosing among several method of moments estimators. But what if there are fewer moment restrictions than parameters to be estimated? When there are too few restrictions in the DGP to allow consistent estimation of some or all parameters, we say the parameters are underidentified. When there are more restrictions than necessary to estimate the parameters consistently, we say the parameters are overidentified. Underidentified parameters cannot be estimated consistently.

Underidentified Parameters

The following example helps us understand underidentification. Suppose that in the DGP for a straight line with an unknown intercept term, the X-values are not fixed across samples, but rather X is a random variable. If X is a random variable, it might be correlated with the disturbance term. For example, in a wage equation in which education and experience are the only explanators, the disturbance contains all other influences on wages, such as punctuality, diligence, and native intelligence. If the education one attains is correlated with those same traits, then our explanator education is correlated with the disturbances. In general, if the explanators are correlated with the disturbances, we cannot say that $E(X_i e_i) = 0$. In such a case, we have only one moment condition, $E(e_i) = 0$, but two parameters to estimate.

With only one moment condition and two parameters to estimate, we can still choose estimates of $\beta_0$ and $\beta_1$ that make the population moment condition true in our sample:

$$\frac{1}{n}\sum \tilde{e}_i = \frac{1}{n}\sum (Y_i - \tilde{\beta}_0 - \tilde{\beta}_1 X_i) = 0.$$

But with one equation and two unknowns, we can choose any value for $\tilde{\beta}_0$ and then compute the value for $\tilde{\beta}_1$ that makes the moment condition true. Conversely, we could choose any value for $\tilde{\beta}_1$ and then compute the value for $\tilde{\beta}_0$ that makes the moment condition true. Such arbitrary parameter estimates do not have the property of consistency.² When the DGP offers too few restrictions to pin down the parameters of interest, we say the relationship’s parameters are underidentified.
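
A simulated sketch (illustrative numbers only; the "ability" variable stands in for the unmeasured traits discussed above) makes the arbitrariness concrete: for any chosen slope, an intercept can be found that satisfies the lone sample moment condition exactly, so that condition alone cannot single out the true parameter values:

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated DGP in which X is correlated with the disturbance (illustrative values),
# so only E(e_i) = 0 is available as a moment condition.
n = 1000
ability = rng.normal(0, 1, size=n)
X = 12 + 2 * ability + rng.normal(0, 1, size=n)   # e.g., education related to unobserved traits
e = ability + rng.normal(0, 1, size=n)            # disturbance shares those traits
Y = 1.0 + 0.5 * X + e

# For ANY chosen slope b1, the intercept b0 = mean(Y) - b1*mean(X) makes the
# single sample moment condition (1/n) sum (Y_i - b0 - b1 X_i) = 0 hold exactly.
for b1 in [-1.0, 0.0, 0.5, 2.0]:
    b0 = Y.mean() - b1 * X.mean()
    resid = Y - b0 - b1 * X
    print(f"b1 = {b1:5.2f}, b0 = {b0:6.2f}, mean residual = {resid.mean():.2e}")
# Every pair satisfies the condition, so the condition alone cannot identify (b0, b1).
```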

Exactly Identified Parameters

In contrast to the DGP with unknown slope and intercept, in the DGP for a straight line through the origin, abandoning the population restriction that $E(X_i e_i) = 0$ does not pose a problem for consistent estimation. With only one parameter to be estimated, the one moment restriction, $E(e_i) = 0$, provides a basis for consistent estimation. Indeed, in this DGP, abandoning the assumption that $E(X_i e_i) = 0$ relieves the surfeit of riches that plagued us earlier. With one restriction and one parameter, there will be only one method of moments estimator for the slope of the line through the origin, $\beta_{g2}$. In a classic application, presented in Chapter 2, Milton Friedman estimated the marginal propensity to consume from permanent income using $\beta_{g2}$ because he believed that $E(e_i) = 0$ was true in his model and that $E(X_i e_i) = 0$ was not. When the restrictions in the DGP imply a single GMM estimator for each parameter, we say the parameters of the relationship are exactly identified, or just identified.

Overidentified Parameters

Exact identification permits consistent estimation. Underidentification makes consistent estimation impossible. What about overidentification? When the restrictions in the DGP yield a surfeit of riches, with more restrictions than parameters to be estimated, we say the parameters of the relationship are overidentified. At first look, overidentification and underidentification seem similar in their consequences. In neither case do the moment restrictions of the DGP yield unique estimators of the relationship’s parameters. However, more deeply, the two phenomena are dramatically different. The several (or many) estimators offered by overidentification will all converge in probability to the same result; all the estimators are consistent. The multitude of estimates possible with underidentification, however, remain arbitrary and in conflict with one another, even in infinite samples.

In the face of overidentification, we are pressed to choose among the multitude of consistent estimators. GMM is one common strategy for settling on one consistent estimator from among many. GMM is not generally asymptotically efficient, however. Later in this extension, we encounter an efficient estimator, called the maximum likelihood estimator, which, when it is applicable, is preferable to GMM when estimating overidentified relationships. The problem posed by overidentification is modest. We can proceed in the face of overidentification confident that all consistent estimators give similar results in very large samples. In contrast, in the face of underidentification, we cannot consistently estimate the parameters of interest at all.
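
To illustrate the contrast, the following sketch (simulated data, assumed parameter values) returns to the overidentified line-through-the-origin case: the two method of moments estimators disagree in any finite sample, yet both converge to the same true slope as n grows:

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative contrast: with an overidentified slope (line through the origin,
# both moment restrictions valid), the competing estimators agree in large samples.
beta = 1.5
for n in [100, 10_000, 1_000_000]:
    X = rng.uniform(1, 5, size=n)
    Y = beta * X + rng.normal(0, 1, size=n)
    beta_g2 = Y.sum() / X.sum()
    beta_g4 = (X * Y).sum() / (X**2).sum()
    print(n, round(beta_g2, 4), round(beta_g4, 4))   # both approach 1.5
```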

Exclusion and Covariance Restrictions

The examples of a straight line through the origin and a straight line with unknown intercept and slope illustrate two fundamental ways in which econometricians achieve identification of parameters. Consider the DGP

$$Y_i = \beta_0 + \beta_1 X_i + e_i, \qquad i = 1, \ldots, n,$$

and

$$E(e_i) = 0, \qquad i = 1, \ldots, n.$$


An Econometric Top 40—A Pop Tune

Competing in the New Economy

“How do managerial decisions such as whether or not to adopt a Total Quality Management (TQM) system or to expand an employee involvement program affect labor productivity? Does the implementation of ‘high performance’ workplace practices ensure better firm performance? Does the presence of a union hinder or enhance the probability of success associated with implementing these practices? Do computers really help workers be more productive?”³ Economists Sandra Black of the Federal Reserve Bank of New York and Lisa Lynch of Tufts University posed these pressing contemporary questions in 2001. Their informative econometric study provides answers of interest to many corporate managers.

Black and Lynch study a sample of more than 600 U.S. manufacturing establishments, each observed once per year by the Census Bureau between 1987 and 1994. In 1994 the data on workplace practices were gathered from the establishments, along with data on inputs and outputs; in the earlier years, only data on inputs and outputs were gathered.

The authors embedded their study in the context of a Cobb–Douglas production function for manufacturing firms:

$$\ln(Y/L)_i = \beta_{0i} + \beta_1 \ln(K/L)_i + \beta_2 \ln(M/L)_i + e_i,$$

where Y is output, L is labor, K is capital, and M is materials. Notice that this function differs from most we have seen in that the intercept varies from firm to firm. The authors augmented the Cobb–Douglas specification by assuming that worker productivity (measured by Y/L, output per worker) depends on establishment-specific workplace practices and worker characteristics:

$$\beta_{0i} = \alpha_0 + \sum_{j=1}^{T} \alpha_j Z_{ji},$$

where $Z_{ji}$ is the i-th firm’s value for the j-th trait in a list of workplace practices and worker characteristics; there are T items in the list in all.

Breaking the specification into two steps highlights the underlying Cobb–Douglas assumption. The specification actually reduces to a single garden variety linear regression with inputs per worker (K/L and M/L), workplace practices, and worker characteristics as explanators. What is not garden variety is the rest of the DGP. The Gauss–Markov Assumptions surely do not apply here. Each firm is observed several times; systematic differences among firms (some firms being fundamentally more or less productive than others) make it likely that the disturbances for each firm are correlated over time, even if the disturbances are independent across firms. Moreover, were the authors to gather a different sample of firms, they would surely obtain different values for the explanatory variables. The X’s are not fixed in repeated samples. And with random X-values comes the worry that those firms wise enough to choose beneficial X-values might also choose beneficial unmeasured practices: the inputs per worker and the workplace practices might be correlated with the disturbances. Little seems left of the Gauss–Markov Assumptions on which to build good estimators.

The wholesale failure of the Gauss–Markov Assumptions poses estimation problems for Black and Lynch. Is consistent estimation possible at all? Are the parameters in their model identified? And if the parameters are identified, how are they estimated? Black and Lynch argue that they know enough about the variances and covariances of the explanators and disturbances to identify the parameters in their model through several covariance and exclusion restrictions. Consistent estimation is at least possible. When the Gauss–Markov Assumptions fail thoroughly enough to make least squares unattractive, but we do have sufficient information about the variances and covariances of the explanators and disturbances to restrict numerous moments in the population, the generalized method of moments (GMM) offers an appealing estimation procedure.

Black and Lynch argue that they know enough about the variances and covariances of the explanators and disturbances to rely on GMM to construct consistent, asymptotically normally distributed estimators of the $\alpha$’s and $\beta$’s. Given the large number of establishments in Black and Lynch’s sample, good asymptotic properties are an attractive basis for inference. The t-statistics and F-tests we are accustomed to analyzing are asymptotically valid here, so we can discuss the empirical results of Black and Lynch much as we would if they had used OLS under the Gauss–Markov Assumptions.

What do they find? The authors report that TQM systems do not raise productivity by themselves, but that allowing employees a greater voice in decisions does improve productivity (and to the extent that TQM includes such increased voice, TQM improves productivity). Moreover, report the authors, instituting profit sharing can have a positive effect on productivity, but only when the plan includes profit sharing for nonmanagerial employees. The authors further find that unionized establishments that increase worker voice and add profit sharing that includes nonmanagerial employees get a particularly large boost in productivity from such workplace policies. In contrast, productivity in unionized establishments that do not introduce such new workplace policies lags behind that in similarly un-innovative nonunionized establishments.

Black and Lynch also find that increasing use of computers can enhance productivity. Firms with higher levels of computer usage by nonmanagerial workers have higher productivity than those with less computer use.

Final Notes

Black and Lynch’s paper illustrates how everyday economic questions are sometimes best tackled with highly sophisticated empirical techniques. Relatively simple OLS would not reliably answer the questions Black and Lynch address. Instead, Black and Lynch combined covariance restrictions and exclusion restrictions grounded in their understanding of the data’s origins to provide both identification for their model and the moment restrictions necessary for conducting GMM estimation. Simple economics can require complex statistics. When the DGPs suitable for modeling real-world data depart far from the Gauss–Markov Assumptions, particularly complicated estimation strategies may be needed, rather than OLS or GLS.


In this DGP, neither the slope of the line, $\beta_1$, nor the intercept, $\beta_0$, is identified. We do not have enough prior information about where the data come from to allow us to consistently estimate these parameters. What additional information would identify the slope? Our earlier discussion exposes two possibilities. First, we might learn that there is, in fact, no intercept term in the model; excluding the intercept from the model would identify the slope. Second, we might learn that $E(X_i e_i) = 0$, in which case the slope (and the intercept) would be identified; restricting the covariance between the explanators and disturbances to be zero would identify the slope of the line. Exclusion restrictions and covariance restrictions are not the only ways a model’s parameters become identified, but they are common strategies.

Instrumental variables (IV) estimation relies on exclusion and covariance restrictions for its consistency. When the covariance restriction that $E(X_i e_i) = 0$ fails, IV can nonetheless consistently estimate the parameters of an equation, but only if a combination of exclusion restrictions and covariance restrictions holds. IV estimation requires that we know that $E(Z_i e_i) = 0$ and $E(Z_i X_i) \neq 0$, which are covariance restrictions, and that the potential instrument, Z, is not itself a relevant explanator of the dependent variable, which is an exclusion restriction.

It is important to note that we ought not impose identifying restrictions arbitrarily. False restrictions misspecify the DGP and undermine both consistent estimation and valid statistical inference. Econometricians look to make plausible identifying assumptions and, when possible, to test whether the data support the identifying assumptions.

20.3 An Application: Military Service and Wages

Underidentification makes econometric analysis futile. Consistent estimators of an underidentified parameter do not exist. Essential to econometric success is identifying the parameters of interest, and identifying parameters requires that our DGP contain sufficient assumptions. Here we see how underidentification can threaten a specific empirical project and how an econometrician can use common sense and economic reasoning to impose assumptions on a DGP sufficient to identify the parameters of interest.

Let’s begin with the question “Does military service enhance an individual’s future earning power?” Many people have thought it does. Unfortunately, the hypothesis long proved difficult to test. It might seem simple to specify

$$W_i = \beta_0 + \beta_1 M_i + e_i,$$

in which W is the person’s wage and M is a dummy variable indicating past military service, and to use OLS to estimate $\beta_1$, the effect of past military service on wages. Unfortunately, OLS is not a consistent estimator of $\beta_1$ because $E(M_i e_i)$ usually does not equal zero; $\beta_1$ is underidentified. Why doesn’t $E(M_i e_i)$ equal zero? Because individuals who join the military often differ from other people in traits that influence wages, but that the econometrician is unlikely to observe, such as self-discipline and self-confidence. Indeed, people often join the military with the specific intention of acquiring such traits. We can control for traits such as education, work experience, and gender, but some very personal characteristics that influence decisions to join the military and also influence wage prospects will not be measured and included in our data sets. These unmeasured traits are part of the disturbance term, and they may be different for people who serve and people who do not, so $E(M_i e_i)$ may not equal zero. Military service may give ill-disciplined enlistees more discipline, but they may still be undisciplined enough that they suffer lower wages than otherwise similar workers. If enlistees have unmeasured traits that detract from their labor-market prospects, and if military service does not fully overcome those disadvantages, the estimated coefficient on past military service will be negative, despite whatever positive effect that service may have on those individuals’ earnings potential. Without $E(M_i e_i) = 0$, we have only $E(e_i) = 0$ with which to estimate $\beta_0$ and $\beta_1$. The coefficients are not identified.

Economist Josh Angrist of MIT explored how we might identify the effect of military service on wages.⁴ He decided that data from a draft lottery that took place during the Vietnam conflict would allow identification of the effect of military service on earnings. After carefully pondering the draft lottery, Angrist posed a DGP for the wages of workers who had been subject to that lottery. He posited two covariance restrictions and an exclusion restriction to identify $\beta_1$. In the draft lottery, inductees were selected according to their birth dates. A random draw of birth dates determined which birth dates would be drafted. It was a lottery few wanted to win. Because birth dates were unlikely to be correlated with individuals’ productivity characteristics, such as self-discipline or self-confidence, entering the military through the draft was unlikely to be correlated with those traits.

Angrist defined a dummy variable L to indicate that a person’s birthday was a lottery date. Angrist plausibly argued that $E(L_i e_i) = 0$: individuals’ labor-market traits are unlikely to be correlated with their being born on a randomly selected date. Angrist also argued that $E(L_i M_i)$ does not equal zero: people who won the lottery were more likely than others to have military experience. $E(L_i e_i) = 0$ and $E(L_i M_i) \neq 0$ were Angrist’s two covariance restrictions. Finally, Angrist argued that one’s lottery status should not itself directly influence wages, so he also excluded L from the wage equation. The method of moments estimator for $\beta_1$ based on $E(e_i) = 0$ and $E(L_i e_i) = 0$ is

$$\tilde{\beta}_1 = \frac{\sum l_i w_i}{\sum l_i m_i},$$

where l, w, and m are L, W, and M measured as deviations from their own means. Angrist’s second covariance restriction, that $E(L_i M_i)$ does not equal zero, ensures that the denominator in the estimator is highly unlikely to be zero in large samples.

Angrist implemented this estimator and concluded that military service does not improve future wage prospects. In the 1980s, years after their military service, white lottery veterans’ earnings were 15% less than those of comparable workers who had not served; black lottery veterans’ earnings were statistically indistinguishable from those of otherwise comparable nonveterans. Had Angrist not been able to argue for the plausibility of his covariance and exclusion restrictions, the effect of military service on earnings would not be identifiable from the earnings and draft lottery data. Because many economists were persuaded by Angrist’s assumptions, his estimates of the effect of military service gained widespread acceptance. Those economists who did not accept Angrist’s identifying assumptions remained unpersuaded by his parameter estimates.

An Organizational Structure for the Study of Econometrics

1. What is the DGP?
   GMM requires restrictions on moments.
2. What Makes a Good Estimator?
   Consistency
3. How Do We Create an Estimator?
   Method of moments and generalized method of moments
4. What Are an Estimator’s Properties?
   Identification is a minimal requirement for consistency. GMM usually yields consistency.
5. How Do We Test Hypotheses?

Summary

The extension began by introducing a strategy, the method of moments, and its generalization, the generalized method of moments (GMM), for constructing asymptotically normally distributed consistent estimators in a vast array of DGPs. The assumptions about means, variances, and covariances that underpin GMM prove to be minimal requirements for consistent estimation of the parameters of a DGP. We learn from this that a DGP may contain too little information to support consistent estimation of some or all of its parameters. When a DGP suffers such a shortage of information, we say some or all of its parameters are underidentified.

The Law of Large Numbers and the Central Limit Theorem that imply the consistency and asymptotic normality of GMM estimators (including OLS estimators) rest on bounded variances. When explanators or dependent variables in our DGPs have unbounded variances, the normality and even the consistency of estimators become questionable.

Concepts for Review

Exactly identified
Generalized method of moments (GMM)
Just identified
Method of moments
Moments
Overidentified
Underidentified

Questions for Discussion

1. “This identification business is nonsense. I can always run a regression, barring multicollinearity. That gives me estimates of the parameters of interest. I can always use those estimates.” Agree or disagree, and discuss.

Problems for Analysis

1. For the DGP

$$Y_i = \beta X_i + e_i,$$
$$E(Z_i X_i) \neq 0,$$
$$E(Z_i e_i) = 0,$$

show that the method of moments estimator of the slope is

$$\beta_Z = \sum Z_i Y_i \Big/ \sum Z_i X_i.$$

2. For the DGP in Problem 1, show that the method of moments estimator $\beta_Z$ is consistent if

$$\operatorname{plim}\!\left(\frac{1}{n}\sum Z_i X_i\right) \neq 0 \quad\text{and}\quad \operatorname{plim}\!\left(\frac{1}{n}\sum Z_i e_i\right) = 0.$$

3. Show that a valid instrumental variable estimator for a DGP with a straight line through the origin is also a method of moments estimator.



Endnotes

1. Ralph Waldo Emerson, “Self-Reliance,” in Essays, 1841.
2. Notice a special case here. If our DGP provides an increasing number of observations for which X = 0, we can consistently estimate $\beta_0$ by restricting attention to only those observations.
3. Sandra E. Black and Lisa M. Lynch, “How to Compete: The Impact of Workplace Practices and Information Technology on Productivity,” Review of Economics and Statistics 83, no. 3 (August 2001): 434–445.
4. Joshua Angrist, “Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records,” The American Economic Review 80, no. 3 (June 1990): 313–336.

Appendix 20.A
The Consistency of Method of Moments Estimators

Method of moments estimators are very often consistent. In particular, they are consistent whenever the Law of Large Numbers ensures that the sample moments converge in probability to the corresponding population moments. For example, in estimating a straight line with unknown slope and intercept, if $\operatorname{plim}\!\left(\frac{1}{n}\sum e_i\right) = 0$ and $\operatorname{plim}\!\left(\frac{1}{n}\sum X_i e_i\right) = 0$, then the method of moments estimators $\tilde{\beta}_0$ and $\tilde{\beta}_1$ are consistent. Because unbounded variances can undermine the Law of Large Numbers, the consistency of $\tilde{\beta}_0$ and $\tilde{\beta}_1$ requires that the joint distribution of the $X_i$ and the $e_i$ be bounded appropriately.¹

20.A.1 Proving Consistency

WHAT ARE AN ESTIMATOR’S PROPERTIES?

This section applies the rules for manipulating probability limits from Table 12.1 to prove the consistency of the method of moments estimators of the slope and intercept of a straight line, assuming that $\operatorname{plim}\!\left(\frac{1}{n}\sum e_i\right) = 0$ and $\operatorname{plim}\!\left(\frac{1}{n}\sum X_i e_i\right) = 0$. Assuming instead that $E(e_i) = 0$ and that both the (homoskedastic) disturbances and (fixed) explanators have finite, nonzero variances would suffice to establish, by the Law of Large Numbers, that these two probability limits are zero.


Because rule (iii) in Table 12.1 states that the plim of a sum is the sum of the plims, the convergence of the sample moments can be written

$$\operatorname{plim}\!\left[\frac{1}{n}\sum e_i\right] = \operatorname{plim}\!\left[\frac{1}{n}\sum (Y_i - \beta_0 - \beta_1 X_i)\right] = \operatorname{plim}\!\left(\frac{1}{n}\sum Y_i\right) - \beta_0 - \beta_1 \operatorname{plim}\!\left(\frac{1}{n}\sum X_i\right) = 0 \tag{20.A.1}$$

and

$$\operatorname{plim}\!\left[\frac{1}{n}\sum X_i e_i\right] = \operatorname{plim}\!\left[\frac{1}{n}\sum X_i (Y_i - \beta_0 - \beta_1 X_i)\right] = \operatorname{plim}\!\left(\frac{1}{n}\sum X_i Y_i\right) - \beta_0 \operatorname{plim}\!\left(\frac{1}{n}\sum X_i\right) - \beta_1 \operatorname{plim}\!\left(\frac{1}{n}\sum X_i^2\right) = 0. \tag{20.A.2}$$

When the sample moments converge in probability to their population values, what the method of moments insists be true about the residuals in any sample proves to be true about the disturbances (in probability limit) as the sample size grows large. This buys consistency for the method of moments estimators. More formally, because the method of moments always sets $\frac{1}{n}\sum \tilde{e}_i$ and $\frac{1}{n}\sum X_i \tilde{e}_i$ equal to zero, their probability limits are also zero (according to the first rule in Table 12.1, that the plim of a constant is the constant):

$$\operatorname{plim}\!\left[\frac{1}{n}\sum \tilde{e}_i\right] = \operatorname{plim}\!\left[\frac{1}{n}\sum (Y_i - \tilde{\beta}_0 - \tilde{\beta}_1 X_i)\right] = \operatorname{plim}(0) = 0$$

and

$$\operatorname{plim}\!\left[\frac{1}{n}\sum X_i \tilde{e}_i\right] = \operatorname{plim}\!\left[\frac{1}{n}\sum X_i (Y_i - \tilde{\beta}_0 - \tilde{\beta}_1 X_i)\right] = \operatorname{plim}(0) = 0.$$

Applying rule (vi) from Table 12.1 (that the plim of a continuous function is the function of the plims) then yields

$$\operatorname{plim}\!\left[\frac{1}{n}\sum \tilde{e}_i\right] = \operatorname{plim}\!\left[\frac{1}{n}\sum (Y_i - \tilde{\beta}_0 - \tilde{\beta}_1 X_i)\right] = \operatorname{plim}\!\left(\frac{1}{n}\sum Y_i\right) - \operatorname{plim}(\tilde{\beta}_0) - \operatorname{plim}(\tilde{\beta}_1)\operatorname{plim}\!\left(\frac{1}{n}\sum X_i\right) = 0 \tag{20.A.3}$$

and

$$\operatorname{plim}\!\left[\frac{1}{n}\sum X_i \tilde{e}_i\right] = \operatorname{plim}\!\left[\frac{1}{n}\sum X_i (Y_i - \tilde{\beta}_0 - \tilde{\beta}_1 X_i)\right] = \operatorname{plim}\!\left(\frac{1}{n}\sum X_i Y_i\right) - \operatorname{plim}(\tilde{\beta}_0)\operatorname{plim}\!\left(\frac{1}{n}\sum X_i\right) - \operatorname{plim}(\tilde{\beta}_1)\operatorname{plim}\!\left(\frac{1}{n}\sum X_i^2\right) = 0. \tag{20.A.4}$$
<strong>20</strong>.A.4


Comparing Equations 20.A.3 and 20.A.4 with Equations 20.A.1 and 20.A.2 shows that $\operatorname{plim}(\tilde{\beta}_0)$ and $\operatorname{plim}(\tilde{\beta}_1)$ appear in Equations 20.A.3 and 20.A.4 just where $\beta_0$ and $\beta_1$ do in Equations 20.A.1 and 20.A.2. If Equations 20.A.3 and 20.A.4 have a unique solution for $\operatorname{plim}(\tilde{\beta}_0)$ and $\operatorname{plim}(\tilde{\beta}_1)$, then $\operatorname{plim}(\tilde{\beta}_0) = \beta_0$ and $\operatorname{plim}(\tilde{\beta}_1) = \beta_1$.

There is one case in which the solution of Equations 20.A.3 and 20.A.4 for $\operatorname{plim}(\tilde{\beta}_0)$ and $\operatorname{plim}(\tilde{\beta}_1)$ is not unique. If X takes on only one value, and is therefore perfectly collinear with the intercept term, Equation 20.A.4 is equivalent to Equation 20.A.3: Equation 20.A.4 reduces to Equation 20.A.3 if we divide both sides of Equation 20.A.4 by the constant value of X. In this case, we really have only one equation in two unknowns. And, in this special case of perfect multicollinearity, the two equations do not yield a unique solution for $\operatorname{plim}(\tilde{\beta}_0)$ and $\operatorname{plim}(\tilde{\beta}_1)$. Otherwise, the solution is unique, so $\operatorname{plim}(\tilde{\beta}_0) = \beta_0$ and $\operatorname{plim}(\tilde{\beta}_1) = \beta_1$.

Thus, barring perfect multicollinearity, when the sample moments converge in probability to their population expectations, $\tilde{\beta}_0$ and $\tilde{\beta}_1$ are consistent estimators. Barring perfect collinearity–like problems, method of moments estimators are generally consistent when the sample moments converge to their population values. Because it is the Law of Large Numbers that ensures that sample means converge in probability to their population analogs, the Law of Large Numbers is key to the consistency of method of moments estimators. As noted earlier, infinite variances in a DGP endanger the applicability of the Law of Large Numbers.
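
A quick numerical check of this consistency argument (an illustrative sketch, with assumed parameter values and distributions) solves the two sample moment conditions at several sample sizes and watches the estimates settle on the true parameters:

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative check of consistency: the method of moments (= OLS) estimates
# approach the assumed true values beta0 = 1, beta1 = 2 as n grows.
beta0, beta1 = 1.0, 2.0
for n in [100, 10_000, 1_000_000]:
    X = rng.uniform(0, 10, size=n)
    e = rng.normal(0, 1, size=n)
    Y = beta0 + beta1 * X + e
    A = np.array([[1.0, X.mean()], [X.mean(), (X**2).mean()]])
    b = np.array([Y.mean(), (X * Y).mean()])
    b0_tilde, b1_tilde = np.linalg.solve(A, b)
    print(n, round(b0_tilde, 4), round(b1_tilde, 4))
```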

Endnotes

1. $\frac{1}{n}\sum X_i e_i$ will converge in probability to zero if its expectation is zero and its variance goes to zero as n grows. By assumption, $E(X_i e_i) = 0$, so the first criterion is met. The second criterion will also be met if we assume that $\operatorname{var}(X_i e_i)$ (equal to $E(X_i^2 e_i^2)$) is a finite, nonzero constant, $\tau^4$, so that $\operatorname{var}\!\left(\frac{1}{n}\sum X_i e_i\right) = \tau^4/n$.
