1 An introduction to Mincer Wage Regressions - IZA

1 An introduction to Mincer Wage Regressions 

The wage regression, as developed by Mincer, dates back to 1958 and 1974. Ben-Porath (1967) is the first formal presentation. The starting point of the human capital accumulation model is the link between wages (observed) and the quantity of skills owned by an individual (unobserved) in a competitive labor market. That is,

$$W_t = P_t \cdot H_t \qquad (1)$$

where

• $W_t$ is the market wage rate

• $P_t$ is the price of a unit of skills

• $H_t$ is the total quantity of skills (human capital)

Now, clearly,

$$\log W_t = \log P_t + \log H_t \qquad (2)$$

The major contribution of human capital theory is to develop a production function approach to the specification of $H_t(\cdot)$, such that it may be expressed as a function of observable variables (time allocation).

1.1 Mincer (1974): The Mincerian Wage Regression

In his classic book, Mincer develops the famous Mincerian wage regression: the statistical relationship between market wages, education and experience. At this stage, the Mincer wage equation is solely descriptive and is not informative about the optimal quantity of schooling (it is typically developed under the assumption of an exogenously determined rate of human capital accumulation). Define

• $E_t$: potential earnings at $t$ (if working full time). Think of $E_t$ as human capital.

• $c_t = k_t \cdot E_t$: investment in human capital (say training)

• $k_t$: the amount of time (exogenous) devoted to training ($k_t = 1$ when in school)

• $\rho_t$: the return to training (or schooling)

Then

$$E_{t+1} = E_t + c_t \cdot \rho_t = E_t (1 + k_t \cdot \rho_t) \qquad (3)$$

• In general, starting from period 0 (earnings potential at 0 is $E_0$),

$$E_t = \Big[ \prod_{j=0}^{t-1} (1 + k_j \cdot \rho_j) \Big] E_0 \qquad (4)$$

Now, divide the horizon into two periods (school and post-schooling training):

• $\rho_t = \rho_s$ (in school)

• $\rho_t = \rho_0$ (in training activities)

• Clearly, since $k_j = 1$ while in school,

$$E_s = (1 + \rho_s)^s \cdot E_0$$

$$E_t = \Big[ \prod_{j=s}^{t-1} (1 + k_j \cdot \rho_0) \Big] (1 + \rho_s)^s \cdot E_0 \qquad (5)$$

• Taking logs of potential earnings, and noting that both $\rho_0$ and $\rho_s$ are small (so that $\ln(1 + \rho) \approx \rho$),

$$\ln E_t = \ln E_0 + s \cdot \rho_s + \rho_0 \cdot \sum_{j=s}^{t-1} k_j \qquad (6)$$

• The next steps are based on the willingness to introduce experience ($X$) in the expression. The trick is to specify $k_j(X)$ by imposing an exogenous rate of capital accumulation beyond school completion. Noting that at $t = T - s = X$,

$$\ln(w(s, x)) = \alpha_0 + \rho_s \cdot S + \beta_0 \cdot X + \beta_1 \cdot X^2 \qquad (7)$$

which is the celebrated Mincer wage regression.

• Note that, returning to (.), the intercept term is now the sum of the log skill price and the log of initial ability (that is, the log of their product).

• $\rho_s$ acts as the “return to schooling”

• $\beta_0$ and $\beta_1$ define the return to experience (and reflect the concavity of the age-earnings profile when $\beta_1$ is negative)

• When (.) is augmented with race/sex, it may be used to study “discrimination”
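The step from the log earnings expression to the quadratic Mincer regression can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that post-school investment time declines linearly with experience, k(x) = kappa · (1 − x/T); the names `kappa` and `T`, the function names and the numerical values are my own assumptions and do not come from the text. Under that assumption the sum of the k terms is quadratic in X, with a negative coefficient on X².

```python
import numpy as np


def log_earnings_exact(E0, s, X, rho_s, rho_0, kappa, T):
    """Log of potential earnings from the exact product formula."""
    x = np.arange(X)                  # years of experience 0, ..., X-1
    k = kappa * (1.0 - x / T)         # ASSUMPTION: linear decline in training time
    return np.log(E0) + s * np.log1p(rho_s) + np.sum(np.log1p(k * rho_0))


def mincer_coefficients(rho_0, kappa, T):
    """Coefficients on X and X^2 implied by rho_0 * sum_{x=0}^{X-1} k(x)."""
    beta_0 = rho_0 * kappa * (1.0 + 1.0 / (2.0 * T))
    beta_1 = -rho_0 * kappa / (2.0 * T)
    return beta_0, beta_1


def log_earnings_mincer(E0, s, X, rho_s, rho_0, kappa, T):
    """Small-rho approximation: ln E0 + s*rho_s + beta_0*X + beta_1*X^2."""
    beta_0, beta_1 = mincer_coefficients(rho_0, kappa, T)
    return np.log(E0) + s * rho_s + beta_0 * X + beta_1 * X**2


# Hypothetical parameter values, chosen only for illustration
params = dict(E0=1.0, s=12, X=20, rho_s=0.08, rho_0=0.05, kappa=0.5, T=40)
exact = log_earnings_exact(**params)
approx = log_earnings_mincer(**params)
print(f"exact ln E_t = {exact:.4f}, Mincer approximation = {approx:.4f}")
```

The gap between the two numbers comes only from replacing ln(1 + ρ) by ρ, so it shrinks as the returns get smaller. A different assumed path for k(x) would give a different functional form in X.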

1.2 Measuring the returns to schooling

1.2.1 Background

Before describing the key aspects of the structural literature, it is useful to survey some historical aspects. For the sake of the presentation, I consider two distinct periods. First, as the literature devoted to the return to schooling can hardly be distinguished from the Mincerian wage regression, I briefly review the creation and the development of the Mincerian wage regression model. A second period covers a large number of empirical studies that used Instrumental Variable (IV) techniques to infer the returns to schooling. These papers belong to an even larger set of empirical research using the “Natural Experiment” approach.

The standard form of the Mincer wage regression is

$$\log W_t = w_t = \beta_0 + \beta_1 \cdot \text{Schooling}_t + \beta_2 \cdot exp_t + \beta_3 \cdot exp_t^2 + \varepsilon_t \qquad (8)$$

where $exp$ denotes accumulated labor market experience, $\beta_0$ is the sum of the rental price of human capital (in log) and the level of ability, and where $\beta_1$, $\beta_2$ and $\beta_3$ are the parameters that represent the technology by which schooling and labor market experience are transformed into skills.

For a fixed (known) level of ability, the term $\varepsilon_t$ represents a pure random shock affecting wages at a particular point in time. When (8) is fit to a cross-section of heterogeneous individuals, the error term will typically be interpreted as unmeasured differences in innate ability. The assumptions of linearity in the error term and in accumulated schooling are ad hoc assumptions which are only based on statistical choices. As will become clear later, these assumptions are not innocuous. Finally, the quadratic term in experience is meant to allow for the possible decline in post-schooling human capital acquisition.
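As an illustration of how the standard Mincer wage regression is fit in practice, here is a minimal OLS sketch on synthetic data. Everything about the data generating process (sample size, coefficient values, the ranges of schooling and experience, the error variance) is a hypothetical choice of mine, and the function name `fit_mincer` is not from any library.

```python
import numpy as np


def fit_mincer(log_wage, schooling, experience):
    """OLS of log wage on a constant, schooling, experience and experience^2."""
    X = np.column_stack([
        np.ones(len(schooling)),
        schooling,
        experience,
        experience**2,
    ])
    coef, *_ = np.linalg.lstsq(X, log_wage, rcond=None)
    return coef  # [beta_0, beta_1, beta_2, beta_3]


# Synthetic cross-section (all numbers are illustrative assumptions)
rng = np.random.default_rng(0)
n = 5000
schooling = rng.integers(8, 19, size=n).astype(float)    # 8 to 18 years
experience = rng.integers(0, 41, size=n).astype(float)   # 0 to 40 years
true_beta = np.array([1.0, 0.08, 0.04, -0.0006])
log_wage = (true_beta[0]
            + true_beta[1] * schooling
            + true_beta[2] * experience
            + true_beta[3] * experience**2
            + rng.normal(0.0, 0.3, size=n))              # epsilon, exogenous here

beta_hat = fit_mincer(log_wage, schooling, experience)
print("estimated [beta_0, beta_1, beta_2, beta_3]:", np.round(beta_hat, 4))
```

Because the error term is drawn independently of schooling in this sketch, OLS recovers the assumed coefficients up to sampling noise. The discussion of ability bias concerns exactly the case where that independence fails.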

The term “ability” should be understood as a time-invariant level of skills that exists prior to the start of the human capital accumulation process and that affects labor market wages, even after controlling for acquired human capital. I refer to $\beta_1$ as the return to schooling. It designates the marginal effect of schooling on log wages, in percentage terms (as opposed to the internal rate of return). If the effect of schooling is non-linear, the local return to schooling refers to the percentage wage increase per additional year of schooling.[1] The average return refers to the slope of the straight line between the intercept and the expected log wage at a given number of years of schooling. Finally, $\beta_2$ and $\beta_3$ are the parameters that correspond to the return to experience.

While the human capital literature has been generalized so as to incorporate post-schooling skill acquisition (say on-the-job training), it should be noted that the Mincerian wage regression is a representation of the statistical relationship between wages and experience (given schooling) for an exogenously determined rate of on-the-job training. The Mincerian wage regression disregards the endogeneity of post-schooling human capital accumulation and treats schooling and training symmetrically. More precisely, the Mincerian approach usually ignores the possibility that schooling may change the on-the-job human capital accumulation process.[2] Statistically, these refinements would require either the joint modeling of schooling and training or non-separability between schooling and experience. As far as I know, these issues have not been addressed extensively in the empirical literature but are likely to be the object of research in the near future.

1.2.2 The Instrumental Variable (IV) Literature

By the early 1970s, the estimation of the returns to schooling using Mincerian wage regressions had become one of the most widely analyzed topics in applied econometrics. In a famous survey, Griliches (1977) pointed out several econometric problems that arise in estimating the returns to schooling and, in particular, those pertaining to the measurement of both schooling and ability. Until then, substantial effort had been devoted to the estimation of the return to schooling with control variables measuring (or proxying) unobserved ability. These measures are typically IQ tests or objects of a similar nature, such as the Armed Forces Qualification Test (AFQT). Those are most likely imperfect measures of labor market skills. Casual empiricism would reveal that AFQT scores (available in the popular National Longitudinal Survey of Youth) are more strongly correlated with schooling attainments than they are with wages.[3]

[1] The terms local and marginal returns may be used interchangeably.

[2] In practice, this could arise if schooling affects training opportunities or on-the-job search outcomes. Some related issues are mentioned in surveys such as those written by Willis (1986) and Rosen (1984).

More interestingly, Griliches recognized that the endogeneity of schooling decisions, virtually ignored until then, was a serious issue which might have prevented economists from uncovering the true causal effect of education on earnings. Subsequently, a wide range of empirical papers using instrumental variable (IV) techniques have been published. A large segment of this literature is based on “institutional features” of the education system. Card (2001 and 2002) presents an extensive survey of this literature and discusses the main conceptual issues within a unifying theoretical structure in which individuals compare the benefits of schooling with the costs of schooling borne early in the life cycle.[4]

At the econometric level, the main issue may be illustrated by the following cross-sectional regression function:

$$w_i = \beta_0 + \beta_1 \cdot \text{Schooling}_i + \eta_i = \beta_0 + \beta_1 \cdot S_i + \eta_i \qquad (9)$$

Ignoring post-schooling labor market experience, it is clear that the discrepancy between OLS and IV estimates is a reflection of the correlation between schooling and unobserved ability ($\eta_i$). A positive (negative) correlation is associated with a positive (negative) ability bias. In the IV literature, the ability bias is only indirectly investigated through the discrepancy between IV and OLS estimates, assuming that the linear model is correct. As we will see later, in structural models, the ability bias may be evaluated directly. With estimates of the utility of attending school and market ability, it may be obtained by simulation methods.

$$E(\hat{\beta}_{ols}) = \beta_1 + \frac{\mathrm{cov}(\eta_i, S_i)}{\mathrm{Var}(S_i)} \qquad (10)$$

if $\mathrm{cov}(\eta_i, S_i) > 0$: ability bias (positive) (11)

if $\mathrm{cov}(\eta_i, S_i) < 0$: ability bias (negative) (12)

[3] See Belzil and Hansen (2003).

[4] The model is intertemporal but is non-stochastic. It borrows from Becker (1967).
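The ability bias formula for OLS can be checked numerically. The sketch below simulates a cross-section in which schooling is positively correlated with unobserved ability. The data generating process, the parameter values and the variable names are all hypothetical choices made for illustration, and schooling is treated as continuous for simplicity.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta0, beta1 = 1.0, 0.08                  # assumed true parameters

ability = rng.normal(0.0, 0.3, size=n)    # eta_i, unobserved by the analyst
# ASSUMPTION: more able individuals obtain more schooling
schooling = 12.0 + 5.0 * ability + rng.normal(0.0, 2.0, size=n)
log_wage = beta0 + beta1 * schooling + ability

# OLS slope of log wage on schooling
beta_ols = np.cov(schooling, log_wage)[0, 1] / np.var(schooling, ddof=1)

# Bias term cov(eta_i, S_i) / Var(S_i), computable only because the
# simulation lets us observe ability
bias = np.cov(ability, schooling)[0, 1] / np.var(schooling, ddof=1)

print(f"true beta1 = {beta1:.3f}, OLS = {beta_ols:.3f}, bias term = {bias:.3f}")
```

In this design cov(η, S) = 5 × 0.09 = 0.45 and Var(S) = 25 × 0.09 + 4 = 6.25, so the population bias is 0.072 and OLS roughly doubles the assumed return. Flipping the sign of the coefficient on ability in the schooling equation produces a negative bias instead.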

To get around ability bias arguments, it is customary to estimate the returns to schooling using IV methods. So let us define an instrument $Z_i$ such that

$$\mathrm{Corr}(Z_i, S_i) \neq 0 \qquad (13)$$

$$\mathrm{Corr}(Z_i, \eta_i) = 0$$

The IV estimator is such that

$$E(\hat{\beta}_{IV}) = \frac{\mathrm{cov}(\log w_i, Z_i)}{\mathrm{cov}(S_i, Z_i)} = \beta_1 + \frac{\mathrm{cov}(\eta_i, Z_i)}{\mathrm{cov}(S_i, Z_i)} \qquad (14)$$

In the context where $S_i$ (and $Z_i$) is discrete, so that $d_i = 1$ if individual $i$ goes to school and 0 if not, we obtain a Wald estimator:

$$E(\hat{\beta}_{IV}) = \frac{E(\log w_i \mid Z_i = 1) - E(\log w_i \mid Z_i = 0)}{\Pr(d_i = 1 \mid Z_i = 1) - \Pr(d_i = 1 \mid Z_i = 0)} \qquad (15)$$
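To see the IV and Wald formulas at work, the following sketch simulates a binary instrument and a binary schooling indicator with a homogeneous return. The latent index used to generate schooling, the parameter values and the helper names `iv_estimate` and `wald_estimate` are illustrative assumptions, not part of the text.

```python
import numpy as np


def iv_estimate(y, s, z):
    """Ratio of sample covariances cov(y, z) / cov(s, z)."""
    return np.cov(y, z)[0, 1] / np.cov(s, z)[0, 1]


def wald_estimate(y, d, z):
    """Difference in mean outcomes over difference in schooling rates."""
    numerator = y[z == 1].mean() - y[z == 0].mean()
    denominator = d[z == 1].mean() - d[z == 0].mean()
    return numerator / denominator


rng = np.random.default_rng(2)
n = 200_000
beta0, beta1 = 1.0, 0.10                      # assumed homogeneous return

ability = rng.normal(0.0, 0.3, size=n)        # eta_i
z = rng.integers(0, 2, size=n)                # binary instrument, independent of ability
# ASSUMPTION: schooling is more likely when z = 1 and when ability is high
latent = -0.5 + 1.0 * z + 2.0 * ability + rng.normal(0.0, 1.0, size=n)
d = (latent > 0).astype(float)                # d_i = 1 if individual i goes to school
log_wage = beta0 + beta1 * d + ability

beta_ols = np.cov(log_wage, d)[0, 1] / np.var(d, ddof=1)
beta_iv = iv_estimate(log_wage, d, z)
beta_wald = wald_estimate(log_wage, d, z)
print(f"OLS = {beta_ols:.3f}, IV = {beta_iv:.3f}, Wald = {beta_wald:.3f}")
```

With a binary instrument the covariance ratio and the Wald ratio coincide algebraically, which is why the last two numbers agree. OLS is biased upward here because ability raises both schooling and wages.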

While a positive correlation between schooling and labor market ability is usually expected, Card (2001, 2002) reports that a large number of studies find that IV estimates exceed OLS estimates by a wide margin:[5]

$$\hat{\beta}_{IV} > \hat{\beta}_{ols} \qquad (16)$$

As of now, the IV literature retains two main explanations for the high estimates.

• First, the existence of significant measurement error in schooling may cause OLS to be severely biased and to underestimate the true return. My understanding of the literature is that the measurement error argument often advanced is typically set within a classical measurement error framework, which ignores the correlation between schooling levels and the measurement error itself and also ignores the discrete nature of the schooling variable.

• Second, in the presence of heterogeneity in the returns (say when $\beta_1$ varies across individuals), the IV estimate is inconsistent for the population average and the resulting estimate may reflect the returns of a sub-population only. Indeed, there is a large econometric literature concerned with the interpretation of IV estimates when the slopes are individual specific.[6] That is,

$$\log w_i = \beta_0 + \beta_{1i} \cdot S_i + \eta_i \qquad (17)$$

where

$$\beta_{1i} = \bar{\beta}_1 + \eta_i^{\beta} \qquad (18)$$

which implies that

$$\log w_i = \beta_0 + \bar{\beta}_1 \cdot S_i + \eta_i^{\beta} \cdot S_i + \eta_i = \beta_0 + \bar{\beta}_1 \cdot S_i + \eta_i^* \qquad (19)$$

where

$$\eta_i^* = \eta_i^{\beta} \cdot S_i + \eta_i. \qquad (20)$$

Clearly, $\hat{\beta}_{IV}$ is inconsistent for $\bar{\beta}_1$. Why? Can $Z_i \perp \eta_i^*$? Only if $S_i \perp \eta_i^{\beta}$. In theory, optimal schooling ($S_i$) is a function of $\eta_i^{\beta}$.

When the wage regression is given by (.), we refer to a correlated random coefficient model (again, randomness refers to the dispersion in $\beta_{1i}$, so it is not random from the perspective of the agent).

The intuition for the failure of IV is simple: in such a framework, individual reactions induced by $Z_i$ are heterogeneous, so the parameter arising from IV estimation is only valid for those who changed status following the experiment. So, the IV estimator is only valid for a sub-population (it is not $\bar{\beta}_1$).

[5] Estimates of the order of 15% per year of schooling are not uncommon.
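The point that IV recovers the return of those who change status, and not the population average, can be illustrated with a small simulation of the correlated random coefficient model. The design below is hypothetical: schooling is a binary choice, individuals attend when their own return exceeds a cost, and the instrument lowers that cost. The cost values and all variable names are my assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000
beta0, beta1_bar = 1.0, 0.10

eta_beta = rng.normal(0.0, 0.05, size=n)      # individual deviation in the return
beta1_i = beta1_bar + eta_beta                # individual-specific return
eta = rng.normal(0.0, 0.3, size=n)            # ability, independent here for simplicity
z = rng.integers(0, 2, size=n)                # binary instrument

# ASSUMPTION: attend school when the individual return exceeds the cost;
# the instrument lowers the cost from 0.20 to 0.10.
d_if_z0 = beta1_i > 0.20
d_if_z1 = beta1_i > 0.10
d = np.where(z == 1, d_if_z1, d_if_z0).astype(float)
log_wage = beta0 + beta1_i * d + eta

# Those who change status because of the instrument
changers = d_if_z1 & ~d_if_z0
return_of_changers = beta1_i[changers].mean()

# Wald / IV estimate
beta_wald = ((log_wage[z == 1].mean() - log_wage[z == 0].mean())
             / (d[z == 1].mean() - d[z == 0].mean()))

print(f"population average return  = {beta1_i.mean():.3f}")
print(f"average return of changers = {return_of_changers:.3f}")
print(f"IV (Wald) estimate         = {beta_wald:.3f}")
```

In this design the instrument is valid in the usual sense (independent of both error components), yet the IV estimate tracks the average return among changers, which exceeds the population average because only individuals with returns between the two cost levels respond.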

1.2.3 Other Problems with the IV literature 

Regardless of measurement error and neglected population heterogeneity, the IV literature may be criticized for three main reasons.

• First, instrumental variable (IV) techniques may be applied in a context where the instrument is only weakly correlated with schooling attainments (Staiger and Stock, 1997). Indeed, before the late 1990s, most empirical researchers concentrated their efforts on finding an instrument uncorrelated with neglected ability, but the power of the instrument chosen was practically never investigated. In the presence of weak instruments, reported estimates may be at best imprecise and, at worst, seriously biased (or inconsistent). As a consequence, the validity of very high returns to schooling, reported in a simple regression framework, should be seriously questioned. Two cases are worth considering:

$$\mathrm{cov}(S_i, Z_i) \approx 0, \;\; \mathrm{cov}(\eta_i, Z_i) = 0 \;\Rightarrow\; \sigma^2 (Z'D)^{-1} Z'Z (D'Z)^{-1} \to \infty \qquad (21)$$

$$\mathrm{cov}(S_i, Z_i) \approx 0, \;\; \mathrm{cov}(\eta_i, Z_i) \approx 0 \;\Rightarrow\; \frac{\mathrm{cov}(\eta_i, Z_i)}{\mathrm{cov}(S_i, Z_i)} \to \infty \;\; \text{(bias)} \qquad (22)$$

[6] For a statistical discussion, see Imbens and Angrist (1994).

• Second, when IV techniques are chosen, the log wage regression is usually assumed to be linear in schooling and, perhaps, quadratic in experience. However, there is no obvious reason to presume that the local returns to schooling are independent of grade level. As individuals with a lower taste for schooling tend to stop school earlier, OLS (or IV) estimates of the return to schooling, which impose equality between local and average returns at all levels of schooling, will be strongly affected by the relative frequencies of individuals with high and low taste for schooling.[7] More precisely, if there are large differences in local returns between various grade levels, the OLS estimate (measuring an average log wage increment per year of schooling) will tend to be biased toward the local returns at the schooling attainments that are the most common in the sample data. A specification that allows for this is

$$w_i = \beta_0 + \varphi(\text{Schooling}_i) + \eta_i \qquad (23)$$

• Finally, the IV literature rarely addresses labor market experience per se. This is surprising, as labor market experience represents a key substitute for schooling as a means of enhancing life cycle wages. A survey of the IV literature reveals that practically no paper presents a joint estimation of the returns to schooling and experience. A quick glance at the Mincerian wage regression (8) reveals that, without controls for individual differences in accumulated post-schooling human capital, it is difficult to give an interpretation to the discrepancy between OLS and IV estimates.[8] To see this, return to the simple homogeneous return case, $\beta_{1i} = \beta_1$ for everyone:

$$w_i = \beta_0 + \beta_1 \cdot S_i + \delta \cdot PSHC_i + \eta_i = \beta_0 + \beta_1 \cdot S_i + \eta_i^* \qquad (24)$$

where $PSHC_i$ denotes all post-schooling investment activities (training, search...) and

$$\eta_i^* = \delta \cdot PSHC_i + \eta_i. \qquad (25)$$

Now reconsider estimating (.). This implies that for $Z_i$ to be valid (if wages are measured after entrance in the labor market), the following must be true:

$$Z_i \perp \eta_i^* \qquad (26)$$

But, in general, this cannot be true because $Z_i \perp PSHC_i$ cannot be true (think about season of birth, and what happens if you lose one year of human capital accumulation). That is, if, for instance, one individual loses one year, then he/she may react by investing in post-schooling training, search activities and any other wage-enhancing activities (Rosenzweig and Wolpin, 2000, Journal of Economic Literature).

[7] This issue is sometimes referred to as the Discount Rate bias.

[8] Many issues related to the endogeneity of work experience are discussed in Rosenzweig and Wolpin (2000).
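The weak instrument problem raised in the first criticism above can be made concrete with a small Monte Carlo sketch. It compares the sampling distribution of the IV estimate under a strong and a weak first stage, with an instrument that is very slightly correlated with the wage error. The parameters `first_stage` and `leak`, the sample sizes and all numerical values are hypothetical choices of mine, not taken from the literature discussed here.

```python
import numpy as np


def iv_estimate(y, s, z):
    """Ratio of sample covariances cov(y, z) / cov(s, z)."""
    return np.cov(y, z)[0, 1] / np.cov(s, z)[0, 1]


def simulate_iv(first_stage, leak, n=2000, reps=500, seed=4):
    """IV estimates over repeated samples.

    first_stage: cov(S, Z), the strength of the instrument (assumed)
    leak:        cov(eta, Z), a small violation of instrument validity (assumed)
    """
    rng = np.random.default_rng(seed)
    beta1 = 0.08
    estimates = np.empty(reps)
    for r in range(reps):
        z = rng.normal(size=n)
        ability = 0.3 * rng.normal(size=n)
        eta = ability + leak * z                     # wage error
        s = 12.0 + first_stage * z + 3.0 * ability + rng.normal(size=n)
        w = 1.0 + beta1 * s + eta
        estimates[r] = iv_estimate(w, s, z)
    return estimates


def iqr(x):
    """Interquartile range, robust to the heavy tails of weak-IV estimates."""
    q75, q25 = np.percentile(x, [75, 25])
    return q75 - q25


strong = simulate_iv(first_stage=1.0, leak=0.01)
weak = simulate_iv(first_stage=0.02, leak=0.01)
print(f"strong instrument: median = {np.median(strong):.3f}, IQR = {iqr(strong):.3f}")
print(f"weak instrument:   median = {np.median(weak):.3f}, IQR = {iqr(weak):.3f}")
```

The same tiny correlation between the instrument and the error is nearly harmless when the first stage is strong, because the bias is the ratio cov(η, Z)/cov(S, Z). When cov(S, Z) is close to zero, that ratio and the sampling dispersion both become very large. The median and interquartile range are used because the weak-instrument estimates have very heavy tails.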
