Estimating Distributions of Counterfactuals with an Application ... - UCL

INTERNATIONAL

ECONOMIC

REVIEW

May 2003

Vol. 44, No. 2

2001 LAWRENCE R. KLEIN LECTURE

ESTIMATING DISTRIBUTIONS OF TREATMENT EFFECTS

WITH AN APPLICATION TO THE RETURNS TO SCHOOLING

AND MEASUREMENT OF THE EFFECTS OF UNCERTAINTY

ON COLLEGE CHOICE ∗

BY PEDRO CARNEIRO,KARSTEN T. HANSEN, AND JAMES J. HECKMAN 1

Department **of** Economics, University **of** Chicago; Kellogg School

**of** M**an**agement, Northwestern University; Department **of** Economics,

University **of** Chicago **an**d The Americ**an** Bar Foundation

This article uses factor models to identify **an**d estimate the distributions **of**

counterfactuals. We extend LISREL frameworks to a dynamic treatment effect

setting, extending matching to account for unobserved conditioning variables.

Using these models, we c**an** identify all pairwise **an**d joint treatment effects. We

apply these methods to a model **of** schooling **an**d determine the intrinsic uncertainty

facing agents at the time they make their decisions about enrollment in

school. We go beyond the “Veil **of** Ignor**an**ce” in evaluating educational policies

**an**d determine who benefits **an**d who loses from commonly proposed educational

reforms.

∗ M**an**uscript received October 2000; revised J**an**uary 2003.

1 Previous versions **of** this paper were given at the Midwest Econometrics Group, Chicago, October

2000, Washington University St. Louis, May, 2001, the Nordic Econometrics Meetings, May, 2001 **an**d

workshops at Chicago August, 2002 **an**d St**an**ford, J**an**uary, 2003. A simple version **of** this paper is

presented in Carneiro, H**an**sen, **an**d Heckm**an** (2001). A version **of** this paper was presented by Heckm**an**

as the Klein Lecture at the University **of** Pennsylv**an**ia, September, 28, 2001 **an**d also at the IFAU

conference in Stockholm Sweden, October 2001. We are grateful to all workshop particip**an**ts. We especially

th**an**k Mark Dugg**an**, Orazio Att**an**asio, **an**d Michael Ke**an**e for comments on the first draft **of** this

paper. We have benefited from discussions **with** Ricardo Barros, Richard Blundell, Fr**an**cisco Buera,

Flavio Cunha, Mark Dugg**an**, Lars H**an**sen, Steven Levitt, Bin Li, Luigi Pistaferri, **an**d Sergio Urzua

on subsequent drafts. We single out Salvador Navarro **an**d Edward Vytlacil for especially helpful comments.

We are grateful to Flavio Cunha **an**d Salvador Navarro for exceptional research assist**an**ce **an**d

hard work. This research is supported by NSF 97-09-873, SES-0099195, **an**d NICHD-5RO1-HD34958.

Heckm**an**’s work was also supported by the Americ**an** Bar Foundation **an**d the Donner Foundation,

Pedro Carneiro’s research was supported by Fundação Ciência **an**d Tecnologia **an**d Fundação Calouste

Gulbenki**an**. Please address correspondence to: James J. Heckm**an**, Department **of** Economics, University

**of** Chicago, 1126 E. 59th Street, Chicago, IL 60637, USA. Tel: +773 702-0634. Fax: +773 702-8490.

E-mail: jjh@uchicago.edu.

361

362 CARNEIRO, HANSEN, AND HECKMAN

1. INTRODUCTION

The recent literature on evaluating social programs finds that persons (or firms

or institutions) respond to the same policy differently (Heckm**an**, 2001). The distribution

**of** responses is usually summarized by some me**an**. A variety **of** me**an**s c**an**

be defined depending on the conditioning variables used. Different me**an**s **an**swer

different policy questions. There is no uniquely defined “effect” **of** a policy.

The research reported here moves beyond me**an**s as descriptions **of** policy outcomes

**an**d determines joint counterfactual distributions **of** outcomes for alternative

interventions. From the knowledge **of** the joint distributions **of** counterfactual

outcomes it is possible to determine the proportion **of** people who benefit or lose

from making a particular policy choice (taking or not taking particular treatments),

the origin **an**d destination outcomes **of** those who ch**an**ge states because **of** policy

interventions, **an**d the amount **of** gain (or loss) from various policy choices by

persons at different deciles **of** **an** initial prepolicy distribution. Our work builds

on previous research by Heckm**an** **an**d Smith (1993, 1998) **an**d Heckm**an** et al.

(1997) that uses experimental data to bound or point-identify joint counterfactual

distributions. We extend the **an**alysis **of** Aakvik et al. (1999, 2003), who use

factor models to identify counterfactual distributions to consider indicators for

unobservables, implications from choice theory, **an**d to exploit the benefits **of**

p**an**el data.

From the joint distribution **of** counterfactuals, it is possible to generate all me**an**,

medi**an**, or other qu**an**tile gains, to identify all pairwise treatment effects in a

multi-outcome setting, **an**d to determine how much **of** the variability in returns

across persons comes from variability in the distributions **of** the outcome selected

**an**d how much comes from variability in opportunity distributions. Using the

joint distribution **of** counterfactuals, it is possible to develop a more nu**an**ced

underst**an**ding **of** the distributional impacts **of** public policies, **an**d to move beyond

comparisons **of** aggregate overall distributions induced by different policies to

consider how people in different portions **of** **an** initial distribution are affected

by public policy. We extend the **an**alysis **of** DiNardo et al. (1996) to consider

self-selection as a determin**an**t **of** aggregate wage **an**d earnings distributions.

Using our methods, we re**an**alyze the model **of** Willis **an**d Rosen (1979), who

apply the Roy model (1951) to the economics **of** education. We extend their model

to account for uncertainty in the returns to education. We also distinguish between

present value income-maximizing **an**d utility-maximizing evaluations **of** schooling

choices **an**d we estimate the net nonpecuniary benefit **of** attending college. We

use information on the choices **of** agents to determine how much **of** the ex post

heterogeneity in the return to schooling is forecastable at the time agents make

their schooling choices. This procedure extends the **an**alysis **of** Flavin (1981) to

a discrete choice setting. This allows us to identify the effect **of** uncertainty on

schooling choices. Ex **an**te, there is a great deal **of** uncertainty regarding the returns

to schooling (in utils or dollars). Ex post, 8% **of** college graduates regret going to

college.

The pl**an** **of** this article is as follows. Section 2 presents the essential idea underlying

the identification strategy used in this article **an**d how our approach is

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 363

related to previous work. Section 3 presents a general policy evaluation framework

for counterfactual distributions **with** multiple treatments followed over time.

The strategy pursued in this article is based on using low-dimensional factors to

generate distributions **of** potential outcomes. We show how our methods generalize

the method **of** matching by allowing some or all **of** the variables that generate

the conditional independence assumed in matching to be unobserved by the **an**alyst.

Section 4 introduces the factor models used in this article. Section 5 presents

pro**of**s **of** semiparametric identification. Section 6 applies the **an**alysis to extend the

Rosen–Willis model **of** college choice to account for uncertainty **an**d to estimate

the information about future earnings available to agents at the time schooling

decisions are made. Section 7 reports estimates **of** the distributions **of** returns to

schooling, the components unforecastable by the agent at the time schooling decisions

are made, **an**d the nonpecuniary net benefits from attending college. Section 8

applies our estimates to evaluate a reform **of** the U.S. educational system. It illustrates

the power **of** our method to lift the commonly invoked veil **of** ignor**an**ce

**an**d move beyond aggregate distributions **of** outcomes to underst**an**d the consequences

**of** public policies on persons in various parts **of** the overall distribution.

Section 9 concludes. We first provide a brief introduction to the literature to put

this article in context.

2. ESTIMATING DISTRIBUTIONS OF COUNTERFACTUAL OUTCOMES

In order to place the approach used in this article in the context **of** **an** emerging

literature on heterogeneous treatment effects, it is helpful to motivate our work by

a two-outcome, two-treatment cross section model. For simplicity, in this section it

is assumed that the outcomes are continuous r**an**dom variables. The **an**alysis in the

rest **of** this article is for multiple treatments **an**d multiple outcomes followed over

time, **an**d the outcomes may be discrete, continuous, or mixed discrete-continuous.

The agent c**an** experience one **of** two possible counterfactual states **with** associated

outcomes (Y 0 , Y 1 ). The states are schooling levels in our empirical **an**alysis.

X is a determin**an**t **of** the counterfactual outcomes (Y 0 , Y 1 ); S = 1 if the agent is

in state 1; S = 0 otherwise. The observed outcome is Y = SY 1 + (1 − S)Y 0 . There

may be **an** instrument (or set **of** instruments) Z such that (Y 0 , Y 1 ) ⊥⊥ Z | X **an**d

Pr(S = 1 | Z, X) depends on Z for all X (so it is a nontrivial function **of** Z), i.e., Z

is in the choice probability but not the outcome equation. (A ⊥⊥ B| C me**an**s A is

independent **of** B given C). We show below that such a Z is not strictly required in

our approach. The st**an**dard treatment effect model assumes policies (Z) that affect

choices **of** treatment but not potential outcomes (Y 0 , Y 1 ). General equilibrium

effects are ignored. 2

The goal **of** our **an**alysis is to recover F(Y 0 , Y 1 | X). As noted in Heckm**an** (1992),

Heckm**an** **an**d Smith (1993, 1998), **an**d Heckm**an** et al. (1997), from this joint distribution

it is possible to estimate the proportion **of** people who benefit (in terms **of**

gross gains) from participation in the program (Pr(Y 1 > Y 0 | X)), gains to particip**an**ts

at selected levels **of** the no-treatment distribution (F(Y 1 − Y 0 | Y 0 = y 0 , X)),

2 See Heckm**an** et al. (1998a, 1998b, 1998c, 2000) for a treatment **of** general equilibrium policy

evaluation.

364 CARNEIRO, HANSEN, AND HECKMAN

or treatment distribution (F(Y 1 − Y 0 | Y 1 = y 1 , X)), the option value **of** social programs,

**an**d a variety **of** other questions that c**an** be **an**swered using distributions **of**

potential outcomes including conventional me**an** treatment effects **an**d qu**an**tiles

**of** the gains (Y 1 − Y 0 ) for those who receive treatment.

The problem **of** recovering joint distributions arises because we observe Y 0 if

S = 0 **an**d Y 1 if S = 1. Thus we know F(Y 0 | S = 0, X), F(Y 1 | S = 1, X) but not

F(Y 0 | X) orF(Y 1 | X). In addition, we do not observe the pair (Y 0 , Y 1 ) for **an**yone.

Thus, we c**an**not directly obtain F(Y 1 , Y 0 | S, X) from the data. Additional

information is required to identify the joint distribution.

There are, then, two separate problems. The first is a selection problem. From

F(Y 1 | S = 1, X) **an**d F(Y 0 | S = 0, X), under what conditions c**an** one recover

F(Y 1 | X) **an**d F(Y 0 | X), respectively? The second problem is how to construct

the joint distribution F(Y 0 , Y 1 | X) from the two marginals.

Assuming that the selection problem c**an** be surmounted, the classical probability

results due to Fréchet (1951) **an**d Hoeffding (1940) show how to bound

F(Y 1 , Y 0 | S, X) using the marginal distributions. In practice these bounds are very

wide, **an**d the inferences based on the bounding distributions are **of**ten not useful. 3

The traditional (pre-1985) approach to program evaluation in economics assumed

that F(Y 0 , Y 1 | X) is degenerate because conditional on X, Y 1 **an**d Y 0 are

deterministically related:

(1)

Y 1 ≡ Y 0 + (X)

This is the “common effect” assumption that postulates that conditional on X,

the treatment has the same effect on everyone. From the me**an**s **of** F(Y 0 | S =

0, X) **an**d F(Y 1 | S = 1, X) corrected for selection, one c**an** identify E((X)) =

E(Y 1 | X) − E(Y 0 | X). (See Heckm**an** **an**d Robb, 1985, 1986 (reprinted 2000) for

a variety **of** estimators for this case **an**d for discussion **of** more general cases.)

Heckm**an** **an**d Smith (1993, 1998) **an**d Heckm**an** et al. (1997) relax this assumption

by assuming perfect r**an**king across different counterfactual outcome

distributions. Assuming absolutely continuous **an**d strictly increasing marginal

distributions, they postulate that qu**an**tiles are perfectly r**an**ked so Y 1 =

F −1

1,X (F 0,X(Y 0 )) where F 1,X = F 1 (y 1 | X) **an**d F 0,X = F 0 (y 0 | X). This assumption

generates a deterministic relationship that turns out to be the tight upper bound

**of** the Fréchet bounds. An alternative assumption is that people are perfectly

inversely r**an**ked so the best in one distribution is the worst in the other: Y 1 =

F −1

1,X (1 − F 0,X(Y 0 )). This is the tight Fréchet lower bound. More generally, one c**an**

associate qu**an**tiles across distributions more freely. Heckm**an** et al. (1997) use

Markov tr**an**sition kernels that stochastically map qu**an**tiles **of** one distribution

into qu**an**tiles **of** **an**other. They define a pair **of** Markov kernels M(y 1 , y 0 | X) **an**d

˜M(y 0 , y 1 | X) such that

∫

F 1 (y 1 | X) =

∫

F 0 (y 0 | X) =

M(y 1 , y 0 | X)dF 0 (y 0 | X)

˜M(y 0 , y 1 | X)dF 1 (y 1 | X)

3 See Heckm**an** **an**d Smith (1993, 1998) **an**d Heckm**an** et al. (1997).

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 365

Allowing these operators to be degenerate produces a variety **of** deterministic

tr**an**sformations, including the two previously presented, as special cases **of** a general

mapping. Different (M, ˜M) pairs produce different joint distributions. 4 These

stochastic or deterministic tr**an**sformations supply the missing information needed

to construct the joint distributions.

A perfect r**an**king (or perfect inverse r**an**king) assumption is convenient. It

generalizes the perfect-r**an**king, const**an**t-shift assumptions implicit in the conventional

literature. It allows us to apply conditional qu**an**tile methods to estimate

the distributions **of** gains. 5 However, it imposes a strong **an**d arbitrary dependence

across distributions. Our empirical **an**alysis shows that this assumption is at odds

**with** data on the returns to education.

An alternative approach to constructing joint distributions due to Heckm**an**

**an**d Honoré (1990), Heckm**an** (1990), **an**d Heckm**an** **an**d Smith (1998) uses the

economics **of** the model by assuming that

(2)

S = 1(µ s (Z) ≥ e s )

where µ s (Z) is a me**an** net utility, Z ⊥⊥ e s , **an**d “1” is a logical indicator (=1ifthe

argument is valid; =0 otherwise). In addition they assume that

Y 1 = µ 1 (X) + U 1 , E(U 1 ) = 0

Y 0 = µ 0 (X) + U 0 , E(U 0 ) = 0

where (U 1 , U 0 ) ⊥⊥ (X, Z). 6 In the special case where S = 1(Y 1 ≥ Y 0 ) (the Roy

model), Heckm**an** **an**d Honoré (1990) present conditions on µ 1 ,µ 0 , **an**d X such

that F(U 1, U 0 ) **an**d µ 1 (X),µ 0 (X) **an**d hence F(Y 0 , Y 1 | X) are identified from data

on choices (S), characteristics (X), **an**d observed outcomes Y = SY 1 + (1 − S)Y 0 .

Buera (2002) extends their approach to nonseparable models **with** weaker exclusion

restrictions.

Heckm**an** (1990) **an**d Heckm**an** **an**d Smith (1998) consider more general decision

rules **of** the form (2) under the assumption that (Z, X) ⊥⊥ (U 0 , U 1 , e s ) **an**d the

further conditions (i) µ s (Z) is a nontrivial function **of** Z conditional on X **an**d (ii)

full support assumptions on µ 1 (X),µ 0 (X), **an**d µ s (Z). They establish nonparametric

identification **of** F(U 0 , e s ), F(U 1 , e s ) up to a scale for e s **an**d µ 1 (X),µ 0 (X),

**an**d µ s (Z) suitably scaled. 7 Hence, under their assumptions, they c**an** identify

F(Y 0 , S | X, Z) **an**d F(Y 1 , S | X, Z) but not the joint distributions F(Y 0 , Y 1 | X) or

F(Y 0 , Y 1 , S | X, Z) unless the U 0 , U 1 , e s dependence is restricted.

Aakvik et al. (1999, 2003) build on Heckm**an** (1990) **an**d Heckm**an** **an**d Smith

(1998) by postulating a factor structure connecting (U 0 , U 1 , e s ). Our work builds

4 Conditions under which (M, ˜M) determine the joint distribution are presented in Roz**an**ov (1982).

5 See, e.g., Heckm**an** et al. (1997), or Athey **an**d Imbens (2002).

6 Me**an** or medi**an** zero assumptions on (U 0 , U 1 ) are also used.

7 See their articles for exact conditions. Heckm**an** **an**d Smith (1998) present the most general set **of**

conditions.

366 CARNEIRO, HANSEN, AND HECKMAN

on their **an**alysis so we describe its essential idea. Suppose that the unobservables

follow a factor structure,

U 0 = α 0 θ + ε 0 , U 1 = α 1 θ + ε 1 , e s = α s θ + ε s

where θ ⊥⊥ (ε 0 ,ε 1 ,ε s ) **an**d the ε’s are mutually independent. In their setup, θ

is a scalar. θ c**an** be **an** unobservable trait like ability or motivation that affects

all outcomes. Because the factor loadings, α 0 ,α 1 ,α s , may be different, the

factors may affect outcomes **an**d choices differently. Recall that one c**an** identify

F(U 0 , e s ) **an**d F(U 1 , e s ) under the conditions specified in Heckm**an** **an**d

Smith (1998) **an**d generalized in Theorems 1–3 below. Thus, one c**an** identify

cov(U 0 , e s ) = α 0 α s σθ 2 **an**d cov(U 1, e s ) = α 1 α s σθ 2 assuming finite vari**an**ces **an**d assuming

E(θ) = 0, E(θ 2 ) = σθ 2. With some normalizations (e.g., σ θ 2 = 1,α s = 1),

under conditions specified in Section 5, we c**an** nonparametrically identify the distribution

**of** θ **an**d the distributions **of** ε 0 ,ε 1 ,ε s (the last up to scale). With α 1 ,α 0 ,α s

**an**d the distributions **of** θ,ε 0 ,ε 1 ,ε s in h**an**d, we c**an** construct the joint distribution

F(Y 0 , Y 1 | X). 8

This article builds on this basic idea **an**d extends it to a more general setting.

We consider a model **with** multiple factors, multiple treatments, **an**d multiple time

periods. Outcome measures may be discrete or continuous. We follow the psychometric

literature by adjoining measurement **an**d choice equations to outcome

equations to pin down the distribution **of** θ. With this framework we c**an** estimate

all pairwise treatment effects in a multiple outcome setting. We also consider the

benefits for identification **of** having access to imperfect measurements on vector

θ, which are observed for all persons independent **of** their treatment status. This

model integrates the LISREL framework **of** Jöreskog (1977) into a model **of** discrete

choice **an**d a model **of** multiple treatment effects. We develop this model

in Section 4 after presenting a more general framework for counterfactuals **an**d

treatment effects in a multi-outcome, possibly dynamic setting.

3. POLICY COUNTERFACTUALS FOR THE MULTIPLE OUTCOME CASE

This section defines policy counterfactuals for the multiple treatment case. For

specificity, think **of** states as schooling levels **an**d different ages as periods in the

life cycle. Associated **with** each state s (schooling level) is a vector **of** outcomes at

age a for person ω ∈ (a set **of** indices) **with** elements

(3)

Y s,a (ω)

s = 1,..., ¯S, a = 1,...,Ā,

where there are ¯S states **an**d Ā ages. Associated **with** each person ω is a vector

X(ω) **of** expl**an**atory variables.

The ceteris paribus effect (or individual treatment effect) **of** a move from state

s ′ at age a ′′ to state s at age a is

8 Aakvik et al. (1999) present other sets **of** identifying assumptions.

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 367

(4)

((s, a), (s ′ , a ′′ ),ω) = Y s,a (ω) − Y s ′ ,a ′′(ω)

Since it is usually not possible to observe the same person in both s **an**d s ′ , **an**alysts

**of**ten focus on estimating various population level versions **of** these parameters

for different conditioning sets. 9 In this article, we estimate the distributions **of**

potential outcomes **an**d parameters derived from these distributions, including

the Average Treatment Effect,

ATE((s, a), (s ′ , a ′′ ), x) = E(Y s,a − Y s ′ ,a′′ | X = x)

**an**d the Marginal Treatment Effect, the average gain from moving from s ′ to s

for those on the margin **of** indifference between s **an**d s ′ . We are interested in

determining the joint distributions **of** the counterfactuals ((s, a), (s ′ , a ′′ ), x) for

different conditioning sets.

Associated **with** each treatment or state (schooling choice) is a choice equation

associated **with** a level **of** lifetime utility: V s (ω), s = 1,..., ¯S. Utilities are assumed

to be absolutely continuous. Agents select treatment states (schooling levels) ˜s to

maximize utility:

(5)

˜s = arg max{V s (ω)} ¯S

s=1

s

Associated **with** choices are expl**an**atory variables Z(ω). A distinctive feature **of**

the econometric approach to program evaluation is that it evaluates policies both

in terms **of** objective outcomes (the Y s,a (ω)) **an**d in terms **of** subjective outcomes

(the utilities **of** the agents making the choices). Both subjective **an**d objective

evaluations are useful in evaluating policy. Choice theory is also used to guide **an**d

rationalize specific choices **of** estimators. It enables us to separate variability from

intrinsic uncertainty, as we demonstrate below.

This framework is sufficiently general to encompass a variety **of** choice processes

including sequential dynamic programming models 10 **an**d ordered choice

models, 11 as well as more general unordered choice models. We let D s = 1if

treatment s is selected. Since there are ¯S mutually exclusive states, ∑ ¯S

s = 1 D s = 1.

In this notation, the marginal treatment effect for choices s **an**d s ′ is

(6)

MTE(a, V s,s ′) = E(Y s,a − Y s ′ ,a | V s = V s ′ = V s,s ′ ≥ V j , j ≠ s, s ′ )

It is the average gain **of** going from s ′ to s at age a for persons indifferent between

s **an**d s ′ given that s **an**d s ′ are the best two choices in the choice set, **an**d that their

level **of** utility is V s,s ′.

9 Heckm**an** **an**d Smith (1998) **an**d Heckm**an** et al. (1999) discuss conditions under which it is possible

to estimate (4).

10 See Eckstein **an**d Wolpin (1999) **an**d Ke**an**e **an**d Wolpin (1997).

11 See Cameron **an**d Heckm**an** (1998) **an**d H**an**sen et al. (2001).

368 CARNEIRO, HANSEN, AND HECKMAN

Aggregating over choices s ′ = 1,..., ¯S; s ′ ≠ s, we may define the marginal treatment

effect over all origin states as

(7)

MTE s

(a, {V s,s ′} ¯S

s ′ =1,s ′ ≠s

)

=

¯S∑

s ′ =1

s ′ ≠s

(

/ (

))

MTE(a, V s,s ′) f (V s , V s ′ | V s = V s ′ = V s,s ′ ≥ V j , j ≠ s, s ′ ) ψ a, {V s,s ′}

s S ′ =1,s ′ ≠s

the weighted average **of** the pairwise marginal treatment effects from all source

states to s (at a given level **of** utility V s,s ′) **with** the weights being the density **of**

persons at each relev**an**t margin for specified values **of** utility where

ψ ( a, {V s,s ′} ¯S )

¯S∑

s ′ = 1,s ′ ≠s = f (V s , V s ′ | V s = V s ′ = V s,s ′ ≥ V j , j ≠ s, s ′ )

s ′ = 1

s ′ ≠s

is a normalizing const**an**t (the population density **of** people at all margins given a

**an**d {V s,s ′}

s S ′ =1,s ′ ≠s

), assumed to be positive.

We next present a framework for estimating the distributions **of** the treatment

effects **an**d the parameters derived from them, which allows us to estimate the

parameters defined in this section as well as other parameters. To simplify the

notation, we suppress the ω argument in the rest **of** the article.

4. FACTOR STRUCTURE MODELS

The strategy adopted in this article identifies the distribution **of** counterfactuals

by postulating a low-dimensional set **of** factors θ so that, conditional on them

**an**d the covariates X **an**d Z, the Y s,a **an**d V s ′ are jointly independent for all s, s ′ ,

**an**d a. The distributions **of** the components **of** θ are nonparametrically identified

under the conditions specified below. With these distributions in h**an**d, it is possible

to construct the distribution **of** counterfactuals. Under the conditions specified in

Section 5, it is possible **with** low-dimensional factors to nonparametrically identify

the counterfactual distributions **an**d to estimate all **of** the treatment effects in the

literature suitably extended to multidimensional versions.

Throughout this article we **an**alyze a separable-in-the-errors system. Thus, preferences

c**an** be described by

(8)

V s = µ s (Z) − e s

s = 1,..., ¯S.

It is conventional to assume that µ s (Z) = Z ′ β s **with** s = 1,..., ¯S. Linear approximations

to value functions are advocated by Heckm**an** (1981) **an**d Eckstein **an**d

Wolpin (1989) **an**d are developed systematically in Geweke et al. (2001). Our approach

does not require linearity but critically relies on separability between the

deterministic portion **of** the model **an**d the errors e s . Following Heckm**an** (1981),

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 369

Cameron **an**d Heckm**an** (1987, 1998), **an**d McFadden (1984), write

(9)

e s = α ′ s θ + ε s

where θ is a K × 1 vector **of** mutually independent factors (θ l ⊥⊥ θ l ′,l≠ l ′ ) **an**d

define ε s = (ε 1 ,...,ε S )

(10)

θ ⊥⊥ ε s ε s ⊥⊥ ε s ′ ∀ s, s ′ = 1,..., ¯S **an**d s ≠ s ′

E(θ) = 0; E(ε s ) = 0; D k = 1ifV k is maximal in {V s (Z)}

s=1 S .12

Potential outcomes at age a, Ys,a ∗ , are stochastically dependent on each other

**an**d the choices only through their dependence on the observables X, Z **an**d the

factors θ:

(11)

Y ∗

s,a = µ s,a(X) + α ′ s,a θ + ε s,a

where E(ε s,a ) = 0.

Potential outcomes are separable in observables **an**d unobservables. A

linear-in-parameters version is written as µ s,a (X) = X ′ β s,a .Define ε Y = (ε 1,1 ,

...,ε 1,A ,...,ε s,1 ,...,ε s,A ,...,ε S,A )as

(12)

θ ⊥⊥ ε Y

(13)

ε s,a ⊥⊥ ε s ′ ,a ′′; ∀ s ≠ s′ ; ∀ a, a ′′

**an**d

(14)

(15)

ε s,a ⊥⊥ ε s ′; ∀ s ′ , s = 1,..., ¯S; a = 1,...,Ā

(Z, X) ⊥⊥ (θ,ε Y ,ε s )

The Ys,a ∗ may be vector valued.

When the outcome is continuous, the observed value corresponds to the latent

variable (Y s,a = Ys,a ∗ ). When the outcome is discrete (e.g., employment status), we

interpret Ys,a ∗ in (11) as a latent variable. In that case, Y s,a is **an** indicator function

Y s,a = 1(Ys,a ∗ ≥ 0). Tobit **an**d other censored cases c**an** be accommodated. Other

mixed discrete-continuous cases c**an** be h**an**dled in a conventional fashion. 13

One motivation for the factor representation is that agents may observe components

**of** θ (or variables that sp**an** those components) **an**d act on them (e.g.,

choose schooling levels), whereas the econometrici**an** does not observe θ. Below,

we present methods for testing whether agents observe some or all components **of**

12 In the case **of** ties, use the choice **with** the lowest index.

13 See H**an**sen et al. (2003) for duration models **with** general forms **of** dependence functions generated

by this type **of** model.

370 CARNEIRO, HANSEN, AND HECKMAN

θ. Conditional on θ **an**d X, the potential outcomes are independent. If (12)–(15)

accurately describe the data-generating process, we obtain the conditional independence

assumptions used in matching (see, e.g., Cochr**an**e **an**d Rubin, 1973;

Rosenbaum **an**d Rubin, 1983).

In matching it is assumed that Y s,a ⊥⊥ D s | X, Z,θ for all s. 14 From this assumption,

we c**an** identify ATE from the right-h**an**d side **of** the following expression for

continuous observed outcomes, which c**an** be constructed if θ is observable,

E(Y s,a − Y s ′ ,a | X, Z,θ) = E(Y s,a | X, Z,θ,D s = 1)

− E(Y s ′ ,a | X, Z,θ,D s ′ = 1)

In this case treatment on the treated, ATE, **an**d MTE are the same parameter

conditional on θ, X, **an**d Z (Heckm**an**, 2001; Aakvik et al., 2003). Our framework

differs from matching by allowing the factors that generate the conditional independence

that underlies matching to be unobserved by the **an**alyst. In this sense,

our approach is more robust th**an** matching. The price for this robustness is the

assumed independence between θ **an**d (X, Z).

Factor structure models are notorious for being identified by arbitrary normalization

**an**d exclusion restrictions. To reduce this arbitrariness **an**d render greater

interpretability to estimates obtained from our model, we adjoin a measurement

system to choice equations (8) **an**d outcome system (11). Various measurements

c**an** be interpreted as indicators **of** specific factors (e.g., test scores may proxy

ability). Having measurements on the factors also facilitates identifiability under

weaker assumptions as we demonstrate in Section 5. However, measurements are

not strictly required for identification in our model. Outcome, measurement, **an**d

choice equations are interch**an**geable sources **of** identification in a sense that we

make precise in Section 5.

Consider a system **of** L measurements on the K factors, initially assumed to be

for continuous outcome measures:

(16)

M 1 = µ 1 (X) + β 11 θ 1 +···+β 1K θ K + ε M 1

.

M L = µ L (X) + β L1 θ 1 +···+β LK θ K + εL

M

ε M = (ε M 1 ,...,εM L ), E(εM ) = 0 **an**d where we assume θ ⊥⊥ ε M ,θ ⊥⊥ ε s ,ε s ⊥⊥

(ε M ,ε Y ),ε M i

⊥⊥ ε M j

∀ i ≠ j, **an**d i, j = 1,...,L. For interpretability, we assume

θ i ⊥⊥ θ j , ∀ i ≠ j, i, j = 1,...,K. We develop the case **with** discrete measurements

on latent continuous variables in Section 5. One c**an** think **of** the outcome measures

as **an** s-dependent measurement system. The measures (16) are the same

across all s.

Measurement system (16) allows for fallible measures **of** outcomes. Thus, in

our schooling choice **an**alysis we are not committed to the infallibility **of** test

14 Strictly speaking, matching models do not distinguish between X **an**d Z. See Heckm**an** **an**d

Navarro (2003).

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 371

scores as measurements **of** ability. Measurement system (16) allows us to proxy

unobservables accounting for measurement error **an**d hence enables us to improve

on the proxy procedure **of** Olley **an**d Pakes (1996), which assumes no measurement

error.

4.1. Choice Equations. Our **an**alysis applies to both ordered discrete choice

models **an**d unordered choice models as **an**alyzed by Cameron **an**d Heckm**an**

(1998) **an**d H**an**sen et al. (2001). In this article, we focus attention on a new ordered

choice model. Other choice models c**an** easily be accommodated in our framework

**an**d richer models are a source **of** additional identifying information. 15

For **an** ordered discrete choice model, let utility index I be written as

(17)

I = ϕ(Z) + ε W , ε W = γ ′ θ + ε I , σ 2 W = γ ′ ∑ θ γ + σ 2 I

where E(ε 2 I ) = σ I 2, **an**d ∑ θ

is the covari**an**ce matrix **of** θ. A linear-in-parameters

version that is the one developed in this article is written as ϕ(Z) = Zη. Choices

are generated by index ϕ(Z) falling in various intervals.

(18)

D 1 = 1 if−∞< I ≤ c 1

D s = 1 ⇔ c s−1 < I ≤ c s s = 2,..., ¯S − 1

D S = 1 ifc S−1 < I < ∞

where c 0 =−∞. It is required that c s ≥ c s−1 for all s ≥ 2. This is a special case

**of** r**an**dom utility model (8) in which states are ordered **an**d pairwise contrasts

possess a special structure. 16

We c**an** parameterize the c s to be functions **of** state-specific regressors, e.g., c s =

Q s ρ s where we restrict c s ≥ c s−1 . We could also follow a suggestion in Heckm**an**

et al. (1999) to incorporate one-sided shocks ν s **an**d work **with** stochastic thresholds

˜c s in place **of** c s : ˜c s = c s + ν s , s = 1,...,S − 1, where ν s ≥ ν s−1 **an**d ν s ≥ 0. 17

15 See Heckm**an** **an**d Navarro (2001) for a comparison among alternative models **of** completed

schooling. H**an**sen et al. (2003) develop a parallel **an**alysis for a one-factor multinomial choice model.

16 Specifically, it is assumed for (8) that µ s (Z ) is concave in s for each Z (Cameron **an**d Heckm**an**,

1998), that

e s − e s−1 = τ s = 2,...,S

**with** e 1 as **an** initial condition, that

(∗) µ s (Z ) − µ s−1 (Z ) = ϕ(Z ) + c s−1

**with** µ 1 (Z ) as **an** initial condition **an**d that c s ≥ c s−1 for all s = 1,..., ¯S. Ch**an**ges in utilities

across states are independent **of** s, except for **an** intercept. Then in (17) ε W = τ + e 1 . If we set all **of**

the i.i.d. components **of** (9) to zero (the uniquenesses ε s ) we get the ordered probit model. As noted

in the text, **an**d developed in Heckm**an** **an**d Navarro (2001), we c**an** generalize this model to allow

e s − e s−1 = τ + χ s where χ s ≥ 0 is a one-sided r**an**dom variable **an**d still secure identification. The

requirement (∗) precludes a strict r**an**dom utility model because preferences are state specific. (The

strict r**an**dom utility model requires that µ s (Z ) not depend on s but Z c**an** vary across s. See, e.g.,

Matzkin, 1993).

17 Write ν s = ∑ s

j=2 ρ j , where ρ j ⊥ ρ j ′( j ≠ j ′ ),ρ j ⊥ ε W ,ρ j ⊥ (Z, Q),ρ j ≥ 0,ϕ(Z ) = Z ′ η. This

model is identified under the assumptions in Cameron **an**d Heckm**an** (1998) even **with**out **an**y

372 CARNEIRO, HANSEN, AND HECKMAN

Conditioning on Q s = q s , s = 1,..., ¯S, **an**d assuming that the Support (Z| Q s =

q s , s = 1,..., ¯S) = Support(ε W ), we c**an** apply the conditions presented in

Cameron **an**d Heckm**an** (1998) to identify the distribution **of** F εW ,η,c 1 ,...,c S−1

up to scale. We c**an** nonparametrically identify c s (Q s ) over the support **of** Q s under

conditions specified in Theorem 2 below. Unlike the case **of** the more general

unordered discrete choice model (see Elrod **an**d Ke**an**e, 1995; Ben Akiva et al.,

2001), **with**out further restrictions on the distribution **of** ε W , we c**an**not identify

the factors generating ε W using only choice data. H**an**sen et al. (2003) present **an**

**an**alysis parallel to the one given here for a multinomial probit model. In that

model, the distributions **of** factors c**an** be identified from choice data.

4.2. Models for Factors. Factor models are notorious for being identified

through arbitrary assumptions about how factors enter in different equations.

This led to their disuse after their introduction into economics by Jöreskog **an**d

Goldberger (1975), Goldberger (1972), Chamberlain **an**d Griliches (1975), **an**d

Chamberlain (1977a, 1977b).

The essential identification problem in factor **an**alysis is clearly stated by

Anderson **an**d Rubin (1956). If there are L measurements on K mutually independent

factors arrayed in a vector θ, we may write outcomes G in terms **of**

latent variables θ as

(19)

G = µ + θ + ε

where G is L × 1,θ ⊥⊥ ε, µ is **an** L × 1 vector **of** me**an**s, which may depend on

X,θ is K × 1,ε is L × 1, **an**d is L × K. ε i ⊥⊥ ε j , i, j = 1,...,L, i ≠ j. At this

exclusion restrictions, so Q s c**an** just include **an** intercept. The pro**of** is trivial. Normalize ρ 1 = 0. From

the first choice we compute,

Pr(D 1 = 1 | Z) = Pr(Z ′ η + ε W ≤ c 1 )

so we c**an** identify f (ε W ) **an**d η up to scale σ W , assuming ε W **an**d the ρ j have densities **with**

respect to the Lebesgue measure **an**d nonv**an**ishing characteristic function in addition to other

st**an**dard regularity conditions. We suppress the intercept in Z. One c**an**not distinguish the intercept

from c 1 . Proceeding to further choices we obtain

Pr(D 1 + D 2 = 1|Z) = Pr(Z ′ η + ε W ≤ c 2 + ν 2 )

= Pr(ε W − ν 2 ≤ c 2 − Z ′ η)

Therefore, we c**an** identify f (ε W − ν 2 ) **an**d c 2 up to scale σ εW −ν 2

. The scale is determined by the first

normalization (relative to σ εW ). We c**an** estimate σε W − ν 2

σ εW

= ( σ ε 2 + σ 2 W ν 2

) 1/2 by taking the ratio **of** the

σε 2 W

normalized η from the second choice probability to the normalized ratio **of** η from the first choice probability

for **an**y coordinate **of** η.Define ψ(X) as the characteristic function **of** X. From the assumed independence

**of** ε W **an**d ν 2 ,ψ(ε W − ν 2 ) = ψ(ε W )ψ(−ν 2 ). Therefore, we c**an** identify ψ(ε W − ν 2 )

= ψ(−ν ψ(ε W ) 2 ),

**an**d we c**an** determine f (−ν 2 ) from the convolution theorem adopting a normalization for σ W .

Proceeding sequentially, we obtain Pr(D 1 + D 2 + D 3 +···+D k = 1 |Z) = Pr(Z ′ η + ε W ≤ c k + ν k ),

**an**d c**an** identify c k **an**d f (ν k ) up to the normalization given in the first step. From f (ν k ), we c**an** use

deconvolution to identify f (ρ j ), j = 2,...,S. See Heckm**an** **an**d Navarro (2001) for further details

**an**d extensions to factor models. Nowhere in this **an**alysis do we use the assumption that Q s contains

regressors.

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 373

point, ε is a general notation that will be linked to specific ε’s in Section 5. Even

if θ i ⊥⊥ θ j , i ≠ j, i, j = 1,...,K, the model is underidentified. As we will see,

the G in this article is a more general system th**an** the system based solely on

measurements invari**an**t across states M so we distinguish (16) **an**d (19). It will

include M as well as state-dependent outcomes (Ys,a ∗ ) **an**d the indices generating

choice equations.

Using only the information in the covari**an**ce matrices, as is common in factor

**an**alysis,

(20)

cov(G) = θ ′ + D ε

where θ is a diagonal matrix **of** the vari**an**ces **of** the factors, **an**d D ε is a diagonal

matrix **of** the “uniqueness” vari**an**ces. We observe G but not θ or ε, **an**d we seek to

identify , θ , **an**d D ε . Without some restrictions, this is clearly **an** impossible task.

Conventional factor-**an**alytic models make assumptions to identify parameters.

The restriction that the components **of** θ are independent is one restriction that we

have already made, but it is not enough. The diagonals **of** cov(G) combine elements

**of** D ε **with** parameters from the rest **of** the model. Once those other parameters are

L(L− 1)

2

determined, the diagonals identify D ε . Accordingly, we c**an** only rely on the

nondiagonal elements to identify the K vari**an**ces (assuming θ i ⊥⊥ θ j , ∀i ≠ j) **an**d

the L × K factor loadings. Since the scale **of** each θ i is arbitrary, one factor loading

devoted to each factor is normalized to unity to set the scale. Accordingly, we

require that

so

L(L − 1)

≥ (L × K − K) +

} {{

2

}

} {{ } }{{}

K

Number **of** unrestricted Vari**an**ces **of** θ

Number **of** **of**f-diagonal covari**an**ce elements

L ≥ 2K + 1

is a necessary condition for identification.

The strategy pursued in this article is tr**an**sparent **an**d assumes that there are two

or more elements **of** G devoted exclusively to factor θ 1 , **an**d at least three elements

**of** G that are generated by factor θ 1 , two or more other elements **of** G devoted

only to factors θ 1 **an**d θ 2 , **with** at least three elements **of** G that depend on θ 1

**an**d θ 2 , **an**d so forth. This strategy is motivated by our access to psychometric **an**d

longitudinal data. Test scores may only proxy ability (θ 1 ). Other measurements

may proxy only (θ 1 ,θ 2 ). Measurements on earnings from p**an**el data may proxy

(θ 1 ,θ 2 ,θ 3 ), etc.

Order G under this assumption so that we get the following pattern for (we

assume that the displayed λ ij are not zero):

374 CARNEIRO, HANSEN, AND HECKMAN

(21)

⎛

⎞

1 0 0 0

. ... ... 0

λ 21 0 0 0

. ... ... 0

λ 31 1 0 0

. ... ... 0

λ 41 λ 42 0 0

. ... ... 0

λ

= 51 λ 52 1 0

. ... ... 0

.

λ 61 λ 62 λ 63 0

. ... ... 0

λ 71 λ 72 λ 73 1

. 0 ... 0

. λ 81 λ 82 λ 83 λ .. 84 0 ... 0

⎜

⎝

... ... ... ...

. ... ... ... ⎟

⎠

λ L,1 λ L,2 λ L,3 ...

. ... ... λ L,K

Assuming nonzero covari**an**ces

In particular

cov(g j , g l ) = λ j1 λ l1 σ 2 θ 1

, l = 1, 2; j = 1,...,L; j ≠ l

Assuming λ l1 ≠ 0, we obtain

cov(g 1 , g l ) = λ l1 σ 2 θ 1

cov(g 2 , g l ) = λ l1 λ 21 σ 2 θ 1

cov(g 2 , g l )

cov(g 1 , g l ) = λ 21

Hence, from cov(g 1 , g 2 ) = λ 21 σ 2 θ 1

, we obtain σ 2 θ 1

, **an**d hence λ l1 ,l= 1,...,L. We

c**an** proceed to the next set **of** two measurements **an**d identify

cov(g l , g j ) = λ l1 λ j1 σ 2 θ 1

+ λ l2 λ j2 σ 2 θ 2

, l = 3, 4; j ≥ 3; j ≠ l

Since we know the first term on the right-h**an**d side by the previous argument,

we c**an** proceed using cov(g l , g j ) − λ l1 λ j1 σ 2 θ 1

**an**d identify the λ j2 , j = 1,...,L using

the previous line **of** reasoning (some **of** these elements are fixed to zero).

Proceeding in this fashion, we c**an** identify **an**d θ subject to diagonal normalizations.

This argument works for all but the system for the Kth **an**d final factor.

Observe that for all **of** the preceding factors there are at least three measurements

that depend on θ j , j = 1,...,K − 1, although only two **of** the measurements

need to depend solely on θ 1 ,...,θ K−1 , respectively. To obtain the necessary three

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 375

measurements for the Kth **an**d final factor, we require that there be at least three

outcomes **with** measurements that depend on θ 1 ,...,θ K .

Knowing **an**d θ , we c**an** identify D ε . Use **of** dedicated measurement systems

for specific factors **an**d p**an**el data helps to eliminate much **of** the arbitrariness

that plagued factor **an**alysis during its introduction in economics in the 1970s.

Although m**an**y other restrictions on the model are possible, the one we adopt has

the adv**an**tage **of** simplicity **an**d interpretability in m**an**y contexts. 18

Our **an**alysis uses a version **of** (19), coupled **with** the exclusion restrictions exemplified

in (21), to identify the joint distributions **of** counterfactuals. We extend

conventional factor **an**alysis in three ways. First, following Heckm**an** (1981) **an**d

Muthen (1984), we allow the G to include latent index functions like I (associated

**with** the choice equations) or like Ys,a ∗ as well as their m**an**ifestations (the r**an**dom

variables they generate). Thus, the G may include discrete or censored r**an**dom

variables generated by latent r**an**dom variables. We c**an** identify components associated

solely **with** the discrete case only up to unknown scale factors—the familiar

indeterminacy in discrete choice **an**alysis. Choice indices, measurements, **an**d state

contingent outcomes are all informative on θ. The factor **an**alysis in this article

is conducted on the latent continuous variables that generate the m**an**ifest outcomes.

Second, we extend factor **an**alysis to a case **with** counterfactuals where

certain variables are only observed if state s is observed. This extension enables

us to identify the full joint distribution **of** counterfactuals. Third, we prove nonparametric

identification **of** the distributions **of** θ **an**d ε, **an**d do not rely on **an**y

normality assumptions.

18 Other normalizations are possible. All require that there be at least three measurements on

each factor, although we c**an** get by **with** only one dedicated measurement. Consider the following

example (due to Salvador Navarro): Let L = 5, K = 2.

g 1 = θ 1 + ε 1 , g 2 = λ 21 θ 1 + θ 2 + ε 2

g 3 = λ 31 θ 1 + λ 32 θ 2 + ε 3 , g 4 = λ 41 θ 1 + λ 42 θ 2 + ε 4

g 5 = λ 51 θ 1 + λ 52 θ 2 + ε 5

Assuming nonv**an**ishing covari**an**ces **an**d factor loadings,

λ 32 = cov(g 1, g 5 )cov(g 3 , g 4 ) − cov(g 3 , g 5 )cov(g 1 , g 4 )

cov(g 2 , g 4 )cov(g 1 , g 5 ) − cov(g 1 , g 4 )cov(g 2 , g 5 )

if λ 22 λ 42 λ 51 − λ 41 λ 52 ≠ 0. Then

if λ 31 ≠ λ 32 λ 21 .

λ 41 = cov(g 3, g 4 ) − cov(g 2 , g 4 )λ 32

cov(g 1 , g 3 ) − cov(g 1 , g 2 )λ 32

λ 21 = cov(g 1, g 2 )λ 41

cov(g 1 , g 4 )

,λ 31 = cov(g 1, g 3 )λ 41

,λ 51 = cov(g 1, g 5 )λ 41

cov(g 1 , g 4 ) cov(g 1 , g 4 )

σ 2 θ 1

= cov(g 1, g 4 )

λ 41

,σ 2 θ 2

= cov(g 2, g 3 ) − λ 21 λ 31 σ 2 θ 1

λ 32

λ 42 = cov(g 2, g 4 ) − λ 21 λ 41 σ 2 θ 1

σ 2 θ 2

,λ 52 = cov(g 2, g 5 ) − λ 21 λ 51 σ 2 θ 1

σ 2 θ 2

376 CARNEIRO, HANSEN, AND HECKMAN

5. IDENTIFICATION OF SEMIPARAMETRIC FACTOR MODELS WITH DISCRETE

CHOICES AND DISCRETE AND CONTINUOUS OUTCOMES

In order to establish identification, we need to be clear about the raw data

**with** which we are working. For each set **of** s-contingent potential outcomes, there

is a system like (19): ˜G s = (M, Y s , D s ) where Y s is a vector **of** state contingent

outcomes. Outcome variables in ˜G s are **of** two types: (a) continuous variables

**an**d (b) discrete or censored r**an**dom variables, including binary strings associated

**with** durations (e.g., unemployment). When the r**an**dom variables are discrete

or censored, we work **with** the latent variables generating them. We array the

continuous portions **of** ˜G s **an**d the index functions generating the discrete portions

into G s .

Let M c denote the continuous measurements, **an**d let Ys

c be the continuous

counterfactual outcomes. Let M d be the discrete components **of** M, whereas Ys

d

are the discrete components **of** Y s . Table 1 defines the variables used in our **an**alysis.

Under separability the continuous variables c**an** be written as

M c = µ c m (X) + Uc m

Y c

s = µc s (X) + Uc s

Associated **with** the discrete variables are latent continuous variables

M ∗d = µ d m (X) + Ud m

Y ∗d

s

= µ d s (X) + Ud s

where Um d, Ud s are assumed to be continuous.19 The indicator variable is generated

by latent variable I as defined in (17).

The data used for the factor **an**alysis are G s = (M c , M ∗d , Ys c,

Y s

∗d , I). For simplicity,

in this article we assume that the “discrete” variables are in fact binary valued.

Extensions to censored r**an**dom variables **an**d to binary strings are straightforward,

**an**d are developed in a later article. We observe ˜G s when D s = 1. For each s, we

have a system **of** outcome variables. Although the outcomes are s-dependent, the

measurements are observed independently **of** the value assumed by D s .

The distinction between measurements (M ) whose values do not depend on

the value assumed by D s **an**d the state contingent outcomes Y s that depend on the

state s that is observed is essential. There is no selection bias in observing M, but

in general there is a selection bias in observing Y = ∑ S

s=1 D sY s .

M, Y, **an**d D s , s = 1,...,S all contain information on θ. The information from

M is easier to access, **an**d traditional factor **an**alysis is based on such measurements.

Nonetheless, the identification **of** counterfactual states does not require M. IfM

is available, however, the interpretation **of** θ is more tr**an**sparent.

Before turning to our factor **an**alysis, we first establish conditions under which

we c**an** identify the joint distribution **of** M c , M ∗d , Y c , I, which constitute the

s , Y∗d s

19 In particular, Um d, Ud s are assumed to have a distribution that is absolutely continuous **with** respect

to the Lebesgue measure.

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 377

TABLE 1

COMPONENTS OF

˜

G s

Continuous

Discrete

Variables defined to be the same for all s M c M d

Variables defined for s Y c s Y d s

Indicator **of** state — D s

data for the factor **an**alysis. To underst**an**d the basic ideas, we break this task into

three parts: (a) identification **of** the joint distribution **of** (M c , M ∗d ), (b) identification

**of** the parameters in choice system (17) **an**d (18), **an**d (c) identification **of** the

full joint distribution **of** (M c , M ∗d , Y c

factor **an**alyzed.

We assume that

s , Y∗d s

, I). This full distribution is subsequently

(A-1) (Um c , Ud m , Uc s , Ud s ,ε W) have distribution functions that are absolutely continuous

**with** respect to Lebesgue measure **with** me**an**s zero 20 **with**

support U c m × Ud m × Uc s × Ud s × E W **with** upper **an**d lower limits being

Ūm c , Ūd m , Ūc s , Ūd s , ε W **an**d U c m , Ud m , Uc s , Ud s ,ε W , respectively, which may be

bounded or infinite. Thus the joint system is measurably separable (variation

free). 21 We assume finite vari**an**ces. 22 The cumulative distribution

function **of** ε W is assumed to be strictly increasing over its full support

(ε W , ε W ).

(A-2) (X, Z, Q) ⊥⊥ (U,ε W ) where U = (Um c , Ud m , Uc s , Ud s ), **an**d where Q is a vector

**of** state-specific regressorsQ= (Q 1 ,...,Q S ) .

We denote by “∼” normalized values where the normalizations in our context

are usually st**an**dard deviations **of** latent index errors. We first consider identification

**of** the joint distribution **of** M. Our results are contained in Theorem 1.

THEOREM 1. From data on F(M | X), one c**an** identify the joint distribution

**of** (Um c , Ud m ) (the latter component only up to scale), the function µd m (X)

is identified (up to scale) **an**d µ c m (X) is identified over the support **of** X provided

that the following assumptions, in addition to the relev**an**t components **of**

(A-1) **an**d (A-2), are invoked:

(A-3) Order the discrete measurement components to be first. Suppose that there

are N m,d discrete components, followed by N m,c continuous components.

Assume Support (µ d 1,m (X),...,µd N m,d ,m (X)) ⊇ Support(Ud 1,m ,...,Ud N m ,m ).

Conditions (A-1) **an**d (A-3) imply that (µ d 1,m (X),...,µd N m,d ,m(X)) is measurably

separable (variation free) in all **of** its coordinates when “⊇” is replaced by “=”.

20 Alternatively, we could normalize the medi**an**s to be zero.

21 Foradefinition **of** measurable separability, see Florens et al. (1990, Subsection 5.2). The key idea

is that we c**an** vary each **of** the coordinates **of** the vector freely.

22 This assumption c**an** be relaxed. It only affects certain normalizations.

378 CARNEIRO, HANSEN, AND HECKMAN

(A-4) For each l = 1,...,N m,d , µ d l,m (X) = Xβd l,m .

(A-5) The X lives in a subset **of** R N X

, where N X is the dimension **of** the X regressors.

There exists no proper linear subspace **of** R N X

having probability 1

under F X , the distribution function **of** X.

PROOF. See Appendix A.

Condition (A-4) is conventional (see Cosslett, 1983; M**an**ski, 1988). Weaker

conditions are available using the **an**alysis **of** Matzkin (1992, 1993). Support

condition (A-3) appears in Cameron **an**d Heckm**an** (1998) **an**d Aakvik et

al. (1999). The easiest way to satisfy it is to have exclusions: one continuous

component in µ

l,m d (X) that is not **an** argument in the others. But that

is only a sufficient condition. Even **with**out exclusion, this condition c**an** be

satisfied if there are enough continuous regressors in X **an**d the µ

l,m d (X)

have a full r**an**k Jacobi**an**—**with** respect to the derivatives **of** the continuous

(X) variables. Intuitively, if the r**an**k condition is satisfied, we c**an** hold

µ

l,m d (X) at ¯µd l,m

**an**d vary the other arguments. Formally, this r**an**k condition

requires that if we array the coefficients **of** the continuous variables

coefficients **of** βl,m d ,˜β l,m d , into a N m,d by N X matrix, where here N X is the

number **of** continuous components **of** X, that R**an**k{˜β

l,m d }N m,d

l=1 ≥ N m,d. This requires

N m,d continuous variables. It also requires that the coefficients are linearly

independent. If the number **of** continuous components is N X < N m,d ,

we c**an** only identify N X components **of** the distribution **of** Um d . See Cameron

**an**d Heckm**an** (1998), Aakvik et al. (1999), or H**an**sen et al. (2003) for

more discussion **of** this case **of** identification **with**out conventional exclusion

restrictions.

We next turn to identification **of** the generalized ordered discrete choice model

(17). This extends the pro**of** in Cameron **an**d Heckm**an** (1998) by parameterizing

the cut points. A more general version **of** this model appears in Heckm**an** **an**d

Navarro (2001).

THEOREM 2. For the relev**an**t subsets **of** the conditions (A-1) **an**d (A-2) (specifically,

assuming absolute continuity **of** the distribution **of** ε W **with** respect to Lebesgue

measure **an**d ε W ⊥⊥ (Z, Q)) **an**d the additional assumptions:

(A-6) c s (Q s ) = Q s η s , s = 1,...,S,ϕ(Z) = Z ′ β

(A-7) (Q 1 , Z) is full r**an**k (there is no proper subspace **of** the support (Q 1 , Z)

**with** probability 1). The Z contains no intercept.

(A-8) Q s for s = 2,...,S is full r**an**k (there is no proper subspace **of** (R Q s

) **with**

probability 1).

(A-9) Support (c, (Q 1 ) − ϕ(Z)) ⊇ Support (ε W )

Then the distribution function F εW is known up to a scale normalization

on ε W **an**d c s (Q s ), s = 1,...¯s, **an**d ϕ(Z) are identified up to a scale

normalization.

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 379

PROOF. See Appendix A.

Our choice system c**an** be made more nonparametric using the type **of** restrictions

introduced in Matzkin, although we eschew that generality here. Matzkin

**an**d Lewbel (2002) weaken (A-6) generalizing the **an**alysis **of** Matzkin (1992) assuming

that the c s are const**an**ts.

We next turn to the identification **of** the joint system (M c , M ∗d , Ys c,

Y∗d s , I). The

data for each choice system (including the data on choice probabilities) generate

the left-h**an**d side **of**

(22) Pr ( M c ≤ m c , M ∗d ≤ 0, Y c

s

× Pr(D s = 1 | Z, Q s , Q s−1 )

≤ y c s , Y∗d s ≤ 0 | D s = 1, X, Z, Q s , Q s−1

)

=

∫ m c −µ c m (X ) ∫ −˜µ d

m (X )

U

˜

c U d m

∫ y c

s −µ c s (X ) ∫ −˜µ d (X )

U c s

× dU c m dŨd m dUc s dŨd s d˜ε W.

˜

U d s

∫ cs (Qs )−ϕ(Z )

σ W

c s−1 (Q s−1 )−ϕ(Z )

σ W

f ( U m,Ũd c Uc s,Ũd s,˜ε )

W

From Theorem 1 we know µ c m (X)(=Xβc m ) **an**d ˜µd m (X)(=X˜β m d ) **an**d the joint

distribution **of** (Um c , Ũd m ). From Theorem 2, we know c s(Q s )−ϕ(Z )

σ W

= Q sη s −Z ′ β

σ W

, s =

1,...,S **an**d the coefficients η s ,β **an**d the distribution F˜εW . Notice that c s (Q s ) ≥

c s−1 (Q s−1 ) is a requirement **of** the ordered choice model. We maintain the following

assumptions:

(A-10) Support(−˜µ d m (X), −˜µd s (X), ( c s(Q s ) − ϕ(Z)

σ W

− c s−1(Q s−1 ) − ϕ(Z)

σ W

)) ⊇

Support(U d m , Ud s ,˜ε W) = (U d m × Ud s × Ẽ W ).

(A-11) There is no proper linear subspace **of** (X, Z, Q s , Q s−1 ) **with** probability

one so the model is full r**an**k.

As a consequence **of** (A-6) **an**d (A-10) we c**an** find values **of** Q s , Q s−1 , ¯Q s , Q s−1

,

respectively, so that

lim

Qs → ¯Qs

Q s−1 →Q s−1

Pr (D s = 1 | Z, Q s , Q s−1 ) = 1

In these limit sets (which may depend on Z), under the stated conditions (A-1)–

(A-11), we c**an** identify the joint distribution **of** (M c , M ∗d , Ys c,

Y∗d s ), s = 1,...,S

using **an** argument parallel to the one used to prove Theorem 1. These limit

sets produce S different joint distributions (corresponding to each value **of** s)

but do not generate joint distributions across the s (i.e., the joint distribution

**of** M c , M ∗d , Ys c,

Y∗d s across s values). However, M is common across these systems.

Using the dependence **of** M **an**d Y s , s = 1,...,S, on a common θ we c**an**

380 CARNEIRO, HANSEN, AND HECKMAN

sometimes identify the joint distribution. See Carneiro et al. (2001) for **an** example.

Thus, **with** a measurement system M we do not strictly require information

on the choice index I to identify the model.

Following **an** argument **of** Heckm**an** (1990), Heckm**an** **an**d Honoré (1990), **an**d

Heckm**an** **an**d Smith (1998), we c**an** identify µ c s (X) up to **an** additive const**an**t **with**out

passing to the limit set where Pr(D s = 1 | Z, Q s , Q s−1 ) = 1. This is not possible

for the identification **of** ˜µ d s (X) because there is no counterpart to the variation in

ys c for the discrete component. This is the content **of** the following theorem that

combines the key ideas **of** Theorems 1 **an**d 2 to produce **an** identification theorem

for the general case.

THEOREM 3. Under assumptions (A-1), (A-2), (A-4), **an**d (A-6)–(A-11), µ c m (X),

µ c s (X), ˜µd m (X), ˜µd s (X),˜ϕ(Z), c s(Q s ) s = 1,...,S − 1 are identified as is the joint

distribution F(Um c , Ũd m , Uc s , Ũd s ,˜ε W).

PROOF. See Appendix A.

As noted in the discussion following Theorem 1, **with**out st**an**dard exclusion

restrictions we may only be able to identify subcomponents **of** the joint distribution

if N X < N m,d where N X is the number **of** continuous regressors. Note that the µ c s,l ,

˜µ d s,l

may only be defined over their supports. Under **an** additional r**an**k or variationfree

condition on the regressors we recover these functions everywhere over the

support **of** X.

5.1. Factor Analysis. The thrust **of** Theorems 1–3 is that under the stated

conditions we know the joint distributions **of** (U s , U m , ˜ε W ) s = 1,...,S where U s =

(Us d, Uc s ). We factor **an**alyze them under assumptions like those invoked in matrix

(21) **with** two or more **of** these elements dependent solely on θ 1 , **an** additional

two or more elements dependent solely on (θ 1 ,θ 2 ), **an**d so forth, but at least three

final elements dependent on θ K . There is a total **of** A× R outcomes in each state

where R is the number **of** outcome measures in each state at each age (e.g., wages,

employment, occupation), there are M non-state-contingent measurements, **an**d

˜ε W is a scalar. Thus, L in (21) is (A× R) + M + 1 in dimension for each system

s, s = 1,...,S.

We write the unobservables in factor structure form

U s,a = α s,a ′ θ + ε s,a

U m = α m ′ θ + ε m

**with** s = 1,...,S; a = 1,...,A

**with** m = 1,...,N m

˜ε W = γ ′ θ + ε I

The α s,a may be different across s-states so that each s system may depend on

different elements **of** θ. Theα m are not, nor is the γ . There may be multiple

measurements **of** outcomes so in principle α s,a may be a matrix **an**d ε s,a a vector **of**

mutually independent components. Our empirical **an**alysis is for the vector case.

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 381

The choice **of** how to select the blocks **of** (21) may appear to be arbitrary, but in

m**an**y applications there are natural orderings. Thus in the empirical work reported

below we estimate a two-factor model. We have a vector **of** five test scores that

proxy latent ability (θ 1 ). The state contingent outcomes (earnings) equations **an**d

choice equations plausibly depend on both θ 1 **an**d θ 2. In m**an**y applications there

are **of**ten natural allocations **of** factors to various measurements. However, to

avoid arbitrariness, a carefully reasoned defense **of** **an**y allocation is required. We

now formalize identification in this system.

THEOREM 4. Under the normalizations on the factor loadings **of** the type in (21)

for one system s under the conditions **of** Theorems 1–3, given the normalizations

for the unobservables for the discrete components **an**d given at least 2K + 1 measurements

(Y, M, I), the unrestricted factor loadings **an**d the vari**an**ces **of** the factors

(σ 2 θ i

, i = 1,...,K) are identified for all systems.

PROOF. The pro**of** is implicit in the discussion surrounding equation (21).

Observe that since the σθ 2 i

, i = 1,...,K, are identified in one system, normalizations

**of** specific factor loadings to unity are only required in that system since

we c**an** apply the knowledge **of** these vari**an**ces to the other systems. 23 Thus, for

the other systems (values **of** the state other th**an** s) we do not need to normalize

**an**y factor loading to unity.

We c**an** also nonparametrically identify the densities **of** the uniquenesses **an**d

the factors. This follows from mutual independence **of** the θ i , i = 1,...,K, **an**d

**an** application **of** Kotlarski’s Theorem (1967). We first state Kotlarski’s Theorem

**an**d then we apply it to our problem.

Write ({U m } N m

m=1 , {U s,a}

a=1 A ,˜ε W) in vector form as T s . Order the vectors so that

the first B 1 (≥2) elements depend only on θ 1 , the next B 2 − B 1 (≥2) elements

depend on (θ 1 ,θ 2 ) **an**d so forth. Let T1 s **an**d Ts

2 be the first two elements **of** Ts .

(This is purely a notational convenience). We order the elements **of** T s so that the

first block depends solely on θ 1 , (assuming that there are B 1 such measurements)

the second block depends solely on θ 1 ,θ 2 (there are B 2 − B 1 such measurements),

**an**d so forth, following the convention established in Equation (21). We require

B 1 ≥ 2, B 2 − B 1 ≥ 2, **an**d B K − B K−1 ≥ 3.

THEOREM 5.

If

T s

1 = θ 1 + v 1

**an**d

T s

2 = θ 1 + v 2

**an**d θ 1 ⊥⊥ v 1 ⊥ v 2 , the me**an**s **of** all three generating r**an**dom variables are finite,

E(v 1 ) = E(v 2 ) = 0, **an**d the conditions **of** Fubini’s theorem are satisfied for each

23 In the discussion **of** equation (21) we could have normalized the vari**an**ces **of** the σ 2 θ i

, i = 1,...,K,

to one rather th**an** certain factor loadings, although this is less straightforward **an**d requires the imposition

**of** certain sign restrictions.

382 CARNEIRO, HANSEN, AND HECKMAN

r**an**dom variable, **an**d the r**an**dom variables possess nonv**an**ishing (a.e.) characteristic

functions, then the densities **of** (θ 1 ,v 1 ,v 2 ), g(θ 1 ), g 1 (v 1 ), g 2 (v 2 ), respectively,

are identified.

PROOF. See Kotlarski (1967). See also Rao (1992).

Applied to our context, consider the first two equations **of** T **an**d suppose that

the components depend only on θ 1 . We use our notation for the factor loadings to

write

T s

1 = λs 11 θ 1 + ε s 1

where λ s 11 = 1

T s

2 = λs 21 θ 1 + ε s 2

where λ s 21 ≠ 0

Here we use a notation associating the subscript **of** εi s **with** its position in the T s

vector. Applying Theorem 4, we c**an** identify λ s 21

(subject to the normalization

λ s 11 = 1).24 Thus we c**an** rewrite these equations as

T s

1 = θ 1 + ε s 1

T2

s

λ s = θ 1 + ε ∗,s

2

21

where ε ∗,s

2

= ε2 s/λs 21

. Applying Kotlarski’s theorem, we c**an** nonparametrically

identify the densities g(θ 1 ), g 1 (ε1 s), **an**d g 2(ε ∗,s

2 ). Since we know λs 21

we c**an** identify

g(ε2 s). Let B 1 denote the number **of** measurements (elements **of** T s ) that depend

only on θ 1 . Proceeding through the first B 1 measurements, we c**an** identify g(εi s),

i = 1,...,B 1 .

Proceeding to equations B 1 + 1 **an**d B 1 + 2 (corresponding to the first two measurements

in the next set **of** equations that depend on θ 1 **an**d θ 2 ), we may use the

normalization adopted in Theorem 4 to write

T s B 1 +1 = λs B 1 +1,1 θ 1 + θ 2 + ε s B 1 +1

T s B 1 +2 = λs B 1 +2,1 θ 1 + λ s B 1 +2,2 θ 2 + ε s B 1 +2

Rearr**an**ging, we may write these equations as

T s B 1 +1 − λs B 1 +1,1 θ 1 = θ 2 + ε s B 1 +1

T s B+2 − λs B 1 +2,1 θ 1

λ s B 1 +2,2

= θ 2 + ε ∗,s

B 1 +2

24 To be able to identify λ s 21 we need a third measurement on this factor, which we c**an** get from

equation B 1 + 1. Since there is no equation B K+1 , we require that B K − B K−1 ≥ 3 in order to be able

to identify the loadings on θ K .

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 383

where ε ∗,s

B 1 +2 = εs B 1 +2

, **an**d the ε s λ s B B 1 +2,2

1 +1

**an**d ε∗,s

B 1 +2

are mutually independent. Hence

by Theorem 5, we c**an** identify densities g(θ 2 ), g(ε s B 1 +1 ), g(εs B 1 +2

). Exploiting the

structure (21), we c**an** proceed sequentially to identify the densities **of** θ,g(θ i ), i =

1,...,K, **an**d the uniqueness, g(εi s) for all the components **of** vector Ts . For the

components **of** εi s corresponding to discrete measurements, we do not identify the

scale. Armed **with** knowledge **of** the densities **of** the θ i **an**d the factor loadings

for other values **of** s, we c**an** apply st**an**dard deconvolution methods to nonparametrically

identify the uniqueness **of** the ε i ’s for the other systems. Thus, we c**an**

nonparametrically identify all the error terms for the model. Notice that, in principle,

we c**an** estimate separate distributions **of** the θ i for each s system **an**d thus

c**an** test the hypothesis **of** equality **of** these distributions across systems.

The essential idea in this article is to obtain identification **of** the joint counterfactual

distributions through the dependence across s **of** Y s = (Ys d,

Yc s ) on the common

factors that also generate M or I. In this sense, measurements **an**d choices are both

sources **of** identifying information, **an**d c**an** be traded **of**f in terms **of** identification.

We next apply our framework to a well-posed economic model.

6. GENERALIZING THE WILLIS–ROSEN MODEL

We revisit Willis **an**d Rosen’s application **of** the Roy model (1979) to the economics

**of** education, adding uncertainty **an**d nonpecuniary net returns to schooling,

**an**d identifying counterfactual distributions **of** gross **an**d net returns. In this

article the outcomes are utility outcomes, present value outcomes **an**d rates **of**

return.

Suppose that agents c**an**not lend or borrow **an**d possess log preferences

(utility = ln C, where C is consumption). Suppose that agents are choosing between

high school **an**d college so S = 2. The utility **of** attending college is

V(1) =

A∑

a=0

ln Y 1 a

(1 + ρ) a − ln P

where ln P is the “cost” **of** going to school. The costs include tuition costs **an**d the

psychic benefits from working in sector 1 (relative to sector 0). Thus, costs may be

negative. ρ is a subjective rate **of** time preference. The utility **of** completing only

high school is

V(0) =

A∑

a=0

ln Y 0

a

(1 + ρ) a

where Ya

0 **an**d Y1 a are earnings from high school **an**d college, respectively, at age

a. The psychic costs or benefits in logs for high school are normalized to zero. We

c**an** only identify relative psychic “costs” or benefits.

384 CARNEIRO, HANSEN, AND HECKMAN

Latent variables **an**d costs are generated by a factor structure. The equations

are

ln Y j

a = µ j (X) + ( α j a) ′θ + ε

j

a j = 0, 1, a = 1,...,A.

ln P = µ P (Z) + (α P ) ′ θ + ε P

In addition, we have measurements on test scores M = µ M (x) + α ′ M θ + εM , where

θ ⊥⊥ [(ε j

i,a )I i=1 ,1 j=0 ,A a=0 ,ε P].

The agent makes decisions about schooling under uncertainty about different

components **of** the model. I θ is the information set. The expected value V **of** going

to college is

⎡

A∑

V = E (V (1) − V (0) | I θ ) = E Iθ

⎢

⎣a=0

µ 1 a (X) − µ0 a (X) + ( α 1 a − α0 a) ′

θ + ε

1

a − ε 0 a

(1 + ρ) a

− [µ P (Z) + α ′ P θ + ε P]

If future innovations in earnings (εa 1,ε0 a ), a = 0,...,Aare not known at the time

schooling decisions are made but innovations in costs are known, we may write

the agent’s preference function as

V =

( A∑

a=0

) [

µ a 1 (X) − µ0 a (X)

A∑

(

α

1

(1 + ρ) a − µ P (Z) +

a − α 0 ′

]

a)

(1 + ρ) a − α ′ P E Iθ (θ) − ε P

a=0

As we shall see, this assumption about agent knowledge **of** future innovations in

earnings is testable. Assume that σ P = (var(ε P )) 1 2 < ∞. Then

( )

V

= 1 A∑

µ a 1 (X) − µ0 a (X)

σ P σ P (1 + ρ) a − µ P (Z)

a=0

+

( A∑

a=0

(

α

1

a − α 0 a) ′

(1 + ρ) a − α ′ P

)

1

σ P

E Iθ (θ) − ε P

σ P

D s = 1if V σ P

> 0; D s = 0 otherwise.

Specifying alternative information sets (I θ ) **an**d examining the resulting fit **of**

the model to data, we c**an** determine the information sets that agents act on.

Exact econometric specifications are presented in Section 7. We test whether

agents act on components **of** θ that also appear in outcome equations realized

after the choices are made. The estimated dependence between schooling choices

**an**d subsequent realizations **of** earnings enables us to identify the components in

⎤

⎥

⎦

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 385

the agent’s information set at the time schooling decisions are being made. This

extends the method **of** Flavin (1981) **an**d H**an**sen et al. (1991) to a discrete choice

setting. If agents do not act on these components, then those components are

intrinsically uncertain at the time agents make their schooling decisions unless

nongeneric c**an**cellations occur. 25 Because we c**an** identify the joint distributions

**of** unobservables, we c**an** **an**swer questions Willis **an**d Rosen could not such as:

(1) How highly correlated are latent skills (utilities) across sectoral choices? (2)

How much intrinsic uncertainty do agents face? (3) How import**an**t is uncertainty

in explaining schooling choices? (4) What fraction **of** the population regrets its ex

**an**te schooling choice ex post? We c**an** also separate out net psychic components

**of** the returns to schooling (the ln P ) from monetary components.

Observe that as a consequence **of** the log specification **of** preferences (including

the additive separability **of** the θ **an**d ε), me**an** preserving spreads in εa,θ j **an**d

ε P produce no ch**an**ge in me**an** utility. The probability **of** selection D s = 1 is also

invari**an**t to me**an** preserving spreads for εa j but not for θ **an**d ε P since their vari**an**ce

enters the choice probability if these components are known to the agent.

In addition, a me**an** preserving spread in ln Y is not the same as a me**an** preserving

spread in Y. Me**an** preserving spreads in Y have **an** effect on utility since

E(Y) = e µ E(e ε ). Define the residual from the me**an** as H, H = e µ e ε − e µ E(e ε )so

var(H) = e 2µ (E(e 2ε ) − [E(e ε )] 2 ). A me**an** preserving spread keeps the me**an** **of** Y

fixed at const**an**t k = E(Y) = e µ E(e ε ).

For a perturbation in the vari**an**ce **of** ε that ch**an**ges ε to ε, **an**d defining

f (ε) as the density **of** ε, locally 0 = dµ + [∫ εe ε f (ε) dε]

d so dµ =− [∫ εe ε f (ε) dε]

d.

E(e ε )

E(e ε )

Moreover, because E(ε) = 0 **an**d εe ε is convex increasing in ε, the derivative

is positive. In a log normal example, E(e ε ) = e σ 2

2 , E(e 2ε ) = e 2σ 2 , var(H) =

e 2µ (e 2σ 2 − e σ 2 ), k = e µ e σ 2

2 , ln k = µ + σ 2

2 , (−dµ) = d(σ 2 )

so **an** increase in the vari**an**ce

is equivalent to a decrease in the me**an** utility. We consider the effects **of**

2

me**an** preserving spreads on both me**an** log utility **an**d on the probability that V is

positive (college is selected). We now turn to the empirical **an**alysis **of** this article.

7. EMPIRICAL RESULTS

We use the NLSY data for white males described in Appendix B **an**d augmented

**with** the PSID data to estimate the Willis–Rosen Model. We focus on two schooling

decisions; graduating from a four-year college or graduating from high school. We

thus abstract from the full multiplicity **of** choices **of** schooling as did Rosen **an**d

Willis. This is clearly a bold simplification but it allows us to focus on the main

points **of** this article.

25 In principle, the future (αa 1,α0 a ) c**an** be uncertain at the date decisions are made. Assuming that

these factor loadings are independent **of** θ, we c**an** replace these expressions by E Iθ (αa 1), E I θ

(αa 0)

**with**out affecting the identifiability **of** (αa 1,α0 a ), provided the conditions **of** Theorem 4 are met, but

it affects the identifiability **an**d interpretation **of** α P . A more general version **of** this model would

postulate two r**an**dom variables θ **an**d θ ∗ . Agents act on θ ∗ whereas θ is the true value. It would be

interesting to identify the joint distributions **of** θ **an**d θ ∗ under, for example, a rational expectations

assumption. We leave this for a later occasion.

386 CARNEIRO, HANSEN, AND HECKMAN

As a measurement system (M) for cognitive ability we use five components

**of** the ASVAB test battery (arithmetic reasoning, word knowledge, paragraph

composition, math knowledge, **an**d coding speed). We dedicate the first factor

(θ 1 ) to the ability measurement system **an**d exclude the other factors from that

system (recall the normalizations in Equation (21)). We include family background

variables as additional covariates in the ASVAB test equations (the µ M (X)).

To simplify the empirical **an**alysis, we divide the lifetimes **of** individuals into two

periods. The first period covers ages 19–29, **an**d the second covers ages 30–65. We

compute **an**nual earnings by multiplying the hourly wage by hours worked each

year for each individual. 26 We impute missing wages **an**d project earnings for the

ages not observed in the NLSY data using the procedure described in Appendix

B. The NLSY data do not contain information on the full life cycle **of** earnings.

We project the missing NLSY earnings using estimates **of** lifetime earnings from

the PSID data.

Table 2a,b presents the sample statistics. They show that whereas college graduates

have higher earnings th**an** high school graduates, all **of** the gain **of** attending

college comes after age 30. College graduates also have much higher test scores

**an**d come from better family backgrounds th**an** high school graduates. They are

more likely to live in locations where a college is present **an**d where college tuition

is lower.

In the notation **of** Section 5, ¯S = 2 (two choices), ¯R = 1 (there is one outcome

per person, earnings), ¯M = 5 (there are five test scores that are generated solely by

θ 1 ), **an**d Ā = 2 (there are two periods in the life cycle). In addition, there is a utility

index I. The test scores depend solely on θ 1 . The outcomes **an**d index are allowed to

depend on (θ 1 ,θ 2 ). Since K = 2, assuming nonzero factor loadings, we satisfy the

conditions for identification presented in Theorem 4. We have five measurements

generated solely by θ 1 . There are three measurements generated by θ 1 **an**d θ 2 for

each schooling level. (Outcomes **an**d choices are defined for each choice system.)

Exclusion restrictions are given in Table 2c along **with** the specification **of** each

**of** the equations. Tuition **an**d family background identify the parameters **of** the

earnings equations. Local labor market variables identify the parameters **of** utility

equations. Assuming that test scores are continuous outcomes, no exclusions are

needed for identification **of** the test score equations **an**d their distribution.

In this section, to facilitate the exposition we denote the college state (choice

1) by c, whereas high school (choice 0) is denoted by h. We model log earnings

(utility **of** earnings) at each age as

(23)

ln Y a,s = δ a,s + X ′ β a,s + η 1,s × experience a + η 2,s × experience 2 a + α′ a,s θ + ε a,s

where Y a,s is earnings in period (age) a if the schooling level is s, X is a vector **of**

covariates, θ is a vector **of** factors **an**d η 1,s **an**d η 2,s are calculated by the procedure

described in Appendix B. We compute the present value **of** log earnings (lifetime

26 We set zero earnings to $1 in this article to simplify computations (ln(1) = 0).

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 387

TABLE 2a

DESCRIPTIVE STATISTICS OF VARIABLES (NLSY 79—WHITE MALES)

Overall High School College

Variables Obs Me**an** Std. Dev. Min Max Obs Me**an** Std. Dev. Min Max Obs Me**an** Std. Dev. Min Max

Tuition at age 17 1161 20.68 7.70 0.00 53.59 704 21.17 7.85 0.00 47.06 457 19.92 7.40 0.00 53.59

(hundreds **of** dollars)

Urb**an** at age 14 1161 0.74 0.44 0.00 1.00 704 0.69 0.46 0.00 1.00 457 0.82 0.38 0.00 1.00

Broken at age 14 1161 0.13 0.34 0.00 1.00 704 0.15 0.36 0.00 1.00 457 0.11 0.31 0.00 1.00

Number **of** siblings 1161 2.83 1.77 0.00 15.00 704 3.03 1.85 0.00 15.00 457 2.51 1.60 0.00 11.00

South at age 14 1161 0.19 0.39 0.00 1.00 704 0.19 0.39 0.00 1.00 457 0.19 0.39 0.00 1.00

Mother education 1161 12.39 2.20 3.00 20.00 704 11.69 1.86 3.00 20.00 457 13.47 2.26 6.00 20.00

Father education 1161 12.71 3.16 0.00 20.00 704 11.61 2.70 0.00 20.00 457 14.40 3.08 4.00 20.00

Age in 1980 1161 19.27 2.19 16.00 23.00 704 19.27 2.17 16.00 23.00 457 19.28 2.21 16.00 23.00

Dist**an**ce to college at age 17 1161 7.68 15.58 0.00 100.20 704 8.87 16.01 0.00 100.20 457 5.84 14.75 0.00 96.59

Dummy for birth in 1957 1161 0.10 0.30 0.00 1.00 704 0.10 0.31 0.00 1.00 457 0.09 0.29 0.00 1.00

Dummy for birth in 1958 1161 0.10 0.30 0.00 1.00 704 0.09 0.28 0.00 1.00 457 0.12 0.33 0.00 1.00

Dummy for birth in 1959 1161 0.11 0.31 0.00 1.00 704 0.11 0.31 0.00 1.00 457 0.10 0.30 0.00 1.00

Dummy for birth in 1960 1161 0.14 0.35 0.00 1.00 704 0.15 0.36 0.00 1.00 457 0.13 0.34 0.00 1.00

Dummy for birth in 1961 1161 0.14 0.34 0.00 1.00 704 0.14 0.34 0.00 1.00 457 0.13 0.34 0.00 1.00

Dummy for birth in 1962 1161 0.16 0.37 0.00 1.00 704 0.16 0.37 0.00 1.00 457 0.16 0.36 0.00 1.00

Dummy for birth in 1963 1161 0.13 0.34 0.00 1.00 704 0.12 0.33 0.00 1.00 457 0.14 0.35 0.00 1.00

Education status (0 if HS, 1161 0.39 0.49 0.00 1.00 704 0.00 0.00 0.00 0.00 457 1.00 0.00 1.00 1.00

1 if college)

In school at ASVAB test date 1161 0.67 0.47 0.00 1.00 704 0.49 0.50 0.00 1.00 457 0.94 0.23 0.00 1.00

Arithmetic reasoning 1161 0.15 0.95 −2.39 1.42 704 −0.22 0.91 −2.39 1.42 457 0.73 0.70 −1.96 1.42

Word knowledge (ASVAB 3) 1161 0.14 0.88 −3.71 1.16 704 −0.19 0.92 −3.71 1.16 457 0.64 0.50 −2.24 1.16

Paragraph composition 1161 0.14 0.89 −3.50 1.21 704 −0.17 0.96 −3.50 1.21 457 0.62 0.47 −1.62 1.21

(ASVAB 4)

Coding speed (ASVAB 6) 1161 0.15 0.95 −3.03 2.79 704 −0.12 0.90 −2.89 2.09 457 0.57 0.87 −3.03 2.79

Math knowledge (ASVAB 7) 1161 0.14 0.97 −2.14 1.58 704 −0.36 0.80 −2.14 1.58 457 0.91 0.68 −1.83 1.58

Present value **of** earnings ∗ 1161 956.13 730.87 18.12 7861.67 704 694.56 321.93 18.12 1885.85 457 1359.07 964.47 77.02 7861.67

∗ Earnings in thous**an**ds **of** dollars.

388 CARNEIRO, HANSEN, AND HECKMAN

TABLE 2b

DESCRIPTIVE STATISTICS—PRESENT VALUE OF LOG EARNINGS (DISCOUNT RATE = 3%, NLSY 79—WHITE MALES)

Overall High School College

Variables Obs Me**an** Std. Dev. Min Max Obs Me**an** Std. Dev. Min Max Obs Me**an** Std. Dev. Min Max

Present value 1161 956.13 730.87 18.12 7861.67 704 694.56 321.93 18.12 1885.85 457 1359.07 964.47 77.02 7861.67

**of** earnings

(working life) ∗

Present value 1161 157.25 72.02 3.91 509.21 704 162.46 74.66 3.91 457.00 457 149.22 67.04 13.55 509.21

**of** earnings

in Period 1 ∗

Present value 1161 798.88 694.55 14.21 7533.31 704 532.10 251.57 14.21 1582.89 457 1209.84 922.21 63.48 7533.31

**of** earnings

in Period 2 ∗

NOTE: ∗ Earnings in thous**an**ds **of** dollars.

Working life = From age 19–65.

Period 1 = From age 19 to 29, inclusive.

Period 2 = From age 30 to 65.

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 389

TABLE 2c

COVARIATES INCLUDED IN OUTCOME, COST AND TEST EQUATIONS

Cost **of** Schooling

Tuition **an**d Foregone Psychic

Earnings Earnings Cost Test Scores

Intercept Yes Yes Yes Yes

Urb**an** Yes Yes Yes Yes

South Yes Yes Yes Yes

Cohort dummies Yes Yes Yes –

Me**an** local unemployment rate Yes Yes – –

Average local wage Yes Yes – –

Local tution – Yes – –

Number **of** siblings – – Yes Yes

Mother’s education – – Yes Yes

Father’s education – – Yes Yes

Broken family – – Yes

Enrolled in school at test date – – Yes

Age in 1980 – – Yes

utility) in the first period (ages 19–29 years) **an**d in the second period (ages 30 to

65 years). Let V 1,s be the period 1 gross utility **of** achieving schooling level s, **an**d

V 2,s be the period 2 gross utility **of** obtaining schooling level s. Using (23), we write

the gross utilities as

V 1,s = ¯δ 1,s + X ′ ¯β 1,s + ᾱ ′ 1,s θ + ¯ε 1,s

V 2,s = ¯δ 2,s + X ′ ¯β 2,s + ᾱ ′ 2,s θ + ¯ε 2,s

These are the outcome equations for the model that we estimate. To see this, notice

that

V 1,s =

=

=

∑A 1

a=19

∑A 1

a=19

∑A 1

a=19

+

ln Y a,s

(1 + ρ) a

δ a,s + X ′ β a,s + α ′ a,s θ + ε a,s + η 1,s × experience a + η 2,s × experience 2 a

(1 + ρ) a

δ a,s + X ′ β a,s + η 1,s × experience a + η 2,s × experience 2 a

(1 + ρ) a

[

A1

∑

a=19

]

α a,s

′

(1 + ρ) a θ +

∑A 1

a=19

= ¯δ 1,s + X ′ ¯β 1,s + ᾱ ′ 1,s θ + ¯ε 1,s

ε a,s

(1 + ρ) a

390 CARNEIRO, HANSEN, AND HECKMAN

where

A 1 = 29

ρ = 0.03 (the prespecified discount rate)

¯δ 1,s =

¯β 1,s =

∑A 1

a=19

∑A 1

a=19

δ a,s + η 1,s × experience a + η 2,s × experience 2 a

(1 + ρ) a

β a,s

(1 + ρ) a

A 1

ᾱ

1,s ′ = ∑ α a,s

′

(1 + ρ) a

¯ε 1,s =

a=19

∑A 1

a=19

ε a,s

(1 + ρ) a

**an**d the terms for the second period **of** data for ages (30–65) are defined **an**alogously.

The cost or psychic net return **of** going to college is written as

ln P = δ P + Z ′ γ + α ′ P θ + ε P

These costs c**an** be negative as they entail both psychic **an**d tuition components.

Assuming that the agents know X, Z,θ **an**d ε P , the criterion for the choice **of**

schooling is

V = E (V 1,c + V 2,c − V 1,h − V 2,h | X,θ) − E(ln P | Z, X,θ, ε P )

= ¯δ 1,c + X ′ ¯β 1,c + ᾱ

1,c ′ θ + ¯δ 2,c + X ′ ¯β 2,c + ᾱ

2,c ′ θ − ¯δ 1,h − X ′ ¯β 1,h − ᾱ

1,h ′ θ − ¯δ 2,h

− X ′ ¯β 2,h − ᾱ

2,h ′ θ − δ P − Z ′ γ − α ′ P θ − ε P

= (¯δ 1,c + ¯δ 2,c − ¯δ 1,h − ¯δ 2,h − δ P ) + X ′ ( ¯β 1,c + ¯β 2,c − ¯β 1,h − ¯β 2,h ) − Z ′ γ

+ (ᾱ

1,c ′ + ᾱ′ 2,c − ᾱ′ 1,h − ᾱ′ 2,h − α′ P )θ − ε P

Individuals go to college if V > 0. We test (**an**d do not reject) the hypothesis that at

the time they make their college decision agents know their cost function **an**d both

factors, θ, but not the uniquenesses in the outcome equations. These expressions

c**an** be modified in **an** obvious way to accommodate other information sets.

The test score equations have a similar structure. Let T j be test score j,

T j = X ′ T ω j + α ′ test j

θ + ε test j

where X T is the vector **of** covariates in the test score equation, **an**d ω j is the

coefficient vector. The distributions **of** the θ **an**d ε are nonparametrically identified

under the assumptions supporting Theorems 1–5. In this article, we assume that

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 391

each factor is generated by a mixture **of** normals distribution,

(24)

θ k ∼

∑J k

j=1

p k, j φ ( f k | µ j,k ,τ j,k ) ,

k = 1,...,K

Mixtures **of** normals **with** a large enough number **of** components approximate **an**y

distribution **of** θ k **an**d the ε arbitrarily well (Ferguson, 1983). We assume that the

ε’s are normal although, in principle, they are nonparametrically identified from

the **an**alysis **of** Theorem 5.

We estimate the model using Markov Chain Monte Carlo methods as described

in Appendix C for 55,000 iterations, discarding the first 5000 iterations to allow

the chain to converge to its stationary distribution. We retain every 10th **of** the

remaining 50,000 iterations for a total **of** 5000 iterations. 27 The Markov Chain

mixes well **with** most autocorrelations dying out at around lag 25 to 50.

We estimate models **with** one factor **an**d **with** two factors. The estimated coefficients

are presented as Tables A1 through A5 in the supplementary tables

on the website (http://lily.src.uchicago.edu/CHH estimating.html). The two-factor

model specifies that the first factor only appears in test scores **an**d choice equations

whereas the second factor appears in all equations. No additional factors are

necessary to fit our data. Thus, we conclude that the innovations in the earnings

process (ε j a) are not in the agent’s information set at the time schooling decisions

are made. If they were, they would be **an** additional source **of** covari**an**ce (i.e.,

they would generate additional factors) between the choice equation **an**d future

earnings. If we use only one factor that enters in all equations, the quality **of** the fit

is much poorer (results available on request). From this testing procedure we infer

that agents know both components **of** θ at the time they enroll in college. Figure 1

shows the fit **of** the density **of** the present value **of** log earnings (or lifetime utility

**of** earnings excluding psychic costs **an**d benefits) for everyone in the population.

It graphs the actual **an**d predicted densities **of** gross utility. The fit is very good.

Results for each schooling group are available in the supplement on the website

**an**d are equally good (χ 2 goodness **of** fit tests are passed overall as well as for

the distribution **of** utility for each schooling group; see Table A6). In order to

achieve this good fit it is necessary to allow for nonnormal factors. Figure 2 shows

the density **of** each **of** the estimated factors **an**d compares them **with** a benchmark

normal **with** the same me**an** **an**d st**an**dard deviation. Neither factor is normal. 28

There is evidence **of** selection on ability (factor 1), **with** the less able less likely to

attend college. There is weaker evidence **of** selection on factor 2 (see Figures A1

**an**d A2 posted at the website).

Table 3a,b presents the factor loadings in the outcome, choice, **an**d measurement

equations. 29 Both factors have a positive effect on gross utility for both schooling

levels in each period **an**d on schooling attainment (the I). Factor 1 explains most

**of** the vari**an**ce in the test score system (see Table 3b) whereas factor 2 explains

27 The run time was about 122 minutes on a 1.2 GHz AMD Athlon PC.

28 The distributions **of** the factors by schooling level are shown in Figures A1 **an**d A2 on the website.

29 The coefficient estimates for the model are posted on the website.

392 CARNEIRO, HANSEN, AND HECKMAN

0.35

Actual

Predicted

0.3

0.25

Dens ity(Utility)

0.2

0.15

0.1

0.05

0

0 2 4 6 8 10 12

Utility

All densities are estimated using a 100 point grid over the domain **an**d a Gaussi**an** kernel **with** b**an**dwidth **of** 0.12.

Utility=

1

Σ a(1+0.03) a log(Y a,s )

FIGURE 1

DENSITY OF EX POST GROSS UTILITY

most **of** the vari**an**ce in the utility outcome system (see Table 3a). The return to

college in terms **of** gross utility (gross utility differences) is given by

V 1,c + V 2,c − V 1,h − V 2,h = (¯δ 1,c + ¯δ 2,c − ¯δ 1,h − ¯δ 2,h ) + X ′ ( ¯β 1,c + ¯β 2,c − ¯β 1,h − ¯β 2,h )

+ (ᾱ

1,c ′ + ᾱ′ 2,c − ᾱ′ 1,h − ᾱ′ 2,h )θ +(¯ε 1,c + ¯ε 2,c − ¯ε 1,h − ¯ε 2,h )

Both factors raise returns (see the base **of** Table 3a). Although the second factor

explains much more **of** the vari**an**ce in utility th**an** the first factor, the first factor

explains more **of** the vari**an**ce in returns th**an** the second factor, although it

only explains 30% **of** the vari**an**ce in returns. We infer that agents know θ (the

factors) based on the superior fit **of** a model that includes nonzero factor loadings

on both factors in the choice equation but not the innovations in outcomes (the

ε’s in the outcome equations) at the time they make their schooling decisions.

Our results indicate that the unpredictability in gross utility gains (i.e., differences)

**of** going to college is much larger th**an** the unpredictability in utility levels.

Both factors have a negative impact on “costs” (the factor loadings are negative in

the “cost” or psychic return function). Therefore, both factors positively influence

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 393

1

0.9

0.8

vari**an**ce = 0.3019

θ1

normal version **of** θ1

θ2

normal version **of** θ2

0.7

Density( )

θ

0.6

0.5

0.4

vari**an**ce = 0.5747

0.3

0.2

0.1

0

-2 -1.5 -1 -0.5 0 0.5 1 1.5 2

θ

Normal densities are defined to be normal **with** same me**an** **an**d vari**an**ce as the corresponding θ.

All dens ities are es timated us ing a 100 point grid over the domain **an**d a G aus s i**an** kernel **with**

b**an**dwidth **of** 0.12.

FIGURE 2

FACTOR AND NORMAL DENSITIES

the likelihood **of** going to college since both contribute positively to returns **an**d

negatively to costs.

Figure 3 plots the estimated ex post factual **an**d counterfactual gross college

utility densities for college graduates **an**d high school graduates, respectively

(see Figure A3 on the website for the corresponding figure for high school utility).

College graduates have the highest level **of** gross utility both as high school graduates

**an**d as college graduates. They also have the highest gross gains **of** going to

college as demonstrated in Figure 4. 30,31 Figure 5 presents the marginal treatment

effect as defined in Equation (6) using utils as the outcome. This is the me**an** ex post

gross gain in utils **of** going to college as a function **of** ε W , which is **an** index **of** variables

that increase the likelihood **of** enrollment in college. It shows that individuals

30 If we consider net gains by subtracting costs, the difference between college graduates **an**d high

school graduates will be even higher because costs are lower for college graduates.

31 We c**an** also compute gross utility gains as a percentage **of** the gross utility in high school as

See Figure A4 on the website.

R = V 1,c + V 2,c

V 1,h + V 2,h

− 1.

394 CARNEIRO, HANSEN, AND HECKMAN

TABLE 3a

FACTOR LOADINGS

Utility

Factor Loading

St**an**dard Error

Potential first period θ 1 0.1419 0.0324

Utility in high school θ 2 1.0000 0.0000

Total vari**an**ce Proportion **of** total vari**an**ce explained by:

0.3460 θ 1 = 0.0351 θ 2 = 0.8717

Potential second period θ 1 0.2277 0.0519

Utility in high school θ 2 1.6432 0.0262

Total vari**an**ce Proportion **of** total vari**an**ce explained by:

0.8951 θ 1 = 0.0349 θ 2 = 0.9096

Potential first period θ 1 0.1888 0.0559

Utility in college θ 2 0.9402 0.0676

Total vari**an**ce Proportion **of** total vari**an**ce explained by:

0.3455 θ 1 = 0.0634 θ 2 = 0.7718

Potential second period θ 1 0.3908 0.0979

Utility in college θ 2 1.7217 0.1203

Total vari**an**ce Proportion **of** total vari**an**ce explained by:

1.0860 θ 1 = 0.0848 θ 2 = 0.8241

Total vari**an**ce for schooling ‘s’ at period ‘a’ = α 2 s,a,1 σ 2 θ1 + α2 s,a,2 σ 2 θ2 + σ 2 εsa .

Proportion **of** total vari**an**ce explained by factor ‘k’, in schooling ‘s’ at period ‘a’ = α 2 s,a,k σ 2 θ κ

/Total

vari**an**ce

Returns

Factor Loading

St**an**dard Error

(Ut. College- θ 1 0.2039 0.1523

Ut. High School) θ 2 0.0235 0.1892

Total vari**an**ce Proportion **of** total vari**an**ce explained by:

0.2835 θ 1 = 0.1164 θ 2 = 0.0359

Total vari**an**ce = (α c,2,1 + α c,1,1 − α HS,2,1 − α HS,1,1 ) 2 σ 2 θ1 + (α c,2,2 + α c,1,2 − α HS,2,2 − α HS,1,2 ) 2 σ 2 θ 2

+

σ 2 εc,2 + σ 2 εc,1 + σ 2 ε HS,2

+ σ 2 ε HS,1

.

Proportion **of** total vari**an**ce explained by factor ‘k = (α c,2,k + α c,1,k − α HS,2,k − α HS,1,k ) 2 σθk 2 vari**an**ce.

/Total

who are likely to enroll in college have higher returns to college th**an** those who are

unlikely to enroll in college who have lower values **of** ε W . Figure 5 also shows the

distribution **of** ε W in the population. Most **of** the mass **of** this distribution is at values

**of** ε W around 0. M**an**y individuals have negative gross utility returns (excluding

psychic benefits **of** going to college). Even among those deciding to go to college,

39.53% would have higher utility (ignoring psychic components) had they not

gone to college. There is a definite fall in utility gains as college enrollment is exp**an**ded

to the less college prone. Table 4 confirms Figure 3 **an**d shows that college

graduates have higher ex post potential high school **an**d college utility th**an** high

school graduates in high school **an**d in college (these are gross utilities). Table 5

shows that the gross ex post returns **of** going to college are higher for those who

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 395

TABLE 3b

FACTOR LOADINGS

AFQT

Factor Loading

St**an**dard Error

Arithmetic reasoning θ 1 1.0000 0.0000 0.7391

Total vari**an**ce Proportion **of** total vari**an**ce explained by:

0.7764 θ 1 = 0.7391

Coding speed θ 1 0.9672 0.0275 0.7308

Total vari**an**ce Proportion **of** total vari**an**ce explained by:

0.7340 θ 1 = 0.7308

Math knowledge θ 1 0.6313 0.0350 0.2843

Total vari**an**ce Proportion **of** total vari**an**ce explained by

0.8049 θ 1 = 0.2843

Word knowledge θ 1 0.7508 0.0317 0.5219

Total vari**an**ce Proportion **of** total vari**an**ce explained by:

0.6193 θ 1 = 0.5219

Paragraph composition θ 1 0.8080 0.0345 0.5301

Total vari**an**ce Proportion **of** total vari**an**ce explained by:

0.7061 θ 1 = 0.5301

Total vari**an**ce for test ‘t’ = α 2 t,1 σ 2 θ1 + α2 t,2 σ 2 θ2 + σ 2 εt .

Proportion **of** total vari**an**ce explained by factor ‘k’ = αt,k 2 σ θk 2 /Total vari**an**ce.

Choice

Factor Loading

St**an**dard Error

Cost function ∗ θ 1 −2.1250 0.5042

θ 2 −1.0278 0.3799

Total vari**an**ce Proportion **of** total vari**an**ce explained by:

0.8951 θ 1 = 0.0349 θ 2 = 0.9096

Choice ∗∗ θ 1 2.3349 0.4904

θ 2 1.0466 0.4277

Total vari**an**ce Proportion **of** total vari**an**ce explained by:

6.1544 θ 1 = 0.5297 θ 2 = 0.0604

∗ Cost = µ cost + α cost,1 θ 1 + α cost,2 θ 2 + ε cost .

∗∗ Choice = µ c,2 + µ c,1 − µ HS,2 − µ HS,1 + (α c,2,1 + α c,1,1 − α HS,2,1 − α HS,1,1 − α cost,1 ) θ 1 +

(α c,2,2 + α c,1,2 − α HS,2,2 − α HS,1,2 − α cost,2 ) θ 2 − µ cost − ε cost .

Total vari**an**ce **of** cost = α 2 cost,1 σ 2 θ1 + α2 cost,2 σ 2 θ2 + σ 2 ε cost

.

Proportion **of** total vari**an**ce **of** cost explained by factor ‘k’ = αcost,k 2 σ θk 2 /Total vari**an**ce **of** cost.

Total vari**an**ce **of** choice = (α c,2,1 + α c,1,1 − α HS,2,1 − α HS,1,1 − α cost,1 ) 2 σ 2 θ1 + (α c,2,2 + α c,1,2 −

α HS,2,2 − α HS,1,2 − α cost,2 ) 2 σ 2 θ2 + σ 2 εcost .

Proportion **of** total vari**an**ce explained by factor ‘k”=(α c,2,k + α c,1,k − α HS,2,k − α HS,1,k − α cost,k ) 2 σθ 2 k

/

Total vari**an**ce **of** choice.

choose to go to college. These results are expected given the pattern shown in

Figure 4. The returns for attending college for the average high school graduate

are negative. The ex post gross returns to college for the individual at the margin

(V = 0) are about 0.59% **of** total high school utility. Since these individuals are

exactly at the margin, these gains correspond exactly to the cost they are facing.

396 CARNEIRO, HANSEN, AND HECKMAN

0.09

0.08

High School*

College**

0.07

0.06

Density(Utility)

0.05

0.04

0.03

0.02

0.01

0

0 2 4 6 8 10 12

Utility

* Counterfactual: f( V c |Choice=High School)

** Predicted: f( V c |Choice=College)

All densities are estimated using a 100 point grid over the domain **an**d a Gaussi**an** kernel **with** b**an**dwidth **of** 0.12.

Utility=

1

Σ a(1+0.03) a log(Y a,s )

FIGURE 3

DENSITY OF EX POST COLLEGE GROSS UTILITY

Once we account for the nonmonetary costs **an**d benefits **of** going to college (net

returns reported in the bottom two rows **of** Table 5), the relative returns **of** going

to college become more negative for high school graduates **an**d more positive for

college graduates. Since ln P c**an** be allocated as either a cost or a return, there

are two ways to compute returns depending on whether ln P is treated as a cost

(row 2) or a return (row 3). We present two sets **of** net return estimates depending

on how costs or gains (ln P) are allocated. These are bounds since the actual

allocation between cost **an**d benefit is indeterminate.

The patterns **of** Figures 3–5 are essentially reproduced for ex post present

value **of** earnings in Figures 6–8. Table 6 shows that college graduates have earnings

57.6% higher th**an** they would have had (or $608,372 higher, on average) if

they had not attended college. High school graduates have a gross gain **of** 43%

(or $362,987) if they go to college. Notice that even though the utility gains **of**

going to college are negative for high school graduates, the money returns are

positive **an**d large. Table 7 shows that even though 39.66% **of** the persons going

to college would have had a higher utility in high school th**an** in college

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 397

TABLE 4

AVERAGE GROSS UTILITY IN DIFFERENT STATES (FACTUAL OR COUNTERFACTUAL) FOR PERSONS WHO GO TO

HIGH SCHOOL OR WHO GO TO COLLEGE AND FOR PEOPLE AT THE MARGIN

(Does Not Include Utility “Cost” or Psychic Returns to College)

Actual Schooling Level

Factual or Counterfactual

Utility for People

Schooling Level High School 1 College 2 at Margin 3

High school + 7.8580 8.6125 8.305

Std. error 0.0604 0.0737 0.1388

College ++ 7.7262 8.6885 8.305

Std. error 0.0638 0.0763 0.1388

1 + E(V h | choice = high school) **an**d 1 ++ E(V c | choice = high school).

2 + E(V h | choice = college) **an**d 2 ++ E(V c | choice = college).

3 + E(V h | V = 0) **an**d 3 ++ E(V c | V = 0).

0.7

0.6

High School*

College**

0.5

Density(Utility Differences)

0.4

0.3

0.2

0.1

0

-3 -2 -1 0 1 2 3

*f(V c -V h |Choice=High School)

Utility Differences

** f(V c -V h |Choice=College)

All densities are estimated using a 100 point grid over the domain **an**d a Gaussi**an** kernel **with** b**an**dwidth **of** 0.12.

1

Utility= Σ a(1+0.03) a log(Y a,s

)

FIGURE 4

DENSITY OF EX POST GROSS UTILITY DIFFERENCES (COLLEGE–HIGH SCHOOL)

(ignoring psychic gains), only 6.9% **of** this population had higher earnings in

high school th**an** in college. Once we account for psychic benefits, the proportion

**of** college students regretting their decisions is roughly the same whether we

measure regret in present value or utils. This shows the import**an**ce **of** accounting

398 CARNEIRO, HANSEN, AND HECKMAN

TABLE 5

FACTUAL AND COUNTERFACTUAL RETURNS FOR PERSONS WHO GO TO HIGH

SCHOOL OR COLLEGE

Gross Return High School 1 College 2

College vs. high school (relative) + −0.0180 0.0126

Std. error 0.1590 0.0178

Net returns

College vs. high school (relative) ++ −0.2398 0.3161

Std. error 0.2502 0.3178

College vs. high school (relative) +++ −0.4227 0.1892

Std. error 0.5770 0.0144

1 + E((V c /V h ) − 1 | choice = high school).

2 + E((V c /V h ) − 1 | choice = college).

1 ++ E((V c /V h − ln P)/(V h + ln P) | choice = high school).

2 ++ E((V c /V h − ln P)/(V h + ln P) | choice = college).

1 +++ E((V c /V h − ln P)/(V h ) | choice = high school).

2 +++ E((V c /V h − ln P)/(V h ) | choice = college).

NOTE: We make the distinction between the second **an**d third line in

this table because in our framework we c**an**not separate nonmonetary

costs from nonmonetary benefits **of** going to college, so we allocate

ln P both ways.

for psychic returns in **an**alyzing schooling choices. Among high school graduates,

95.90% do not regret not going to college (measured in utils), but 85.26%

regret the decision fin**an**cially. The marginal treatment effect has the same general

shape when present values **of** earnings are used instead **of** gross utility (see

Figure 8).

Table 8 shows the probability **of** being in decile i **of** the college potential discounted

earnings distribution conditional on being in decile j **of** the high school

potential earnings distribution. (These are gross earnings.) It shows that neither

**an** independence assumption across counterfactual outcomes, which is the Veil

**of** Ignor**an**ce assumption used in applied welfare theory, (see, e.g., Sen, 1973)

or in aggregate income inequality decompositions (DiNardo et al., 1996), nor a

perfect r**an**king assumption, which is sometimes used to construct counterfactual

joint distributions **of** outcomes (see, e.g., Heckm**an** et al., 1997 or Athey **an**d

Imbens, 2002) are satisfied in the data. There is a strong positive dependence

between potential outcomes in each counterfactual state, but there is no perfect

dependence. There are subst**an**tial nonzero elements outside the diagonal.

We get similar results for utility (discounted log earnings). See Table A13 at our

website.

We have already shown that there is a large dispersion in the distribution **of**

utilities, utility returns, earnings, **an**d earnings returns to college. However, this

dispersion c**an** be due to heterogeneity that is known at the time the agent makes

schooling decisions, or it c**an** be due to heterogeneity that is not predictable by the

agent at that time. Figure 9 plots the densities **of** the unforecastable component

**of** college gross utilities at the time college decisions are made for fixed X values,

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 399

1

MTE

Density( ε W )

0.04

Marginal Treatment Effect

0

0.02

-1

-10 -8 -6 -4 -2 0 2 4 6 8 10 0

ε W =(α' 1,c + α ' 2,c - α' 1,h - α' 2,h - α' p )θ - ε p

ε

W

**an**d a Gaussi**an** kernel **with** b**an**dwidth **of** 0.12.

Density( ε W )

FIGURE 5

DENSITY OF ε W AND MARGINAL TREATMENT EFFECT: E(V c − V h |ε W )

under three different information sets. (The X are fixed at their me**an**s.) The solid

line corresponds to the case where the agent does not know his factor (θ) nor his

innovations (the ε’s in the outcome equations). The other two lines correspond

respectively to the cases where the agent knows θ 2 only, or both θ 1 **an**d θ 2 . 32

Knowledge **of** θ 2 dramatically decreases the uncertainty faced, but knowledge **of**

factor 1 (associated **with** cognitive ability) has only a small effect on the amount **of**

uncertainty faced by the agent. We obtain a similar figure in terms **of** gross utility

in high school. 33 However, even though knowledge **of** θ 2 reduces dramatically

the amount **of** uncertainty faced in terms **of** the levels **of** gross utility in each

counterfactual state, it has only a small effect on the uncertainty faced in terms **of**

returns (see Figure 10). Table 9 reports the vari**an**ces **of** gross **an**d net utility **an**d

gross **an**d net present value **of** earnings under different information sets **of** agents.

Giving agents more information (knowledge **of** factors) reduces the vari**an**ce in

utilities or present values as perceived by agents. However, reducing uncertainty

32 If the agent knows θ 1 , θ 2 , ε college , **an**d ε high school then he faces no uncertainty.

33 These results are available on request from the authors, **an**d are posted on the website.

400 CARNEIRO, HANSEN, AND HECKMAN

1.4 x10-3

High School*

College**

1.2

1

Density(Earnings Differences)

0.8

0.6

0.4

0.2

0

-500 0 500 1000 1500 2000

Earnings Differences (1000's)

*f(PV c -PV h |Choice=High School)

**f(PV c -PV h |Choice=College)

PV

1

h = Σ a (1+0.03) a Y h,a

**an**d a Gaussi**an** kernel **with** b**an**dwidth **of** 0.12.

FIGURE 6

DENSITY OF EX POST GROSS LIFETIME EARNINGS DIFFERENCES (COLLEGE–HIGH SCHOOL)

barely budges the forecast returns to schooling measured in dollars or utils—the

message **of** Figure 10. Analogous results are obtained for present value **of** earnings.

See Figures A15 **an**d A16 posted at our website.

The fact that a two-factor model is adequate to fit the data implies that the agents

c**an**not forecast future shocks **of** log earnings (¯ε 1,c , ¯ε 2,c , ¯ε 1,h , ¯ε 2,h ) at the time they

make their schooling decision. (If they did, they would enter as additional factors

in the estimated model.) Even though the factors (θ) explain most **of** the vari**an**ce

in the levels **of** utilities, they explain less th**an** half **of** the vari**an**ce in returns, which

may lead the reader to conclude that the reason so m**an**y college graduates would

have higher gross utility in high school th**an** in college (39%) is because they c**an**not

accurately forecast their returns **of** going to college. However, this is not the case.

As shown in Table 7 once we account for psychic benefits or costs **of** attending

college (P) relative to attending high school, only 8% **of** college graduates regret

going to college. This suggests a subst**an**tial part **of** the gain to college is due to

nonpecuniary components. Furthermore, Table 10 shows that if individuals had

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 401

0.9

1

High School*

College**

0.8

0.7

Density(Returns)

0.6

0.5

0.4

0.3

0.2

0.1

0

-0.5 0 0.5 1 1.5 2 2.5

Returns

*f((PV c /PV h )-1|Choice=High School)

**f((PV c

/PV h

)-1|Choice=College)

PV

1

h = Σ a (1+0.03) a Y h,a

**an**d a Gaussi**an** kernel **with** b**an**dwidth **of** 0.12.

FIGURE 7

DENSITY OF EX POST RELATIVE GROSS EARNINGS DIFFERENCES (COLLEGE–HIGH SCHOOL)

knowledge **of** (¯ε 1,c , ¯ε 2,c , ¯ε 1,h , ¯ε 2,h ), keeping their average expected earnings the

same, very few **of** them would ch**an**ge their schooling decision. Uncertainty in

gains to schooling is subst**an**tial but knowledge **of** this uncertainty has a very small

effect on the choice **of** schooling because the vari**an**ce **of** gains is so much smaller

th**an** the vari**an**ce **of** psychic costs or benefits, **an**d it is the latter that drives most

**of** the heterogeneity in schooling decisions. In addition, there is uncertainty about

the level **of** both college **an**d high school earnings. See the vari**an**ces reported for

each in Table 9. The uncertainty in the return comes from both sources although

the literature emphasizes the uncertainty in college earnings. When conducting

this experiment, we make sure that the average expected earnings are the same

because a me**an** preserving reduction in the uncertainty faced by the agents in

terms **of** utility is not the same as a me**an** preserving ch**an**ge in uncertainty in terms

**of** levels **of** earnings (see Appendix D). 34 In particular, a ch**an**ge in the vari**an**ce **of**

(¯ε 1,c , ¯ε 2,c , ¯ε 1,h , ¯ε 2,h ) would not ch**an**ge the expected utility in each schooling level

34 See the numbers posted at the website.

402 CARNEIRO, HANSEN, AND HECKMAN

8

MTE

Density( ε W)

0.04

6

Relative Marginal Treatment Effect

4

0.02

2

0

-2

-10 -8 -6 -4 -2 0 2 4 6 8 10 0

Density( W )

ε

ε W

ε W =(α' 1,c + α ' 2,c - α' 1,h - α' 2,h - α' ' p )θ - ε p

All dens ities are es timated us ing a 100 point grid over the domain **an**d a G aus s i**an** kernel **with** b**an**dwidth **of** 0.12.

FIGURE 8

DENSITY OF ε w AND RELATIVE MARGINAL EX POST TREATMENT EFFECT FOR PRESENT VALUE OF GROSS

EARNINGS E((PV c /PV h ) − 1 | ε w )

but would ch**an**ge expected earnings in each schooling level. The numbers reported

in Table 10 take this into account. When agents know their (¯ε 1,c , ¯ε 2,c , ¯ε 1,h , ¯ε 2,h ),

they face less uncertainty. Knowing these components is equivalent to setting

= 0 in the expression at the end **of** Section 6, a special case **of** me**an** preserving

shrinkage where vari**an**ces are set to zero. The expected utility at each schooling

level increases. 35

8. SOME EVIDENCE ON AN EDUCATIONAL REFORM

Using the estimated model, we evaluate the effect **of** a full subsidy to college

tuition. We move beyond the Veil **of** Ignor**an**ce, which is based on **an** **an**onymity

35 We compute the compensation (which c**an** be negative or positive) required by each individual

to keep average earnings the same after the uncertainty is reduced. Then we provide the individual

**with** this compensation together **with** knowledge **of** (¯ε 1,c , ¯ε 2,c , ¯ε 1,h , ¯ε 2,h ) **an**d finally we compute

the percentage **of** individuals who would ch**an**ge their schooling decision if they had knowledge **of**

(¯ε 1,c , ¯ε 2,c , ¯ε 1,h , ¯ε 2,h ) but had the same present value **of** earnings in each schooling level. We use the

procedure described at the end **of** Section 6 applied to each period to adjust utility for the effects **of**

me**an** preserving spreads in earnings (see Appendix D).

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 403

TABLE 6

RETURNS TO COLLEGE IN TERMS OF LIFETIME EARNINGS (EXCLUDES NONPECUNIARY RETURNS) FOR PEOPLE

WHO GO TO HIGH SCHOOL, COLLEGE, OR ARE AT THE MARGIN

Earnings for People

High School 1 College 2 at Margin 3

Gross returns

College vs. high school + 0.4379 0.5764 0.5274

Std. error 0.0228 0.0365 0.0634

Net returns

College vs. high school ++ 0.4162 0.5607 0.5092

Std. error 0.0213 0.0366 0.0605

1 + E((PV c /PV h ) − 1 | choice = high school).

2 + E((PV c /PV h ) − 1 | choice = college).

3 + E((PV c /PV h ) − 1 | V = 0).

1 ++ E((PV c /(PV h +PV tuition )) − 1 | choice = high school).

2 ++ E((PV c /(PV h +PV tuition )) − 1 | choice = college).

3 ++ E((PV c /(PV h +PV tuition )) − 1 | V = 0).

PV j = ∑ a (1/(1 + 0.03)) a Y a,j , that is, the interest rate is 3%.

TABLE 7

PERCENTAGE OF PEOPLE WITH NEGATIVE RETURNS TO COLLEGE (NET AND GROSS)

Gross

Net ∗

Utility Earnings Utility Earnings

High school graduates 56.22% 13.62% 95.91% 14.74%

College graduates 39.66% 6.90% 8.32% 7.28%

∗ Net me**an**s net **of** total cost for utility **an**d net **of** tuition costs for earnings.

TABLE 8

Pr(d i < V c ≤ d i+1 | d j < V h ≤ d j+1 ) WHERE d i IS THE iTH DECILE OF THE PRESENT VALUE OF THE COLLEGE

EARNINGS DISTRIBUTION AND d j IS THE jTH DECILE OF THE PRESENT VALUE OF THE HIGH SCHOOL EARNINGS

DISTRIBUTION ∗

College Deciles

High school

Deciles 1 2 3 4 5 6 7 8 9 10

1 0.7436 0.1936 0.0459 0.0121 0.0035 0.0009 0.0003 0.0001 0.0000 0.0000

2 0.1846 0.3799 0.2372 0.1173 0.0503 0.0206 0.0072 0.0022 0.0006 0.0001

3 0.0482 0.2219 0.2640 0.2078 0.1337 0.0727 0.0344 0.0131 0.0036 0.0005

4 0.0154 0.1108 0.1944 0.2172 0.1902 0.1371 0.0806 0.0389 0.0134 0.0021

5 0.0055 0.0535 0.1240 0.1781 0.1986 0.1807 0.1372 0.0819 0.0341 0.0065

6 0.0019 0.0253 0.0732 0.1274 0.1706 0.1917 0.1826 0.1359 0.0740 0.0175

7 0.0006 0.0103 0.0382 0.0788 0.1271 0.1728 0.2011 0.1926 0.1357 0.0427

8 0.0001 0.0038 0.0171 0.0422 0.0802 0.1300 0.1816 0.2257 0.2173 0.1020

9 0.0000 0.0008 0.0053 0.0165 0.0379 0.0740 0.1288 0.2082 0.2919 0.2365

10 0.0000 0.0000 0.0006 0.0026 0.0079 0.0194 0.0465 0.1015 0.2294 0.5921

∗ Thus the number in row j column i is the probability that a person **with** potential high school earnings

in the jth decile **of** the high school earnings distribution has potential college earnings in the ith decile

**of** the college earnings distribution.

404 CARNEIRO, HANSEN, AND HECKMAN

FIGURE 9

DENSITY OF EX ANTE GROSS UTILITY UNDER DIFFERENT INFORMATION SETS

assumption **an**d evaluates reforms considering only their overall impact on inequality,

to consider those individuals that are benefited by the reform. We consider

only partial equilibrium treatment effects **an**d do not consider the full cost

**of** fin**an**cing the reforms. Table 4 shows the average lifetime gross utility **of** particip**an**ts

before the policy ch**an**ge **an**d Table 5 shows their prepolicy average return

to college. These tables compare these levels **an**d returns **with** what the marginal

particip**an**t attracted into schooling by the policy would earn. The marginal person

has lower utility in college **an**d lower returns to college th**an** the average person

in college (also see Figure 5). Since the policy affects the schooling decisions

**of** the individuals at the margin, the policy will produce a decline in the quality **of**

college graduates after the policy is implemented, since the new entr**an**ts are **of**

lower average quality th**an** the incumbents.

Despite the subst**an**tial size **of** the policy ch**an**ges we consider, the induced

effects on participation are small. The full tuition subsidy only increases graduation

from four-year college by 4%. 36 The policies operate unevenly over the deciles

**of** the initial outcome distribution. Figure 11 shows the proportion **of** high school

36 This comes from a simulation available on request from the authors.

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 405

FIGURE 10

DENSITY OF EX ANTE GROSS UTILITY DIFFERENCES (COLLEGE–HIGH SCHOOL) UNDER DIFFERENT

INFORMATION SETS

TABLE 9

AGENTS FORECAST VARIANCE OF COLLEGE-HIGH SCHOOL RETURNS UNDER DIFFERENT INFORMATION SETS FOR

THE AGENTS GROSS UTILITY

Var(W col − W hs ) Var(W col ) Var(W hs )

I =∅ 0.2836 2.5155 2.2747

I = {f 2 } 0.2726 0.3590 0.1666

I = {f 1 ,f 2 } 0.2355 0.1540 0.0815

We purge the effect **of** urb**an**, south, cohort dummies, average local unemployment rate **an**d average

wage on wages before computing these vari**an**ces.

persons in each decile **of** the high school present value **of** earnings distribution

induced to graduate from four-year college by the tuition subsidy. The figure

shows that providing a free college education mostly affects people at the top end

**of** the high school earnings distribution. 37 The policy does not benefit the poor. A

37 The same result holds when we consider distributions **of** utilities instead **of** distributions **of** lifetime

earnings. See Figure A15 on the website.

406 CARNEIRO, HANSEN, AND HECKMAN

TABLE 10

PEOPLE WHO CHOOSE DIFFERENTLY UNDER DIFFERENT INFORMATION SETS

COMPENSATING FOR THE CHANGE IN RISK

Original Choice

Fraction that Ch**an**ge Choice

Ĩ = {θ 1 , θ 2 , ε C , ε HS } Ĩ = {θ 1 }

High school 0.1181 0.1091

College 0.0159 0.0191

Total 0.0866 0.0813

0.1800

0.1600

0.1400

0.1200

Proportion

0.1000

0.0800

0.0600

0.0400

0.0200

0.0000

1 2 3 4 5 6 7 8 9 10

Decile

FIGURE 11

PROPORTION OF PEOPLE INDUCED INTO COLLEGE BY FULL SUBSIDY TO COLLEGE TUITION WHEN INFORMATION

SET = {θ 1 , θ 2 } BY DECILE OF INITIAL HIGH SCHOOL EARNINGS DISTRIBUTION

calculation based on the Veil **of** Ignor**an**ce using the Gini coefficient would show

no effect **of** the policy up to two decimal points. Our **an**alysis relaxes the Veil **of**

Ignor**an**ce, **an**d lets us study the impact **of** policies on persons at different positions

**of** the income distribution. It goes beyond the counterfactual simulations used in

the inequality literature (see, e.g., DiNardo et al., 1996) to account for self-selection

by agents into sectors in response to policy ch**an**ges.

9. SUMMARY AND CONCLUSIONS

This article uses low-dimensional factor models to generate counterfactual distributions

**of** potential outcomes. It extends matching by allowing some **of** the

variables that determine the conditional independence assumed in matching to

be unobserved by the **an**alyst. Semiparametric identification is established.

We apply our methods to a problem in the economics **of** education. We extend

the Willis–Rosen model to explicitly account for dependence in potential

outcomes across potential schooling states, to account for psychic benefits in

the return to schooling **an**d to measure the effect **of** uncertainty on schooling

choices. We extend the framework **of** Flavin (1981) **an**d H**an**sen et al. (1991), who

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 407

estimate the impact **of** uncertainty on consumption choices to a discrete choice

setting to estimate agent information sets. Our framework extends the inequality

decomposition **an**alysis **of** DiNardo et al. (1996) to account for self-selection in

the choice **of** sectors.

Our **an**alysis reveals subst**an**tial heterogeneity in the returns to schooling,

much **of** which is unpredictable at the time schooling decisions are made. We

also find a subst**an**tial nonpecuniary return to college. Although there is subst**an**tial

uncertainty in forecasting returns at the time schooling decisions are

made, eliminating it has modest effects on schooling choices. Uncertainty is inherent

in both college **an**d high school outcomes at the time schooling decisions

are made. In addition, nonpecuniary factors play a domin**an**t role in schooling

choices. The assumption **of** perfect r**an**king **of** potential outcome across alternative

choices is soundly rejected, although potential outcomes are strongly positively

correlated.

We simulate a tuition reduction policy to determine who benefits or loses from

it. We go beyond the Veil **of** Ignor**an**ce to determine which persons are affected

by the policy. The policy favors those at the top **of** the income distribution. This

simulation illustrates the power **of** our method to lift the Veil **of** Ignor**an**ce, **an**d

to count the losers **an**d gainers from **an**y policy initiative.

APPENDIX A: PROOFS OF THEOREMS 38

PROOF OF THEOREM 1. The case where M consists **of** purely continuous components

is trivial. We observe M c for each X **an**d c**an** recover the marginal distribution

for each component. Recall that M is not state dependent.

For the purely discrete case, we encounter the usual problem that there is no

direct observable counterpart for µ d m (X). Under (A-1)–(A-5), we c**an** use the

**an**alysis **of** M**an**ski (1988) to identify the slope coefficients βl,m d up to scale, **an**d the

marginal distribution **of** Ul,m d . From the assumption that the me**an** (or medi**an**) **of**

Ul,m d is zero, we c**an** identify the intercept in βd l,m

. We c**an** repeat this for all discrete

components. Therefore, coordinate by coordinate we c**an** identify the marginal

distributions **of** Um c , Ũd m ,µc m (X), **an**d ˜µd m (X), the latter up to scale (“∼” me**an**s

identified up to scale).

To recover the joint distribution write

(

Pr (M c ≤ m c , M d = (0,...,0) | X) = F U c m ,Ũm

d mc − µ c m (X) , −˜µd m (X))

by Assumption (A-2). To identify F U c m ,Ũ (t 1, t

m d 2 ) for **an**y given evaluation points in

the support **of** (Um c , Ũd m ), we know the function ˜µd m (X) **an**d using (A-3) we c**an**

find **an** X where ˜µ d m (X) = t 2. Let ̂x denote this value, so ˜µ d m (̂x) = t 2. In this pro**of**,

t 1 , t 2 may be vectors. Thus,

Pr (M c ≤ m c , M d = (0,...,0) | X = ̂x) = F U c m ,Ũ d m (m c − µ c m (̂x) , t 2)

38 We th**an**k Edward Vytlacil for simplifying **an**d clarifying the statements **an**d pro**of**s **of** all three

theorems in this section.

408 CARNEIRO, HANSEN, AND HECKMAN

Let ̂m c = t 1 − µ c m (̂x) to obtain

Pr (M c ≤ ̂m c , M d = (0,...,0) | X = ̂x) = F U c m ,Ũ d m (t 1, t 2 )

We know the left-h**an**d side **an**d thus identify F U c m ,Ũ at the evaluation point t 1, t

m d 2 .

Since (t 1 , t 2 ) is **an**y arbitrary evaluation point in the support **of** Um c , Ũd m we c**an** thus

identify the full joint distribution.

PROOF OF THEOREM 2.

(

c1 (Q 1 ) − ϕ (Z)

Pr (D 1 = 1 | Z, Q 1 ) = Pr

> ε )

W

σ W σ W

Under (A-1), (A-2), (A-6), (A-7), **an**d (A-9), it follows that c 1(Q 1 ) − ϕ(Z)

σ W

**an**d F˜εW

(where ˜ε W = ε W

σW

) are identified (see M**an**ski, 1988; Matzkin, 1992, 1993). Under

r**an**k condition (A-7), identification **of** c 1(Q 1 ) − ϕ(Z )

σ W

implies identification **of** c 1(Q 1 )

σ W

**an**d ϕ(Z )

σ W

separately. Write

( ) ( )

c2 (Q 2 ) − ϕ (Z) c1 (Q 1 ) − ϕ (Z)

Pr (D 2 = 1 | Z, Q 1, Q 2 ) = F˜εW − F˜εW

σ W σ W

From the absolute continuity **of**˜ε W **an**d the assumption that the distribution function

**of**˜ε W is strictly increasing, we c**an** write

c 2 (Q 2 )

σ W

[

( )]

= F −1

c1 (Q 1 ) − ϕ (Z)

˜ε W

Pr (D 2 = 1 | Z, Q 1 , Q 2 ) + F˜εW + ϕ (Z)

σ W σ W

Thus, we c**an** identify c 2(Q 2)

σ W

over its support **an**d, proceeding sequentially, we c**an**

identify c s(Q s)

σ W

, s = 3,...,S. Under (A-8) we c**an** identify η s , s = 2,...,S.

Observe that we could use the final choice (Pr(s = S)) rather th**an** the initial

choice to start **of**f the pro**of** **of** identification using **an** obvious ch**an**ge in the assumptions.

PROOF OF THEOREM 3. From (A-2), the unobservables are jointly independent **of**

(X, Z, Q). For fixed values **of** (Z, Q s , Q s−1 ), we may vary the points **of** evaluation

for the continuous coordinates (ys c ) **an**d pick alternative values **of** X = ̂x to trace

out the vector µ c (X) up to intercept terms. Thus, we c**an** identify µ c s,l

(X) upto

a const**an**t for all l = 1,...,N c,s (Heckm**an** **an**d Honoré, 1990). Under (A-2), we

recover the same functions for whatever values **of** Z, Q s , Q s−1 are prespecified

as long as c s (Q s ) > c s−1 (Q s−1 ), so that there is **an** interval **of** ε W bounded above

**an**d below **with** positive probability. This identification result does not require **an**y

passage to a limit argument.

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 409

For values **of** (Z, Q s , Q s−1 ) such that

lim Pr (D

Qs → ¯Qs (Z ) s = 1 | Z, Q s , Q s−1 ) = 1

Q s−1 →Q s−1

(Z )

where ¯Q s (Z) is **an** upper limit **an**d Q s−1 (Z) is a lower limit, **an**d we allow the

limits to depend on Z, we essentially integrate out˜ε W **an**d obtain

Pr ( M c ≤ m c , ˜µ d m ≤−Ud m , Uc s ≤ yc s − µc (X), Ũ d s

≤−˜µ d s (X))

We know that this probability c**an** be achieved by virtue **of** the support condition

**of** Assumption (A-10).

Then proceeding as in the pro**of** **of** Theorem 1, we c**an** identify ˜µ d s (X) coordinate

by coordinate **an**d we obtain the const**an**ts in µ c s,l (X), l = 1,...,N c,s as well as

the const**an**ts in ˜µ d (X). From the assumption **of** me**an** or medi**an** zero **of** the

unobservables. In this exercise, we use the full r**an**k condition on X, which is part

**of** Assumption (A-11).

With these functions in h**an**d, under the full conditions **of** Assumption (A-10),

we c**an** fix ys c, yc m , ˜µd s , ˜µd m , c s(Q s) − ϕ(Z)

σ W

, c s−1(Q s−1) − ϕ(Z)

σ W

at different values to trace out

the joint distribution F(Um c , Ũd m , Uc s , Ũd s ,˜ε W). 39

APPENDIX B: DESCRIPTION OF THE DATA

We use white males from NLSY79. In the original sample there are 2439 individuals.

We consider the information on these individuals from ages 19 to 35. We

discard 663 individuals because they have observations missing for at least one **of**

the covariate variables we use in the **an**alysis. Table 2a,b contains a description **of**

the number **of** missing observations per variable. For example, we discard 50 individuals

because we do not observe whether they were living in the South when they

were 14 years old or not. Then we discard **an**other 6 for not having information on

whether they lived in **an** urb**an** area at age 14, **an**other 5 for not reporting the number

**of** siblings, 221 for not indicating parental education **an**d so on, as described

in Table 2. We then restrict the NLSY sample to white males **with** a high school or

college degree. We define high school graduates as individuals having a high school

degree or having completed 12 grades **an**d never reporting college attend**an**ce. We

define participation in college as having a college degree or having completed more

39 Using a st**an**dard definition **of** identification, a model (F U ,β) is identified iff for **an**y alternative

parameters (F ∗ U ,β∗ ) ≠ (F U ,β), there exists some ε>0 such that

Pr(|F U (β) − F ∗ U (β∗ )| >ε) > 0

where the probability is computed **with** respect to the density **of** the data-generating process.

Our use **of** limit set arguments may appear to contradict the st**an**dard definition **of** identification

because **of** zero probability in the limit sets. However, this intuition is false. See the argument in

Aakvik et al. (1999), Theorem 1, which justifies the appeal to limit arguments used in this article in

terms **of** st**an**dard definitions **of** identification.

410 CARNEIRO, HANSEN, AND HECKMAN

th**an** 16 years in school. We exclude the oversample **of** poor whites. Experience is

Mincer experience (age-12 if high-school graduate, age-16 for college graduate).

The variables that we include in the outcome **an**d choice equations are number **of**

siblings, parental years **of** schooling, AFQT, year **of** birth dummies, average tuition

**of** the colleges in the county the individual lives in at 17 (we simulate the policy

ch**an**ge by decreasing this variable by $1000 for each individual), dist**an**ce to the

nearest college at 17, average local blue collar wage in state **of** residence at 17 (or

in 1979, for individuals entering the sample at ages older th**an** 17) **an**d local unemployment

rate in the county **of** residence in 1979. For the construction **of** the tuition

variable see Cameron **an**d Heckm**an** (2001). Dist**an**ce to college is constructed by

matching college location data in HEGIS (Higher Education General Information

Survey) **with** county **of** residence in NLSY. State average blue collar wages

are constructed by using data from the BLS. For a description **of** the NLSY sample

see BLS (2001).

In 1980, NLSY respondents were administered a battery **of** 10 achievement

tests referred to as the Armed Forces Vocational Aptitude Battery (ASVAB; see

Cawley et al., 1997, for a complete description). The math **an**d verbal components

**of** the ASVAB c**an** be aggregated into the Armed Forces Qualification Test

(AFQT) scores. 40 M**an**y studies have used the overall AFQT score as a control

variable, arguing that this is a measure **of** scholastic ability. We argue that AFQT

is **an** imperfect proxy for scholastic ability **an**d use the factor structure to capture

this. We also avoid a potential aggregation bias by using each **of** the components

**of** the ASVAB as a separate measure.

For our **an**alysis, we use the r**an**dom sample **of** the NLSY **an**d restrict the sample

to 1161 white males for whom we have information on schooling, several parental

background variables, test scores, **an**d behavior. Dist**an**ce to nearest college at

each date is constructed in the following way: Take the county **of** residence **of**

each individual **an**d all other counties **with**in the same state. The dist**an**ce between

two counties is defined as the dist**an**ce between the center **of** each county. If

there exists a college (2-year or 4-year) in the county **of** residence where a person

lives then the dist**an**ce to the nearest college (2-year or 4-year) variable takes the

value **of** zero. Otherwise, we compute dist**an**ce (in miles) to the nearest county

**with** a college. Then we construct the dist**an**ce to the nearest college at 17 by

using the county **of** residence at 17. However, for people who were older th**an**

17 in 1979 we use the county **of** residence in 1979 for the construction **of** this

variable.

Tuition at age 17 is average tuition in colleges in the county **of** residence at

17. If there is no college in the county then average tuition in the state is taken

instead. For details on the construction **of** this variable see Cameron **an**d Heckm**an**

(2001).

Local labor market variables for the county **of** residence are computed using

information in the 5% sample **of** the 1980 Census. For each county group in the

census we compute the local unemployment rate **an**d average wage for high school

40 Implemented in 1950, the AFQT score is used by the army to screen draftees.

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 411

dropouts, high school graduates, individuals **with** some college, **an**d four-year college

graduates. We do not have this variable for years other th**an** 1980 so, for each

county, we assume that it is a good proxy for local labor market conditions in all

the other years where NLSY respondents are assumed to be making the schooling

decisions we consider in this article.

We also use the variable log **an**nual labor earnings. We extract this variable

from the NLSY79 reported **an**nual earnings from wages **an**d salary. Earnings (in

thous**an**ds **of** dollars) are discounted to 1993 using the Consumer Price Index

reported by the Bureau **of** Labor Statistics. Missing values for this variable may

occur here for two reasons: first, because respondents do not report earnings from

wages/salary, **an**d second, because the NLSY becomes bi**an**nual after 1994 **an**d

this prevents us from observing respondents when they reach certain ages. For

example, because the NLSY79 was not conducted in 1995, we do not observe

individuals born in 1964 when they are 31-year-olds. In this case we input missing

values.

To predict missing log earnings between ages 19 **an**d 35 **an**d extrapolate from age

36 to 65 years we pool NLSY **an**d PSID data. From the latter, we use the sample

**of** white males that are household heads **an**d that are either high school or college

graduates according to the definition given above. This produces a sample **of** 3043

individuals from the PSID. To get **an**nual earnings, we multiply the reported CPIadjusted

(1993 = 100) hourly wage rate by the **an**nual hours worked **an**d divide

the outcome by 1000. Then we take logs to have **an** NLSY-comparable variable.

Similarly to NLSY, we generate the Minceri**an** experience according to the rule

given above. We also generate dummy variables for cohorts. The first (omitted)

cohort, consists **of** individuals born between 1896 **an**d 1905, the second consists **of**

individuals born between 1906 **an**d 1915, **an**d so on up to the last cohort, which is

made up **of** PSID respondents born between 1976 **an**d 1985. We pool NLSY **an**d

PSID by merging the NLSY respondents in the PSID cohort born between 1956

**an**d 1965.

Let Y ia denote log earnings **of** agent i at age a. For each schooling choice s, we

model the earnings-experience pr**of**ile as

Y ia (s) = α + β 0 X ia + β 1 X 2

ia + Dγ + ε ia

ε ia = η i + v ia

(B.1)

v ia = ρv ia−1 + κ ia

where X ia is Mincer experience, D is a set **of** dummy variables that indicate cohort,

η i is the individual effect, **an**d κ ia is white noise. In Table A14 posted at

http://lily.src.uchicago.edu/CHH estimating.html we report the OLS estimates for

α, β 0 ,β 1 ,γ,ρ based on the pooled data set.

412 CARNEIRO, HANSEN, AND HECKMAN

Now, let ˆε ia be the estimated residual **of** the earnings–experience pr**of**ile. An

estimator **of** the individual effect η i is

ˆη i =

65∑

1

∑ 65

a=19 φ φ iaˆε ia ,

ia a=19

where φ ia = 1 (if individual i is observed at age a)

Then, we c**an** obtain **an** estimator **of** v ia by computing

ˆv ia = ˆε ia − ˆη i

Now, given ̂v ia we c**an** run Equation (B.1) **an**d then compute ρ. From this we

obtain **an** estimator **of** κ ia according to

ˆκ ia = ˆv ia − ˆρ ˆv ia−1

We c**an** then predict earnings for missing observations for ages 19 to 35

**an**d perform the extrapolation from ages 36 to 65 by computing for each

individual

Ŷ ia (s) = ˆα + ˆβ 0 X ia + ˆβ 1 X 2 ia + Dˆγ + ˆε ia

= ˆα + ˆβ 0 X ia + ˆβ 1 X 2 ia + Dˆγ + ˆη i + ˆρ ˆv ia−1 + ˆκ ia

Note that to get ˆε ia we do not set ˆκ ia equal to zero. Instead, we sample 10

draws from its distribution **an**d average them for each individual, for each time

period.

The next step is to get the present value **of** log earnings at age 19 for each

agent. In order to do it we discount log earnings at each period using a discount

rate **of** 3%. For identification purposes we then break each individual’s working

life in two periods. The first one goes from age 19 to 29. The second period goes

from age 30 all the way to 65. This produces a p**an**el in which the first observation

for each agent is the present value **of** log earnings from age 19 to 29 **an**d

the second is the present value **of** log earnings from age 30 to 65. This me**an**s

that lifetime present value **of** log earnings is just the sum **of** these two components.

Table 2b contains descriptive statistics for the present value **of** log earnings

for the entire working life period **an**d also for the two subperiods used in the

**an**alysis.

APPENDIX C: MARKOV CHAIN MONTE CARLO SIMULATION METHODS

Due to the complex nature **of** the likelihood function we rely on Markov Chain

Monte Carlo techniques to estimate the model. These are computer-intensive

algorithms based on designing **an** ergodic discrete-time continuous-state Markov

chain **with** a tr**an**sition kernel having **an** invari**an**t measure equal to the posterior

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 413

distribution **of** the parameter vector ψ; see Robert **an**d Casella (1999) for details.

In particular, we will be using the Gibbs sampling algorithm. 41

We first describe how the Gibbs sampler c**an** be used to estimate models in

the general setup laid out in Section 4. Let ψ s,a be parameters specific to the

distribution **of** outcomes **with** schooling level s at age a, let ψ m be parameters

specific to the distribution **of** measurements, let ψ c be parameters specific tothe

distribution **of** schooling choice, **an**d let ψ θ be parameters specific to the factor

distributions. Let n be the number **of** observations. Let the outcome matrix over

all ages **with** schooling level s be Y s,i = (Ys,i c , Y∗d s,i

) **an**d the vector **of** measurements

is M.

The complete data likelihood for completed schooling level S = s is

f (M, Y s , I,θ| ψ) =

∏

i:D i,s =1

f (M i , Y s,i , I i ,θ i | ψ)

where ψ = [ψ s,a ,ψ m ,ψ c ,ψ θ ], “i” denotes a subscript for individual i **an**d

f (M i , Y s,i , I i ,θ i | ψ) = f (M i | ψ m ,θ i ) ×

The complete data posterior is

f (M, Y, I,θ,ψ| data) ∝

Ā∏

f (Y s,a,i | ψ s,a ,θ i ) f (I i | θ i ,ψ c ) f (θ i | ψ)

a=1

¯S∏

s=1

f (θ, M, Ys ∗ , I | ψ) f (ψ)

where Y = (Y 1 ,...,Y¯S).

In what follows the conditional posteriors that constitute the tr**an**sition kernel

**of** the Gibbs sampler will be derived.

Choice Equations.

Conditional on the factors we have

(C.1)

f ( η, γ, ρ | ψ −(η,γ,ρ) ,θ )

{ }

n∏ ∣

∝ f (I i Z i ′ η + γ ′ θ i , 1)

i=1

×

{ ¯s∑

j=1

1(c i, j−1 < I i < c i, j )D i, j }1(c i1 < ···< c i ¯s ) f (η, γ ) f (ρ)

}

41 For other uses **of** Markov Chain Monte Carlo techniques in models **an**d applications related

to ours, see Chib **an**d Hamilton (2000), who implement MCMC methods for a p**an**el version **of** a

generalized Roy model, **an**d Chib **an**d Hamilton (2002), who consider various cross-sectional treatment

models.

414 CARNEIRO, HANSEN, AND HECKMAN

This marginal c**an** be factored into two conditionals. Conditional on ρ we have

f ( η, γ | ρ,ψ −(η,γ,ρ) ,θ ) ∝

n∏

i=1

f (I i | Z ′

i η + γ ′ θ i , 1) f (η, γ )

This is the posterior for a normal regression model **with** covariates Z i ,θ i **an**d

precision fixed at 1. With f (η, γ ) multivariate normal this is a multivariate normal

distribution.

The second conditional (for ρ)is

f ( ρ | η, γ, ψ −(η,γ,ρ) ,θ ) ∝

n∏

¯s∑

i=1 j=1

1(c i, j−1 < I i < c i, j )D i, j 1(c i1

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 415

This factors into n independent truncated normals,

f ( I | ψ −(η,γ,ρ) ,θ ) =

n∏

i=1

TN (ci, j−1 ,c ij )(I i | Z ′

i η + γ ′ θ i , 1)

So, we sample I i , i = 1,...,N, one at a time from truncated normals.

Measurement Equations.

form

The continuous measurement equations are **of** the

(C.3)

M i, j = X m,i, ′ j βc m, j + αc ′ m, j θ i + εm,i, c j

Given X m,i, j ,θ i , this is a linear regression model. With multivariate normal priors

on (βm, c j ,αc m, j ) **an**d a gamma prior on the precision **of** εc m,i, j

, this is in the form

**of** the st**an**dard conjugate Bayesi**an** linear regression model, **with** a conditional

normal distribution for βm, c j given the precision **of** εc m,i, j

**an**d a gamma distribution

for the precision conditional on βm, c j .

Let the last m − m 1 elements **of** the measurement vector M be binary indices

generated as

M d j = 1 ( M ∗d

j ≥ 0 ) , j = m 1 + 1,...,m

The parameters in the binary measurements are samples as above **with** two exceptions.

First, a separate step samples the latent measurements, M ∗d

j

,as

M ∗d

i, j ∼

{ ( ∣

TN(0,∞) M

∗d

i, j

TN (−∞,0)

(

M

∗d

i, j

∣ X

′

m,i, j

βm, d j + αd ′

m, j θi , 1 ) if Mi, d j

= 1

∣ X

′

m,i, j

βm, d j + αd ′

m, j θi , 1 ) if Mi, d j

= 0

Second, the precision is not sampled but fixed at one.

Outcome Equations. Let Y s,a be the outcome vector at age a **with** schooling

level s. Suppose both employment **an**d wage outcomes are modeled. Let Ys,a c be

the wage outcome **an**d Ys,a d the employment outcome. Also, let Y∗,d s,a be the latent

employment index. By the factor structure assumption we have

f ( Y c

for a person working.

The model for wages is

s,a , Y∗,d s,a

∣ θ ) = f ( ∣

Y c

s,a

∣ θ ) f ( Y ∗,d

s,a

Ys,a,i c = X 1,a,i ′ βc s,a + αc ′ s,a θ i + εa,s,i

c

∣ θ )

where εa,s,i c ∼ N(0,τs,a c ). This is in the form **of** a st**an**dard linear regression model

under normality **an**d (βs,a c ,αc s,a ,τc s,a ) is sampled as above (using multivariate normal

**an**d gamma priors).

416 CARNEIRO, HANSEN, AND HECKMAN

We c**an** allow for general state dependence by modeling the latent employment

tr**an**sition indices as

Y d,∗

s,a,i

=

{ X

′

2,a,s,i

βa,s,0 d + αd ′

a,s,0 θi + εa,s,i,0 d , if Yd

s,a−1,i

= 0

X

2,a,s,i ′ βd a,s,1 + αd ′

a,s,1 θi + εa,s,i,1 d , if Yd

s,a−1,i

= 1

where εa,s,i,0 d **an**d εd a,s,i,1

are both st**an**dard normal.

The conditional **of** (β 2,a,s,0 ,α 2,a,s,0 ) **an**d (β 2,a,s,1 ,α 2,a,s,1 )is

f ( βa,s,0 d )

,αd ∣

a,s,0 ψ−β d

a,s,0 ,αa,s,0

d

∝ f ( βa,s,0 d ) ∏

,αd a,s,0 f ( Y d,∗

∣

s,a,i X

′

2,a,s,i βa,s,0 d + αd ′

a,s,0 θi , 1 )

i:Ys,a−1,i d =0

f ( βa,s,1 d )

,αd ∣

a,s,1 ψ−β d

a,s,1 ,αa,s,1

d

∝ f ( βa,s,1 d ) ∏

,αd a,s,1 f ( Y d,∗

s,a,i | X′ 2,a,s,i βd a,s,1 + αd ′

a,s,1 θi , 1 )

i:Ys,a−1,i d =1

Both **of** these are normal regression models **with** the precision fixed at 1. The

latent employment indices are sampled as in the usual binary choice framework

(see Albert **an**d Chib, 1993).

Factors. The conditional for θ factors into n conditionals for θ 1 ,...,θ n . To see

what the conditional for θ i is note that all contributions **of** θ i originate from linear

regression models,

I i − Z

i ′ η = γ ′ θ i + ε I,i ,

(choice model)

M j − X

m,i, ′ j β m, j = α

m, ′ j θ i + ε m, j , (measurements)

Ys,a,i c − X′ 1,a,i βc s,a = α c s,a ′ θ i + εa,s,i c , (wages)

Y d,∗

2,s,a,i − X′ 2,a,s,i βd a,s,l = α d ′ a,s,l θ i + εa,s,i,l d , (employment)

This equation system is **of** the form

Ŷ i = A i θ i + u i

where u i ∼ N(0, i ), where i is a diagonal precision matrix. The conditional

posterior for θ i is then

{

f (θ i | ψ) ∝ exp − 1 }

2 (Ŷ i − A i θ i ) ′ i (Ŷ i − A i θ i ) f (θ i )

where

f (θ i ) =

K∏

J K ∑

k=1 j=1

p k, j N ( θ ik | µ k, j ,τ k, j

)

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 417

We sample θ ik |{θ ij } j≠k one at a time from their respective conditionals, which c**an**

be shown to be a mixture **of** normals **with** **an** updated (data-dependent) mixture

**of** weights **an**d parameters.

Conditional on the factor vector θ, we have

θ ik ∼

∑J k

j=1

p l, j N(θ il | µ l, j ,τ l, j ),

i = 1,...,n

Conditional on θ we c**an** treat the factors as known **an**d update the mixture

parameters (p k ,µ k ,τ k ). We follow the “group indicator” approach in Diebolt **an**d

Robert (1994) **an**d augment the parameter vector by a sequence **of** latent group

indicators defined as g i = j if a θ i, j originates from a mixture component j. Conditional

on the mixture group indicators the mixture parameters are easily sampled

**an**d conditional on the mixture parameters the group indicators are simple multinomials.

To preserve the identification **of** intercepts we constrain the mixture to

have me**an** zero using the method proposed in Richardson et al. (2000).

The estimation **of** the structural models in Section 7 is done as above **with** a

few modifications. The choice model is a probit so the cut point is c = 0, **an**d

no ρ parameters are estimated. The cross-equation restrictions are imposed as

follows: Let Ỹ i = (V 1,h,i , V 2,h,i , V 1,c,i , V 2,c,i , V i ), i.e., the stacked outcomes under

high school **an**d college **an**d the choice index. We c**an** then write the model as

Ỹ i = W i ψ + Ɣθ i + ε i

M i = X i ω + α test θ i + ε test

where ψ ={{¯δ 1,s , ¯δ 2,s , ¯β 1,s , ¯β 2,s } s ,δ P ,γ}, **an**d W i **an**d the loading matrix Ɣ =

Ɣ({ᾱ 1,s , ᾱ 2,s } s ,α P ) are defined appropriately. This model is now in the form **of**

the system described above **an**d the required conditionals are derived as above.

APPENDIX D: MEAN PRESERVING SPREAD

For the model described in Section 7, assume that ε a,s are independent **an**d

identically normally distributed **with**in each period,

ε a,s ∼ N ( 0,σ 2 s,1)

for ages 19–29

ε a,s ∼ N ( 0,σ 2 s,2)

for ages 30–65.

Then,

(

ε 1,s ∼ N 0,

(

ε 2,s ∼ N 0,

∑29

a=19

∑65

a=30

)

σs,1

2

(1 + ρ) a

)

σs,2

2

(1 + ρ) a

418 CARNEIRO, HANSEN, AND HECKMAN

At each age

where

ln Y a,s = δ a,s + X ′ β a,s + α ′ a,s θ + ε a,s + η 1,s × experience a

+ η 2,s × experience 2 a = µ a,s + ε a,s

then

µ a,s = δ a,s + X ′ β a,s + α ′ a,s θ + η 1,s × experience a + η 2,s × experience 2 a

E(Y a,s | X,θ) = exp(µ a,s )E[exp(ε a,s )].

We do a me**an** preserving spread at each age a by giving the individual knowledge

**of** ε a,s :

Then,

E (Y a,s | X,θ,ε a,s ) = exp (µ a,s + ε a,s ) = exp(µ ′ a,s )

exp(µ ′ a,s ) = exp(µ a,s)E[exp(ε a,s )]

Since the ε a,s are i.i.d. we c**an** drop the age subscript on the ε:

exp(µ ′ a,s ) = exp (µ a,s) E (exp (ε s ))

The me**an** preserving spread is actually a combination **of** age-by-age me**an** preserving

spreads. Finally, compute

Define

µ 1,s =

µ 2,s =

29∑

a=19

65∑

a=30

µ ′ 1,s = 29∑

a=19

µ ′ 2,s = 65∑

a=30

µ a,s

(1 + ρ) a

µ a,s

(1 + ρ) a

µ ′ a,s

(1 + ρ) a

µ ′ a,s

(1 + ρ) a

V = µ 1,C + µ 2,C − µ 1,a − µ 2,a − Zγ − α ′ p θ − ε p

V ′ = µ ′ 1,C + µ′ 2,C − µ ′ 1,a − µ ′ 2,a + Zγ − α ′ p θ − ε p + ¯ε 1,C + ¯ε 2,C − ¯ε 1,a − ¯ε 2,a

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 419

The probability **of** going to college is given by

for the first case **an**d for the second case

Pr (V > 0)

Pr (V ′ > 0)

The experiment for the case where we remove θ 1 from the information set **of** the

agent, keeping age-by-age me**an** earnings const**an**t, is **an**alogous to the one just

described.

REFERENCES

AAKVIK, A., J. HECKMAN, AND E. VYTLACIL, “Training Effects on Employment when the

Training Effects are Heterogeneous: An **Application** to Norwegi**an** Vocational Rehabilitation

Programs,” M**an**uscript, University **of** Chicago, 1999.

——, ——, AND ——, “Treatment Effects For Discrete Outcomes when Responses To Treatment

Vary Among Observationally Identical Persons: An **Application** to Norwegi**an**

Vocational Rehabilitation Programs,” NBER Working Paper No. TO262, Journal **of**

Econometrics, 2003, forthcoming.

ALBERT,J.,AND S. CHIB, “Bayesi**an** Analysis **of** Binary **an**d Polychotomous Response Data,”

Journal **of** the Americ**an** Statistical Association 88 (1993), 669–79.

ANDERSON,T.W.,AND H. RUBIN, “Statistical Inference in Factor Analysis,” in J. Neym**an**, ed.,

Proceedings **of** Third Berkeley Symposium on Mathematical Statistics **an**d Probability,

5 (Berkeley: University **of** California Press, 1956), 111–50.

ATHEY, S., AND G. IMBENS, “Identification **an**d Inference in Nonlinear Difference-In-

Differences Models,” NBER Technical Working Paper T0280, 2002.

BEN AKIVA, M., BOLDUC, D.,AND WALKER, J.“Specification, Identification **an**d Estimation

**of** the Logit Kernel (or Continuous Mixed Logit Model),” M**an**uscript, Department

**of** Civil Engineering, MIT, February 2001.

BUERA, F.J.,“Testable Implications **an**d Identification **of** Occupational Choice Models,”

unpublished m**an**uscript, University **of** Chicago, 2002.

Bureau **of** Labor Statistics, NLS H**an**dbook 2001 (Washington, D.C.: U.S. Department **of**

Labor, 2001).

CAMERON, S.,AND J. HECKMAN, “Son **of** CTM: The DCPA Approach Based on Discrete

Factor Structure Models,” Unpublished m**an**uscript, University **of** Chicago, 1987.

——, AND ——, “Life Cycle Schooling **an**d Dynamic Selection Bias,” Journal **of** Political

Economy 106 (1998), 262–333.

——, AND ——, “The Dynamics **of** Educational Attainment for Blacks, Whites **an**d Hisp**an**ics,”

Journal **of** Political Economy 109 (2001), 455–99.

CARNEIRO, P., K. HANSEN, AND J. HECKMAN, “Removing the Veil **of** Ignor**an**ce in Assessing

the Distributional Impacts **of** Social Policies,” Swedish Economic Policy Review, 8,

(2001), 273–301.

CAWLEY, J., K. CONNEELY, J.HECKMAN, AND E. VYTLACIL, “Cognitive Ability, Wages, **an**d

Meritocracy,” in B. Devlin, S. E. Feinberg, D. Resnick, **an**d K. Roeder, eds., Intelligence

Genes, **an**d Success: Scientists Respond to the Bell Curve (New York: Springer-Verlag,

1997), 179–92.

CHAMBERLAIN, G.,“Education, Income, **an**d Ability Revisited,” Journal **of** Econometrics

1977a), 241–57

——, “An Instrumental Variable Interpretation **of** Identification in Vari**an**ce Components

**an**d MIMIC Models,” in Paul Taubm**an**, ed., Kinometrics: Determin**an**ts **of** Socio-

Economic Success Within **an**d Between Families (Amsterdam: North-Holl**an**d, (1977b).

420 CARNEIRO, HANSEN, AND HECKMAN

——, AND Z. GRILICHES, “Unobservables **with** a Vari**an**ce-Components Structure: Ability,

Schooling, **an**d the Economic Success **of** Brothers,” International Economic Review

16 (1975), 422–49.

CHIB,S.,AND B. HAMILTON, “Bayesi**an** Analysis **of** Cross Section **an**d Clustered Data Treatment

Models” Journal **of** Econometrics 97 (2000), 25–50.

——, AND ——, “Semiparametric Bayes Analysis **of** Longitudinal Data Treatment Models,”

Journal **of** Econometrics, 110 (2002), 67–89.

COCHRANE, W.G.,AND D. RUBIN, “Controlling Bias in Observational Studies: A Review,”

S**an**khya A 35 (1973), 417–46.

COSSLETT, S. R., “Distribution-Free Maximum Likelihood Estimator **of** the Binary Choice

Model,” Econometrica, 51 (1983), 765–82.

DIEBOLT, J., AND C. P. ROBERT, “Estimation **of** Finite Mixture **Distributions** Through

Bayesi**an** Sampling,” Journal **of** the Royal Statistical Society, Series B, 56 (1994), 363–75.

DINARDO, J., N. M. FORTIN, AND T. LEMIEUX, “Labor Market Institutions **an**d the Distribution

**of** Wages, 1973–1992: A Semiparametric Approach,” Econometrica, 64 (1996): 1001–

44.

ECKSTEIN, Z., AND K. WOLPIN, “The Specification **an**d Estimation **of** Dynamic Stochastic

Discrete Choice Models: A Survey,” Journal **of** Hum**an** Resources, 24 (1989), 562–98.

——, AND ——, “Dynamic Labour Force Participation **of** Married Women **an**d Endogenous

Work Experience,” Review Economic Studies 56 (1999), 375–90.

ELROD, T.AND M. KEANE, “A Factor-**an**alytic Probit Model for Representing the Market

Structure in P**an**el Data,” Journal **of** Marketing Research 32 (1995), 1–16.

FERGUSON, T.S.,“Bayesi**an** Density Estimation by Mixtures **of** Normal **Distributions**,” in

M. Rizvi, J. Rustagi, **an**d D. Siegmund, eds., Recent Adv**an**ces in Statistics (New York:

Academic Press, 1983) 287–302.

FLAVIN, M., “The Adjustment **of** Consumption to Ch**an**ging Expectations about Future

Income,” Journal **of** Political Economy 89 (1981), 974–1009.

FLORENS, J., M. MOUCHART, AND J. ROLIN. Elements **of** Bayesi**an** Statistics (New York:

M. Dekker, 1990).

FRÉCHET, M., “Sur les tableaux de corrélation dont les marges sont donneés,” Annals

Université Lyon, Sect. A, Series 3, 14(1951), 53–77.

GEWEKE, J., D. HOUSER, AND M. KEANE, “Simulation Based inference for Dynamic Multinomial

Choice Models,” in B. H. Baltaji, ed., Comp**an**ion for Theoretical Econometrics

(London: Basil Blackwell, 2001).

GOLDBERGER,A.S.,“Structural Equation Methods in the Social Sciences.” Econometrica,

40 (1972), 979–1001.

HANSEN, K., J. HECKMAN, AND K. MULLEN, “The Effect **of** Schooling **an**d Ability on Achievement

Test Scores,” Journal **of** Econometrics, 2003, forthcoming.

——, ——, AND S. NAVARRO, “Nonparametric Identification **of** Time to Treatment Models

**an**d The Joint **Distributions** **of** **Counterfactuals**,” Unpublished M**an**uscript, University

**of** Chicago, 2003.

HANSEN, L., W. ROBERDS, AND T. SARGENT, “Time Series Implications **of** Present Value

Budget Bal**an**ce **an**d **of** Martingale Models **of** Consumption **an**d Taxes,” in L. H**an**sen

**an**d T. Sargent, eds., Rational Expectations Econometrics (Boulder, CO: Westview

Press, 1991).

HECKMAN,J.,“Statistical Models for Discrete P**an**el Data,” in C. M**an**ski **an**d D. McFadden,

eds., Structural Analysis **of** Discrete Data With Econometric **Application**s (Cambridge,

MA: MIT Press, 1981).

——, “Varieties **of** Selection Bias.” Americ**an** Economic Review 80 (1990), 313–18.

——, “R**an**domization **an**d Social Policy Evaluation,” in C. F. M**an**ski **an**d Irwin Garfinkel,

eds. Evaluating Welfare **an**d Training Programs, (Cambridge, MA: Harvard University

Press, 1992).

——, “Micro Data, Heterogeneity, **an**d the Evaluation **of** Public Policy: Nobel Lecture,”

Journal **of** Political Economy. 109 (2001), 673–748.

EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 421

——, AND B. HONORÉ, “The Empirical Content **of** the Roy Model,” Econometrica, 58 (1990),

1121–49.

——, AND S. NAVARRO, “Ordered Discrete Choice Models **with** Stochastic Shocks,”

M**an**uscript, University **of** Chicago, 2001.

——, AND ——, “Using Matching, Instrumental Variables **an**d Control Functions to Estimate

Economic Choice Models,” Review **of** Economics **an**d Statistics (2003), forthcoming.

——, AND R. ROBB. “Alternative Methods for Evaluating the Impact **of** Interventions,” in

J. Heckm**an** **an**d B. Singer, eds., Longitudinal Analysis **of** Labor Market Data. (New

York: Cambridge University Press, 1985).

——, AND ——, (1986). “Alternative Methods for Solving the Problem **of** Selection Bias

in Evaluating the Impact **of** Treatments on Outcomes,” in Drawing Inferences from

Self-Selected Samples, H. Wainer, ed. (New York: Springer-Verlag, 1986; reprinted in

2000 by Lawrence Erlbaum Associates).

——, AND J. SMITH, “Assessing the Case for R**an**domized Evaluation **of** Social Programs.”

in K. Jensen **an**d P. K. Madsen, eds., Measuring Labour Market Measures: Evaluating

The Effects **of** Active Labour Market Policy Initiatives, (Copenhagen: Ministry Labour,

1993).

——, AND ——, “Evaluating the Welfare State,” in Econometrics **an**d Economic Theory

in the 20th Century: The Ragnar Frisch Centennial, Econometric Society Monograph

Series, S. Strom, ed., (Cambridge: Cambridge University Press, 1998).

——,R.LALONDE, AND J. SMITH, “The Economics **an**d Econometrics **of** Active Labor Market

Programs,” In O. Ashenfelter **an**d D. Card, eds, H**an**dbook **of** Labor Economics,

Volume 3 (Amsterdam: Elsevier, 1999).

——,L.LOCHNER, AND C. TABER, “Explaining Rising Wage Inequality: Explorations With

A Dynamic General Equilibrium Model **of** Earnings With Heterogeneous Agents,”

Review **of** Economic Dynamics, 1 (1998a), 1–58.

——, ——, AND ——, “General Equilibrium Treatment Effects: A Study **of** Tuition Policy,”

Americ**an** Economic Review, 88 (1998b), 381–86.

——, ——, AND ——, “Tax Policy **an**d Hum**an** Capital Formation,” Americ**an** Economic

Review, 88 (1998c), 293–97.

——, ——, AND ——, “General Equilibrium Cost Benefit Analysis **of** Education **an**d Tax

Policies,” in G. R**an**is **an**d L. K. Raut, eds., Trade, Growth **an**d Development: Essays in

Honor **of** T. N. Srinivas**an** (Amsterdam: Elsevier Science, B.V. 2000), 291–393, Chapter

14.

——, T.SMITH, AND N. CLEMENTS, “Making the Most out **of** Program Evaluations **an**d

Social Experiments: Accounting for Heterogeneity in Program Impacts,” Review **of**

Economic Studies 64 (1997), 487–535.

HOEFFDING,W.,“Masstabinvari**an**te Korrelationtheorie,” Schriften des Mathematischen Instituts

und des Instituts für Angew**an**dte Mathematik der Universität Berlin, 5(1940),

197–233.

JÖRESKOG, K., “Structural Equations Models in the Social Sciences: Specification, Estimation

**an**d Testing.” In **Application**s **of** Statistics, P. R. Krishnaih, ed. (Amsterdam: North

Holl**an**d, 1977), 265–87.

JÖRESKOG,K.G.,AND A. S. GOLDBERGER, “Estimation **of** a Model **with** Multiple Indicators

**an**d Multiple Causes **of** a Single Latent Variable,” Journal **of** the Americ**an** Statistical

Association, 70 (1975), 631–39.

KEANE, M., AND K. WOLPIN, “The Career Decisions **of** Young Men,” Journal **of** Political

Economy 105 (1997), 473–522.

KOTLARSKI, I.“On Characterizing the Gamma **an**d Normal Distribution,” Pacific Journal

**of** Mathematics 20 (1967), 69–76.

MANSKI,C.F.“Identification **of** Binary Response Models,” Journal **of** the Americ**an** Statistical

Association, 83 (1988), 729–38.

MATZKIN, R., “Nonparametric **an**d Distribution-Free Estimation **of** the Binary Threshold

Crossing **an**d the Binary Choice Models,” Econometrica 60, (1992): 239–70.

422 CARNEIRO, HANSEN, AND HECKMAN

——, “Nonparametric Identification **an**d Estimation **of** Polychotomous Choice Models,”

Journal **of** Econometrics 58, (1993) 137–68.

——, AND A. LEWBEL, “Notes on Single Index Restrictions,” Unpublished M**an**uscript,

Northwestern University, 2002.

MCFADDEN, D.,“Econometric Analysis **of** Qualitative Response Models,” in Z. Griliches

**an**d M. Intrilligator, eds. H**an**dbook **of** Econometrics, Volume II, (Amsterdam: North

Holl**an**d, 1984).

MUTHEN, B.,“A General Structural Equation Model With Dichotomous, Ordered Categorical

**an**d Continuous Latent Variable Indicators,” Psychometrika 49 (1984), 115–32.

OLLEY, S.G.,AND A. PAKES, “The Dynamics **of** Productivity in the Telecommunications

Equipment Industry,” Econometrica, 64 (1996), 1263–97.

RAO PRAKASA,B.L.S.,Identifiability in Stochastic Models: Characterization **of** Probability

**Distributions** (Boston: Academic Press, 1992).

RICHARDSON, S., L. LEBLOND,I.JAUSSENT, AND P. J. GREEN, “Mixture Models in Measurement

Error Problems, **with** Reference to Epidemiological Studies,” Working paper, 2000.

ROBERT,C.P.,AND G. CASELLA, Monte Carlo Statistical Methods (New York: Springer, 1999).

ROSENBAUM,P.,AND D. RUBIN, “The Central Role **of** the Propensity Score in Observational

Studies for Causal Effects,” Biometrika 70 (1983), 41–55.

ROY, A., “Some Thoughts on the Distribution **of** Earnings,” Oxford Economic Papers, 3

(1951), 135–46.

ROZANOV, Y.A.,Markov R**an**dom Fields (Berlin: Spring Verlag, 1982).

SEN, A.K.,On Economic Inequality (Oxford: Clarendon Press, 1973).

WILLIS, R., AND S. ROSEN, “Education **an**d Self-Selection,” Journal **of** Political Economy,

87 (1979), S7–S36.