12.01.2015 Views

Econometric Analysis of Cross Section and Panel Data - Free

Econometric Analysis of Cross Section and Panel Data - Free

Econometric Analysis of Cross Section and Panel Data - Free

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

17 Sample Selection, Attrition, <strong>and</strong> Stratified Sampling<br />

17.1 Introduction<br />

Up to this point, with the exception <strong>of</strong> occasionally touching on cluster samples <strong>and</strong><br />

independently pooled cross sections, we have assumed the availability <strong>of</strong> a r<strong>and</strong>om<br />

sample from the underlying population. This assumption is not always realistic: because<br />

<strong>of</strong> the way some economic data sets are collected, <strong>and</strong> <strong>of</strong>ten because <strong>of</strong> the<br />

behavior <strong>of</strong> the units being sampled, r<strong>and</strong>om samples are not always available.<br />

A selected sample is a general term that describes a nonr<strong>and</strong>om sample. There are a<br />

variety <strong>of</strong> selection mechanisms that result in nonr<strong>and</strong>om samples. Some <strong>of</strong> these are<br />

due to sample design, while others are due to the behavior <strong>of</strong> the units being sampled,<br />

including nonresponse on survey questions <strong>and</strong> attrition from social programs.<br />

Before we launch into specifics, there is an important general point to remember:<br />

sample selection can only be an issue once the population <strong>of</strong> interest has been carefully<br />

specified. If we are interested in a subset <strong>of</strong> a larger population, then the proper<br />

approach is to specify a model for that part <strong>of</strong> the population, obtain a r<strong>and</strong>om<br />

sample from that part <strong>of</strong> the population, <strong>and</strong> proceed with st<strong>and</strong>ard econometric<br />

methods.<br />

The following are some examples with nonr<strong>and</strong>omly selected samples.<br />

Example 17.1 (Saving Function): Suppose we wish to estimate a saving function for<br />

all families in a given country, <strong>and</strong> the population saving function is<br />

saving ¼ b 0 þ b 1 income þ b 2 age þ b 3 married þ b 4 kids þ u<br />

ð17:1Þ<br />

where age is the age <strong>of</strong> the household head <strong>and</strong> the other variables are self-explanatory.<br />

However, we only have access to a survey that included families whose household<br />

head was 45 years <strong>of</strong> age or older. This limitation raises a sample selection issue because<br />

we are interested in the saving function for all families, but we can obtain a<br />

r<strong>and</strong>om sample only for a subset <strong>of</strong> the population.<br />

Example 17.2 (Truncation Based on Wealth): We are interested in estimating the<br />

e¤ect <strong>of</strong> worker eligibility in a particular pension plan [for example, a 401(k) plan] on<br />

family wealth. Let the population model be<br />

wealth ¼ b 0 þ b 1 plan þ b 2 educ þ b 3 age þ b 4 income þ u<br />

ð17:2Þ<br />

where plan is a binary indicator for eligibility in the pension plan. However, we can<br />

only sample people with a net wealth less than $200,000, so the sample is selected on<br />

the basis <strong>of</strong> wealth. As we will see, sampling based on a response variable is much<br />

more serious than sampling based on an exogenous explanatory variable.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!