21.06.2014 Views

Questionnaire Dwelling Unit-Level and Person Pair-Level Sampling ...


Appendix N: Univariate and Multivariate Predictive Mean Neighborhood Imputation Methods

N.1 Introduction

Since the introduction of the computer-assisted interviewing (CAI) method in 1999 for the National Survey on Drug Use and Health (NSDUH),[29] one imputation method has been used for most variables requiring imputation: predictive mean neighborhood (PMN). It was developed to cater to the specific needs of NSDUH. This approach has been used since the 1999 survey[30] and can be applied to one variable at a time or to several variables simultaneously. As described in this appendix, PMN combines predicted means from models with the assignment of imputed values using neighborhoods determined by those predicted means.

N.2 Overview<br />

N.2.1 Predictive Mean Neighborhood Method: Derived from Combining Nearest Neighbor Hot Deck and Predictive Mean Matching

The PMN method is a combination of two commonly used imputation methods: a nonmodel-based nearest neighbor hot deck (NNHD) and a modification of the model-assisted predictive mean matching (PMM) method of Rubin (1986). The PMN method enhances the PMM method in that it can be applied to both discrete and continuous variables, either individually or jointly. The PMN method also enhances the NNHD method so that the distance function used to find neighbors is no longer ad hoc.
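The idea of a neighborhood defined by predicted means can be illustrated with a small univariate sketch. It is not the NSDUH implementation: the one-covariate least-squares model, the relative tolerance `tol`, the fallback to the single closest donor, and all function and field names are assumptions made only for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in for the real model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmn_impute(records, tol=0.05, rng=None):
    """Impute missing 'y' values from donors with similar predicted means.

    records: list of dicts with a covariate 'x' and an outcome 'y' (None if missing).
    tol: assumed relative half-width of the neighborhood around the
         recipient's predicted mean (hypothetical parameter).
    """
    rng = rng or random.Random(0)
    donors = [r for r in records if r["y"] is not None]
    a, b = fit_line([d["x"] for d in donors], [d["y"] for d in donors])

    def predict(r):
        return a + b * r["x"]

    imputed = []
    for r in records:
        if r["y"] is not None:
            imputed.append(r["y"])  # observed values are kept as they are
            continue
        target = predict(r)
        # Neighborhood: donors whose predicted mean is within tol of the target.
        hood = [d for d in donors if abs(predict(d) - target) <= tol * abs(target)]
        if not hood:
            # Assumed fallback: use the single donor with the closest predicted mean.
            hood = [min(donors, key=lambda d: abs(predict(d) - target))]
        imputed.append(rng.choice(hood)["y"])  # donate an observed value
    return imputed
```

The distance here is measured on the model's predicted means rather than on the raw covariates, which is what removes the need for a hand-built distance function; the imputed value is always an observed donor value, as in a hot deck.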

A commonly used imputation method is a random NNHD (Little & Rubin, 1987, p. 65). With this method, donors and recipients are distinguished by the completeness of their records with regard to the variable(s) of interest: the donor has complete data, but the recipient does not. A donor is selected at random from a set of donors deemed close to the recipient with respect to a number of covariates. For NSDUH, the set of covariates typically included demographic variables, as well as some other nonmissing pair-level variables. To further ensure that a donor matched the recipient as closely as possible, discrete variables (or discrete categories of continuous variables) strongly correlated with the response variables of interest were often used to restrict the set of donors. Furthermore, other restrictions involving outcome variables were imposed on the neighborhood.
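The random NNHD described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used for NSDUH: the Euclidean distance, the neighborhood size `k`, and the names `nnhd_impute`, `covariates`, and `restrict_on` are all hypothetical.

```python
import math
import random


def nnhd_impute(recipient, donors, covariates, restrict_on=(), k=3, rng=None):
    """Return a donated 'y' value for one recipient.

    covariates: numeric fields used in the (assumed Euclidean) distance.
    restrict_on: discrete fields on which donor and recipient must agree.
    k: assumed number of nearest donors forming the donor set.
    """
    rng = rng or random.Random(0)
    # Restrict the donor set to records matching the recipient on discrete variables.
    pool = [d for d in donors if all(d[v] == recipient[v] for v in restrict_on)]
    if not pool:
        raise ValueError("no donor satisfies the restrictions")

    def distance(d):
        # Ad hoc distance over the covariates; the choice of metric is an assumption.
        return math.sqrt(sum((d[c] - recipient[c]) ** 2 for c in covariates))

    nearest = sorted(pool, key=distance)[:k]
    return rng.choice(nearest)["y"]  # random draw among the close donors


donors = [
    {"age": 30, "sex": "F", "y": 1},
    {"age": 32, "sex": "F", "y": 2},
    {"age": 60, "sex": "F", "y": 3},
    {"age": 31, "sex": "M", "y": 4},
]
recipient = {"age": 31, "sex": "F", "y": None}
value = nnhd_impute(recipient, donors, covariates=["age"], restrict_on=["sex"], k=2)
```

With `k=2` and the restriction on `sex`, only the donors aged 30 and 32 are eligible, so `value` is drawn from their two observed values.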

Note that in NNHD, unlike a sequential hot deck, a distance function is used to define closeness between the recipient and a donor. As a result, sparseness of the donor class is less of a problem, but a distance function involving categorical or nominal variables is typically ad hoc and often hard to justify.

[29] This report presents information from the 2006 National Survey on Drug Use and Health (NSDUH), an annual survey of the civilian, noninstitutionalized population of the United States aged 12 years or older. Prior to 2002, the survey was called the National Household Survey on Drug Abuse (NHSDA).

[30] After the 1999 survey, only a CAI sample was selected.
