Questionnaire Dwelling Unit-Level and Person Pair-Level Sampling ...
Appendix N: Univariate and Multivariate Predictive Mean<br />
Neighborhood Imputation Methods<br />
N.1 Introduction<br />
Since the introduction of the computer-assisted interviewing (CAI) method in 1999 for<br />
the National Survey on Drug Use and Health (NSDUH),<sup>29</sup> one imputation method has been used<br />
for most variables requiring imputation: predictive mean neighborhood (PMN). It was developed<br />
to meet the specific needs of NSDUH. This approach has been used since the 1999 survey<sup>30</sup><br />
and can be applied to one variable at a time or to several variables simultaneously. As described<br />
in this appendix, PMN combines predicted means from models with the assignment of imputed<br />
values from neighborhoods determined by those predicted means.<br />
N.2 Overview<br />
N.2.1 Predictive Mean Neighborhood Method: Derived from Combining Nearest<br />
Neighbor Hot Deck and Predictive Mean Matching<br />
The PMN method is a combination of two commonly used imputation methods: a<br />
nonmodel-based hot deck (nearest neighbor) and a modification of the model-assisted predictive<br />
mean matching (PMM) method of Rubin (1986). The PMN method enhances the PMM method.<br />
Specifically, the PMN method can be applied to both discrete and continuous variables, either<br />
individually or jointly. The PMN method also enhances the nearest neighbor hot-deck (NNHD)<br />
method so that the distance function used to find neighbors is no longer ad hoc.<br />
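The idea of using predicted means as the distance measure can be illustrated with a minimal univariate sketch. It is not the NSDUH implementation: the single covariate `x`, the ordinary least-squares model, the relative tolerance `delta`, the fallback to the single nearest donor, and all function and field names are assumptions made only for illustration.

```python
import random


def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for one covariate (illustrative model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmn_impute(records, delta=0.05, rng=None):
    """Impute missing 'y' values from donors with similar predicted means.

    records: list of dicts with keys 'x' and 'y' ('y' is None when missing).
    delta:   assumed relative half-width of the neighborhood around the
             recipient's predicted mean (hypothetical parameter).
    """
    rng = rng or random.Random()
    donors = [r for r in records if r["y"] is not None]
    intercept, slope = fit_simple_ols(
        [d["x"] for d in donors], [d["y"] for d in donors]
    )

    def predict(x):
        return intercept + slope * x

    completed = []
    for r in records:
        if r["y"] is not None:
            completed.append(dict(r))
            continue
        target = predict(r["x"])
        # Neighborhood: donors whose predicted mean is close to the recipient's.
        neighborhood = [
            d for d in donors
            if abs(predict(d["x"]) - target) <= delta * abs(target)
        ]
        if not neighborhood:
            # Assumed fallback: the donor with the closest predicted mean.
            neighborhood = [min(donors, key=lambda d: abs(predict(d["x"]) - target))]
        donor = rng.choice(neighborhood)
        # The imputed value is the donor's observed value, not the model prediction.
        completed.append({**r, "y": donor["y"], "imputed": True})
    return completed
```

In this sketch the model only ranks donors by closeness. Every imputed value is an observed donor value, which is the property the predictive mean approach shares with a hot deck.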
A commonly used imputation method is a random NNHD (Little & Rubin, 1987, p. 65).<br />
With this method, donors and recipients are distinguished by the completeness of their records<br />
with regard to the variable(s) of interest. (The donor has complete data, but the recipient does<br />
not.) A donor is then selected at random from a set of donors deemed close to the recipient with<br />
respect to a number of covariates. For NSDUH, the set of covariates typically included demographic<br />
variables, as well as some other nonmissing pair-level variables. In the case of NSDUH, to<br />
further ensure that a donor matched the recipient as closely as possible, discrete variables (or<br />
discrete categories of continuous variables) strongly correlated with the response variables of<br />
interest were often used to restrict the set of donors. Furthermore, other restrictions involving<br />
outcome variables were imposed on the neighborhood.<br />
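The random NNHD steps described above can be sketched as follows. This is a minimal illustration, not the NSDUH procedure: the function name `nnhd_impute`, the Manhattan distance over numeric covariates, the donor-set size `k`, and the `restrict_on` argument are all assumptions introduced here.

```python
import random


def nnhd_impute(recipient, donors, covariates, restrict_on=(), k=3, rng=None):
    """Pick a donor at random from the k donors nearest to the recipient.

    recipient:   dict with covariate values (its response value is missing).
    donors:      list of dicts with complete data.
    covariates:  names of numeric covariates used in the distance (assumed).
    restrict_on: names of discrete variables on which a donor must match the
                 recipient exactly to be eligible (assumed restriction rule).
    """
    rng = rng or random.Random()
    # Restrict the donor pool to exact matches on the restriction variables.
    eligible = [
        d for d in donors
        if all(d[v] == recipient[v] for v in restrict_on)
    ]
    if not eligible:
        raise ValueError("no eligible donor satisfies the restrictions")

    def distance(d):
        # An ad hoc choice: sum of absolute covariate differences.
        return sum(abs(d[c] - recipient[c]) for c in covariates)

    # Donor set: the k eligible donors closest to the recipient.
    donor_set = sorted(eligible, key=distance)[:k]
    return rng.choice(donor_set)
```

The distance function here is deliberately arbitrary: nothing in the data says that one year of age should count the same as one unit of any other covariate, and that arbitrariness is what PMN replaces with a distance between predicted means.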
Note that NNHD, unlike a sequential hot deck, uses a distance function to define<br />
closeness between the recipient and a donor. Sparseness of the donor class is therefore less of a<br />
problem, but a distance function involving categorical or nominal variables is typically ad<br />
hoc and often hard to justify.<br />
29 This report presents information from the 2006 National Survey on Drug Use and Health (NSDUH), an<br />
annual survey of the civilian, noninstitutionalized population of the United States aged 12 years or older. Prior to<br />
2002, the survey was called the National Household Survey on Drug Abuse (NHSDA).<br />
30 After the 1999 survey, only a CAI sample was selected.<br />