Multiple Imputation in Mplus

• Data set containing scores from 480 employees on eight work-related 

variables 

• Variables: Age, gender, job tenure, IQ, psychological well-being, 

job satisfaction, job performance, and turnover intentions 

• 33% of the cases have missing well-being scores, and 33% 

have missing satisfaction scores 

• The mechanism is MCAR because the data are missing by 

design

• With some planning, the same set of imputations can serve as 

input data for many different analyses 

• At a minimum, the imputation process should include all effects 

that are of interest in the subsequent analysis phase (e.g., main 

effects, interactions, non-linear relations) 

• Imputation should also incorporate auxiliary variables that 

predict missingness or are correlated with the analysis variables

• TITLE (optional) 

• DATA (same as ML analysis) 

• VARIABLE (same as ML analysis) 

• ANALYSIS 

• DATA IMPUTATION 

• OUTPUT

• Specification with full file path 

DATA:
  ! Location of the data file;
  file = "c:\Data\employee.dat";

• Simplified specification when the data file and the Mplus syntax 

file are located in the same folder 

DATA:
  ! Location of the data file;
  file = employee.dat;

• All variables listed in the USEVARIABLES option are included 

in the imputation process 

• The USEVARIABLES list typically includes complete and 

incomplete variables 

VARIABLE:
  ! Information about the contents of the data file;
  names = id age tenure female wbeing jobsat jobperf
          turnover iq;
  usevariables = jobperf tenure wbeing jobsat turnover iq;
  missing = all (-99);

• The following commands apply to the final MCMC run that 

generates the imputed data sets 

ANALYSIS:
  ! Saturated imputation model;
  type = basic;
  ! Random number seed for MCMC algorithm;
  bseed = 48932;
  ! Convergence criterion (default = .05);
  bconvergence = .05;

• Specifies characteristics of the imputation algorithm 

• This command is unnecessary in the preliminary diagnostic run 

DATA IMPUTATION:
  ! Variables to be imputed;
  impute = wbeing jobsat;
  ! Number of imputed data sets;
  ndatasets = 50;
  ! File name prefix for imputed data sets;
  save = employeeimp*.dat;
  ! Between-imputation interval;
  thin = 300;

• The TECH8 option of the OUTPUT command computes the PSR 

statistic after every 100 MCMC iterations and prints the values 

to the output file 

OUTPUT:
  ! Tech8 gives the PSR statistic;
  tech8;
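
The PSR (potential scale reduction) statistic compares between-chain and within-chain variability for a parameter, and values near 1 suggest that the chains agree. The Python sketch below illustrates the idea with one common form, sqrt((W + B) / W), where W is the average within-chain variance and B is the variance of the chain means. The exact formula Mplus applies and the way it scales the bconvergence criterion are assumptions here, and the draws are made-up numbers rather than Mplus output.

```python
from statistics import mean, pvariance

def psr(chains):
    """Potential scale reduction for one parameter.

    chains: list of equal-length lists of MCMC draws, one list per chain.
    Uses sqrt((W + B) / W); this form is an assumption, not taken from the slides.
    """
    chain_means = [mean(c) for c in chains]
    within = mean(pvariance(c) for c in chains)   # W: average within-chain variance
    between = pvariance(chain_means)              # B: variance of the chain means
    return ((within + between) / within) ** 0.5

# Hypothetical draws from two chains (illustration only).
agree = [[0.40, 0.42, 0.41, 0.43], [0.41, 0.43, 0.40, 0.42]]
disagree = [[0.40, 0.42, 0.41, 0.43], [0.80, 0.82, 0.81, 0.83]]

print(round(psr(agree), 3))     # chains overlap, so the value sits at or near 1
print(round(psr(disagree), 3))  # chains are far apart, so the value is well above 1
```

A run would be judged converged only when the statistic stays close to 1 for every parameter, which is why the preliminary diagnostic run inspects the TECH8 output before any data sets are saved.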

DATA:
  file = employee.dat;
VARIABLE:
  names = id age tenure female wbeing jobsat jobperf turnover iq;
  usevariables = jobperf tenure wbeing jobsat turnover iq;
  missing = all (-99);
ANALYSIS:
  type = basic;
  bseed = 48932;
  bconvergence = .05;
DATA IMPUTATION:
  impute = wbeing jobsat;
  ndatasets = 50;
  save = employeeimp*.dat;
  thin = 300;
OUTPUT:
  tech8;

• Near the bottom of the output file, Mplus lists the variable order 

in the imputed data sets 

• Use this variable list for all subsequent analyses 

SAVEDATA INFORMATION

  Order of variables

    JOBPERF
    TENURE
    WBEING
    JOBSAT
    TURNOVER
    IQ

• Mplus saves each imputed data set to a separate file

• The file names use the prefix specified in the SAVE option
  (e.g., employeeimp1.dat, etc.)

• The imputation program also generates a list file that contains the
  file names of the imputed data sets (e.g., employeeimplist.dat)

• The list file serves as input data for all subsequent analyses
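
The naming scheme can be sketched in a few lines of Python. The sketch assumes that the asterisk in the SAVE pattern is replaced by the imputation number, and that the list file name follows the slide's example (asterisk replaced by "list"). Both helper functions are hypothetical illustrations, not part of Mplus.

```python
def imputed_file_names(save_pattern, ndatasets):
    """Expand a SAVE pattern such as 'employeeimp*.dat' into per-imputation names."""
    return [save_pattern.replace("*", str(i)) for i in range(1, ndatasets + 1)]

def list_file_name(save_pattern):
    """Name of the list file, following the slide's example (asterisk -> 'list')."""
    return save_pattern.replace("*", "list")

names = imputed_file_names("employeeimp*.dat", 50)
print(names[0], names[-1], len(names))
print(list_file_name("employeeimp*.dat"))
```

Checking that all of the expected files exist in the working folder is a quick sanity check before the analysis phase.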

• Mplus fully automates the analysis and pooling phases 

• Analyzing imputed data sets requires a small change to the 

DATA command, but the remaining commands are identical to 

an ML analysis 

• There is no need to list the variances and covariances for 

incomplete explanatory variables because these variables are 

now complete

DATA:
  ! List of imputation file names;
  file = employeeimplist.dat;
  ! Imputation data;
  type = imputation;
VARIABLE:
  names = jobperf tenure wbeing jobsat turnover iq;
  usevariables = jobperf tenure wbeing jobsat turnover;
  missing = all (-99);
  centering = grandmean(tenure wbeing jobsat);
ANALYSIS:
  estimator = ml;
MODEL:
  jobperf on wbeing (b1);
  jobperf on jobsat (b2);
  jobperf on tenure (b3);
  jobperf on turnover (b4);
MODEL TEST:
  b1 = 0; b2 = 0; b3 = 0; b4 = 0;
OUTPUT:
  standardized sampstat patterns;

SAMPLE STATISTICS

NOTE: These are average results over 50 data sets.

SAMPLE STATISTICS

Means
               JOBPERF    TENURE    WBEING    JOBSAT  TURNOVER
              ________  ________  ________  ________  ________
 1               6.021     0.000     0.000     0.000     0.321

Covariances
               JOBPERF    TENURE    WBEING    JOBSAT  TURNOVER
              ________  ________  ________  ________  ________
 JOBPERF         1.570
 TENURE          0.061     9.735
 WBEING          0.661     0.565     1.377
 JOBSAT          0.272     0.552     0.447     1.394
 TURNOVER       -0.203     0.016    -0.148    -0.129     0.218

Correlations
               JOBPERF    TENURE    WBEING    JOBSAT  TURNOVER
              ________  ________  ________  ________  ________
 JOBPERF         1.000
 TENURE          0.016     1.000
 WBEING          0.450     0.154     1.000
 JOBSAT          0.184     0.150     0.323     1.000
 TURNOVER       -0.346     0.011    -0.269    -0.235     1.000
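
The covariance and correlation matrices carry the same information on different scales. As a quick check, the Python sketch below converts the pooled JOBPERF-WBEING covariance into a correlation using the two pooled variances from the output. Because the output reports averages over 50 data sets and the inputs here are rounded, the result should be close to, but need not exactly equal, the printed correlation.

```python
from math import sqrt

# Pooled (averaged) values copied from the SAMPLE STATISTICS output.
var_jobperf = 1.570
var_wbeing = 1.377
cov_jobperf_wbeing = 0.661

def correlation(cov_xy, var_x, var_y):
    """Pearson correlation implied by a covariance and two variances."""
    return cov_xy / sqrt(var_x * var_y)

r = correlation(cov_jobperf_wbeing, var_jobperf, var_wbeing)
print(round(r, 3))  # compare with the JOBPERF-WBEING entry in the Correlations table
```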

• The Wald statistic (a chi-square with 4 degrees of freedom) is 

akin to the omnibus F test in OLS regression 

Wald Test of Parameter Constraints

    Value                  177.808
    Degrees of Freedom           4
    P-Value                 0.0000

• The significant chi-square, χ²(4) = 177.808, p < .001, indicates
  that the set of predictors explains significant variation in the
  dependent variable
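
The p-value can be reproduced by hand because the chi-square upper tail has a closed form when the degrees of freedom are even. The Python sketch below applies that closed form for 4 degrees of freedom to the Wald value from the output. The function name is a hypothetical helper, and the sketch only checks the reference distribution; it does not reproduce how Mplus pools the test across imputed data sets.

```python
from math import exp

def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square variate with 4 degrees of freedom.

    For even df the tail has a closed form; with df = 4 it is exp(-x/2) * (1 + x/2).
    """
    return exp(-x / 2.0) * (1.0 + x / 2.0)

wald = 177.808          # value from the Wald Test of Parameter Constraints output
p_value = chi2_sf_df4(wald)
print(p_value < 0.001)  # the printed 0.0000 is this very small value rounded to four decimals
```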

MODEL RESULTS

(Estimate = unstandardized coefficient, S.E. = standard error, Est./S.E. = z test)

                                                  Two-Tailed
                   Estimate       S.E.  Est./S.E.    P-Value

 JOBPERF  ON
    WBEING            0.417      0.057      7.366      0.000
    JOBSAT            0.009      0.053      0.170      0.865
    TENURE           -0.017      0.017     -1.034      0.301
    TURNOVER         -0.640      0.116     -5.522      0.000

 Intercepts
    JOBPERF           6.226      0.062    100.981      0.000

• Because the continuous predictors are centered at their means, the
  intercept estimate (B0 = 6.226) represents the adjusted mean for the
  group of employees who intend to stay on the job (TURNOVER = 0)

• Controlling for other variables, employees who intend to quit
  (TURNOVER = 1) have a .640 lower job performance mean
  (B4 = -.640, p < .001)

• Holding other variables constant, a one-point increase in well-being
  would produce a .417 increase in job performance, on average
  (B1 = .417, p < .001)
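
These interpretations follow directly from the fitted equation. The Python sketch below plugs the pooled estimates from the MODEL RESULTS output into the regression equation. The function and argument names are hypothetical, and the continuous predictors are entered as deviations from their grand means, matching the centering in the analysis.

```python
# Pooled unstandardized estimates copied from the MODEL RESULTS output.
B0, B1, B2, B3, B4 = 6.226, 0.417, 0.009, -0.017, -0.640

def predicted_jobperf(wbeing_c=0.0, jobsat_c=0.0, tenure_c=0.0, turnover=0):
    """Predicted job performance; *_c arguments are deviations from the grand mean."""
    return B0 + B1 * wbeing_c + B2 * jobsat_c + B3 * tenure_c + B4 * turnover

stayer = predicted_jobperf(turnover=0)     # average predictors, intends to stay
quitter = predicted_jobperf(turnover=1)    # average predictors, intends to quit
one_up = predicted_jobperf(wbeing_c=1.0)   # well-being one point above its mean

print(round(stayer, 3), round(quitter, 3), round(one_up, 3))
```

The difference between the first two predictions is B4, and the difference between the third and the first is B1.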

STANDARDIZED MODEL RESULTS

STDYX Standardization

(Estimate = beta weight)

                                                  Two-Tailed
                   Estimate       S.E.  Est./S.E.    P-Value

 JOBPERF  ON
    WBEING            0.390      0.051      7.720      0.000
    JOBSAT            0.009      0.050      0.170      0.865
    TENURE           -0.043      0.042     -1.035      0.301
    TURNOVER         -0.238      0.042     -5.631      0.000
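
A beta weight rescales an unstandardized slope by the ratio of the predictor's standard deviation to the outcome's standard deviation. The Python sketch below applies that conversion to two slopes, using the pooled variances from the SAMPLE STATISTICS output. Rounded inputs and pooling across the 50 data sets can produce small discrepancies from the printed STDYX values, so the comparison is approximate.

```python
from math import sqrt

# Pooled values copied from the SAMPLE STATISTICS and MODEL RESULTS output.
var_y = 1.570                        # JOBPERF variance
predictors = {
    "WBEING":   (0.417, 1.377),      # (unstandardized slope, predictor variance)
    "TURNOVER": (-0.640, 0.218),
}

def stdyx(slope, var_x, var_y):
    """Standardize a slope with respect to both x and y: b * SD(x) / SD(y)."""
    return slope * sqrt(var_x) / sqrt(var_y)

betas = {name: stdyx(b, vx, var_y) for name, (b, vx) in predictors.items()}
for name, beta in betas.items():
    print(name, round(beta, 3))  # compare with the STDYX estimates in the output
```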

R-SQUARE

    Observed                                      Two-Tailed
    Variable       Estimate       S.E.  Est./S.E.    P-Value

    JOBPERF           0.260      0.040      6.568      0.000

• Prior to performing the analyses, we used multiple imputation to
  deal with the missing data. Briefly, multiple imputation uses a
  regression-based procedure to generate multiple copies of the data
  set, each of which contains different estimates of the missing
  values. We used the Markov chain Monte Carlo (MCMC) algorithm in
  Mplus to generate 50 imputed data sets. An exploratory analysis
  suggested that the data sets should be separated by at least 100
  iterations, so we took a conservative approach of saving a data set
  after every 300th computational cycle. The imputation model included
  the five regression model variables and IQ scores.

• After creating the complete data sets, we estimated the multiple 

regression model on each filled-in data set and subsequently 

used Rubin’s (1987) formulas to combine the parameter 

estimates and standard errors into a single set of results. Note 

that methodologists currently regard multiple imputation as a 

“state of the art” missing data technique (Schafer & Graham, 

2002) because it requires less strict assumptions about the 

mechanism that led to missing data and generally produces 

more accurate estimates than traditional missing data handling 

techniques (e.g., discarding cases).
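
Mplus performs the pooling automatically, but the arithmetic behind Rubin's rules is short enough to sketch. The Python example below averages the per-imputation estimates and combines the within-imputation and between-imputation variance into a pooled standard error. The five estimates and standard errors are hypothetical values chosen for illustration, not results from the employee data.

```python
from statistics import mean, variance

def pool_rubin(estimates, standard_errors):
    """Combine per-imputation results with Rubin's (1987) rules.

    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    pooled = mean(estimates)                           # average of the m estimates
    within = mean(se ** 2 for se in standard_errors)   # average sampling variance
    between = variance(estimates)                      # variance of estimates across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5

# Hypothetical slopes and standard errors from m = 5 imputed data sets.
est = [0.41, 0.43, 0.40, 0.42, 0.44]
se = [0.055, 0.056, 0.054, 0.055, 0.057]

pooled_est, pooled_se = pool_rubin(est, se)
print(round(pooled_est, 3), round(pooled_se, 4))
```

The between-imputation term is what carries the extra uncertainty due to the missing data into the pooled standard error.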

• Considered as a set, the four predictors explained approximately
  26% of the variability in job performance scores, R² = .26. Table 1
  gives the regression coefficients from the analysis. Because the
  continuous predictors were centered at their means, the intercept
  quantifies the average job performance rating for employees who
  intend to stay in their current position. As seen in the table,
  psychological well-being was a significant predictor of job
  performance, such that a one-point increase in well-being scores was
  associated with a .417 increase in job performance, controlling for
  other predictors, z = 7.366, p < .001. Turnover intentions was also a
  significant unique predictor, such that employees with intentions to
  quit had a .64 lower job performance average after controlling for
  other predictors, z = -5.522, p < .001.

• Tabular presentations of missing data analyses are identical to 

those from a complete-data analysis 

Table 1

Multiple Regression Parameter Estimates

Effect                  Est.    Beta     SE         z        p
Intercept              6.226     N/A   .062   100.981   < .001
Well-Being              .417    .390   .057     7.366   < .001
Job Satisfaction        .009    .009   .053      .170     .865
Job Tenure             -.017   -.043   .017    -1.034     .301
Turnover Intentions    -.640   -.238   .116    -5.522   < .001
