16.01.2015 Views

Multiple Imputation in Mplus


• Data set containing scores from 480 employees on eight work-related variables
• Variables: Age, gender, job tenure, IQ, psychological well-being, job satisfaction, job performance, and turnover intentions
• 33% of the cases have missing well-being scores, and 33% have missing satisfaction scores
• The mechanism is MCAR because the data are missing by design


• With some planning, the same set of imputations can serve as input data for many different analyses
• At a minimum, the imputation process should include all effects that are of interest in the subsequent analysis phase (e.g., main effects, interactions, non-linear relations)
• Imputation should also incorporate auxiliary variables that predict missingness or are correlated with the analysis variables


• TITLE (optional)
• DATA (same as ML analysis)
• VARIABLE (same as ML analysis)
• ANALYSIS
• DATA IMPUTATION
• OUTPUT


• Specification with full file path
DATA:
  ! Location of the data file;
  file = 'c:\Data\employee.dat';
• Simplified specification when the data file and the Mplus syntax file are located in the same folder
DATA:
  ! Location of the data file;
  file = employee.dat;


• All variables listed on the USEVARIABLES option are included in the imputation process
• The USEVARIABLES list typically includes complete and incomplete variables
VARIABLE:
  ! Information about the contents of the data file;
  names = id age tenure female wbeing jobsat jobperf turnover iq;
  usevariables = jobperf tenure wbeing jobsat turnover iq;
  missing = all (-99);


• The following commands apply to the final MCMC run that generates the imputed data sets
ANALYSIS:
  ! Saturated imputation model;
  type = basic;
  ! Random number seed for MCMC algorithm;
  bseed = 48932;
  ! Convergence criterion (default = .05);
  bconvergence = .05;


• Specifies characteristics of the imputation algorithm
• This command is unnecessary in the preliminary diagnostic run
DATA IMPUTATION:
  ! Variables to be imputed;
  impute = wbeing jobsat;
  ! Number of imputed data sets;
  ndatasets = 50;
  ! File name prefix for imputed data sets;
  save = employeeimp*.dat;
  ! Between-imputation interval;
  thin = 300;


• The TECH8 option of the OUTPUT command computes the PSR statistic after every 100 MCMC iterations and prints the values to the output file
OUTPUT:
  ! Tech8 gives the PSR statistic;
  tech8;
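To make the PSR (potential scale reduction) statistic concrete, the following is a minimal Python sketch of the textbook Gelman-Rubin computation for a single parameter. It is an illustration only: the function name and the toy chains are hypothetical, and the exact formula Mplus uses for TECH8 may differ in detail. Values near 1 suggest that the chains agree.

```python
from statistics import mean, variance


def potential_scale_reduction(chains):
    """Gelman-Rubin PSR for one parameter from equal-length chains of draws."""
    n = len(chains[0])                            # draws per chain
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])  # W: average within-chain variance
    between = variance(chain_means)               # B / n: variance of the chain means
    pooled = (n - 1) / n * within + between       # pooled estimate of the posterior variance
    return (pooled / within) ** 0.5


# Hypothetical toy chains: the first pair overlaps, the second pair does not.
psr_agree = potential_scale_reduction([[1, 2, 3, 4], [1, 2, 3, 4]])
psr_apart = potential_scale_reduction([[1, 2, 3, 4], [11, 12, 13, 14]])
print(round(psr_agree, 3), round(psr_apart, 3))
```

Chains that sample the same region give a PSR close to (or slightly below) 1, whereas chains stuck in different regions give a value well above 1.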


DATA:
  file = employee.dat;
VARIABLE:
  names = id age tenure female wbeing jobsat jobperf turnover iq;
  usevariables = jobperf tenure wbeing jobsat turnover iq;
  missing = all (-99);
ANALYSIS:
  type = basic;
  bseed = 48932;
  bconvergence = .05;
DATA IMPUTATION:
  impute = wbeing jobsat;
  ndatasets = 50;
  save = employeeimp*.dat;
  thin = 300;
OUTPUT:
  tech8;


• Near the bottom of the output file, Mplus lists the variable order in the imputed data sets
• Use this variable list for all subsequent analyses
SAVEDATA INFORMATION

  Order of variables

    JOBPERF
    TENURE
    WBEING
    JOBSAT
    TURNOVER
    IQ


• Mplus saves each imputed data set to a separate file
• The file names use the prefix specified in the SAVE option (e.g., employeeimp1.dat, employeeimp2.dat, employeeimp3.dat, etc.)


• The imputation program also generates a list file that contains the file names of the imputed data sets (e.g., employeeimplist.dat)
• The list file serves as input data for all subsequent analyses
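The naming scheme described above can be sketched in a few lines of Python. This only mirrors the examples given on the slides (the asterisk is replaced by the data set number, and by "list" for the list file). The helper name is hypothetical and nothing here calls Mplus itself.

```python
def imputed_file_names(save_pattern, n_datasets):
    """Expand a SAVE pattern such as 'employeeimp*.dat' into data and list file names."""
    data_files = [save_pattern.replace("*", str(i)) for i in range(1, n_datasets + 1)]
    list_file = save_pattern.replace("*", "list")
    return data_files, list_file


data_files, list_file = imputed_file_names("employeeimp*.dat", 50)
print(data_files[:3])   # first three imputed data files
print(list_file)        # file that lists all imputed data files
```

A check like this is handy for confirming that all 50 files exist on disk before running the analysis phase.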


• Mplus fully automates the analysis and pooling phases
• Analyzing imputed data sets requires a small change to the DATA command, but the remaining commands are identical to an ML analysis
• There is no need to list the variances and covariances for incomplete explanatory variables because these variables are now complete


DATA:
  ! List of imputation file names;
  file = employeeimplist.dat;
  ! Imputation data;
  type = imputation;
VARIABLE:
  names = jobperf tenure wbeing jobsat turnover iq;
  usevariables = jobperf tenure wbeing jobsat turnover;
  missing = all (-99);
  centering = grandmean(tenure wbeing jobsat);
ANALYSIS:
  estimator = ml;
MODEL:
  jobperf on wbeing (b1);
  jobperf on jobsat (b2);
  jobperf on tenure (b3);
  jobperf on turnover (b4);
MODEL TEST:
  b1 = 0; b2 = 0; b3 = 0; b4 = 0;
OUTPUT:
  standardized sampstat patterns;


SAMPLE STATISTICS

NOTE: These are average results over 50 data sets.

     SAMPLE STATISTICS

           Means
              JOBPERF       TENURE        WBEING        JOBSAT        TURNOVER
              ________      ________      ________      ________      ________
      1         6.021         0.000         0.000         0.000         0.321

           Covariances
              JOBPERF       TENURE        WBEING        JOBSAT        TURNOVER
              ________      ________      ________      ________      ________
 JOBPERF        1.570
 TENURE         0.061         9.735
 WBEING         0.661         0.565         1.377
 JOBSAT         0.272         0.552         0.447         1.394
 TURNOVER      -0.203         0.016        -0.148        -0.129         0.218


           Correlations
              JOBPERF       TENURE        WBEING        JOBSAT        TURNOVER
              ________      ________      ________      ________      ________
 JOBPERF        1.000
 TENURE         0.016         1.000
 WBEING         0.450         0.154         1.000
 JOBSAT         0.184         0.150         0.323         1.000
 TURNOVER      -0.346         0.011        -0.269        -0.235         1.000
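As a quick consistency check, each correlation can be recovered from the corresponding covariance and the two variances. The sketch below uses the rounded values printed in the sample statistics, so small discrepancies in the third decimal are expected. The function name is hypothetical.

```python
from math import sqrt


def correlation(cov_xy, var_x, var_y):
    """Pearson correlation implied by a covariance and two variances."""
    return cov_xy / sqrt(var_x * var_y)


# Rounded values taken from the pooled covariance matrix.
r_jobperf_wbeing = correlation(0.661, 1.570, 1.377)
r_jobperf_turnover = correlation(-0.203, 1.570, 0.218)
print(round(r_jobperf_wbeing, 3), round(r_jobperf_turnover, 3))
```

The first value reproduces the printed correlation of 0.450. The second lands within rounding error of the printed -0.346.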


• The Wald statistic (a chi-square with 4 degrees of freedom) is akin to the omnibus F test in OLS regression
Wald Test of Parameter Constraints

          Value                            177.808
          Degrees of Freedom                     4
          P-Value                           0.0000

• The significant chi-square, χ²(4) = 177.808, p < .001, indicates that the set of predictors explains significant variation in the dependent variable
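The p-value for the Wald statistic can be verified by hand. For an even number of degrees of freedom, the chi-square upper-tail probability has a simple closed form, which the following Python sketch uses. The function name is hypothetical and the closed form does not apply to odd degrees of freedom.

```python
from math import exp, factorial


def chi_square_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate; valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    # P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))


p_value = chi_square_sf_even_df(177.808, 4)
print(p_value < 0.001)
```

With a statistic this large the probability is vanishingly small, which is why the output prints it as 0.0000.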


MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 JOBPERF  ON
    WBEING             0.417      0.057      7.366      0.000
    JOBSAT             0.009      0.053      0.170      0.865
    TENURE            -0.017      0.017     -1.034      0.301
    TURNOVER          -0.640      0.116     -5.522      0.000

 Intercepts
    JOBPERF            6.226      0.062    100.981      0.000

Column labels: Estimate = unstandardized coefficients; S.E. = standard error; Est./S.E. = z test
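The Est./S.E. column is the estimate divided by its standard error, and the two-tailed p-value follows from the standard normal distribution. A minimal Python sketch, with hypothetical function names, is shown here. Because the printed estimates and standard errors are rounded to three decimals, ratios computed from them differ slightly from the z values Mplus prints.

```python
from math import erfc, sqrt


def wald_z(estimate, standard_error):
    """z statistic: estimate divided by its standard error."""
    return estimate / standard_error


def two_tailed_p(z):
    """Two-tailed p-value for a standard normal z statistic."""
    return erfc(abs(z) / sqrt(2.0))


print(round(wald_z(0.417, 0.057), 3))   # from rounded inputs; the output prints 7.366
print(round(two_tailed_p(0.170), 3))    # JOBSAT
print(round(two_tailed_p(-1.034), 3))   # TENURE
```

Feeding the printed z values for JOBSAT and TENURE into `two_tailed_p` reproduces the p-values in the output (0.865 and 0.301).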


• Because the continuous predictors are centered at their means, the intercept estimate (B0 = 6.226) represents the adjusted mean for the group of employees that intend to stay on the job (TURNOVER = 0)
• Controlling for other variables, employees that intend to quit (TURNOVER = 1) have a .640 lower job performance mean (B4 = -.640, p < .001)
• Holding other variables constant, a one-point increase in well-being would produce a .417 increase in job performance, on average (B1 = .417, p < .001)
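These interpretations can be checked by plugging values into the estimated regression equation. The Python sketch below uses the pooled coefficients from the output. The function and argument names are hypothetical, and the continuous predictors are expressed as deviations from their grand means.

```python
# Pooled unstandardized estimates from the MODEL RESULTS output.
COEFS = {
    "intercept": 6.226,
    "wbeing": 0.417,
    "jobsat": 0.009,
    "tenure": -0.017,
    "turnover": -0.640,
}


def predicted_jobperf(wbeing_c=0.0, jobsat_c=0.0, tenure_c=0.0, turnover=0):
    """Predicted job performance; *_c arguments are grand-mean-centered scores."""
    return (COEFS["intercept"]
            + COEFS["wbeing"] * wbeing_c
            + COEFS["jobsat"] * jobsat_c
            + COEFS["tenure"] * tenure_c
            + COEFS["turnover"] * turnover)


stayers = predicted_jobperf()                  # average predictors, TURNOVER = 0
quitters = predicted_jobperf(turnover=1)       # average predictors, TURNOVER = 1
one_point_up = predicted_jobperf(wbeing_c=1.0) # well-being one point above its mean
print(round(stayers, 3), round(quitters, 3), round(one_point_up, 3))
```

The difference between the first two predictions is the turnover coefficient, and the difference between the first and third is the well-being coefficient.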


STANDARDIZED MODEL RESULTS

STDYX Standardization

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 JOBPERF  ON
    WBEING             0.390      0.051      7.720      0.000
    JOBSAT             0.009      0.050      0.170      0.865
    TENURE            -0.043      0.042     -1.035      0.301
    TURNOVER          -0.238      0.042     -5.631      0.000

Column label: Estimate = beta weights
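A beta weight rescales an unstandardized slope by the ratio of the predictor and outcome standard deviations. The sketch below approximates the STDYX estimates from the rounded slopes and the pooled sample variances. This is only an approximation under those assumptions: Mplus standardizes within each imputed data set and then pools, so the third decimal can differ. The names used are hypothetical.

```python
from math import sqrt

VAR_JOBPERF = 1.570  # pooled sample variance of the outcome

# name: (unstandardized slope, pooled sample variance of the predictor)
PREDICTORS = {
    "WBEING": (0.417, 1.377),
    "JOBSAT": (0.009, 1.394),
    "TENURE": (-0.017, 9.735),
    "TURNOVER": (-0.640, 0.218),
}


def stdyx(slope, var_x, var_y):
    """Standardized slope: b * SD(x) / SD(y)."""
    return slope * sqrt(var_x / var_y)


betas = {name: stdyx(b, var_x, VAR_JOBPERF) for name, (b, var_x) in PREDICTORS.items()}
for name, beta in betas.items():
    print(name, round(beta, 3))
```

The approximations fall within a few thousandths of the printed estimates (0.390, 0.009, -0.043, -0.238).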


R-SQUARE

    Observed                                        Two-Tailed
    Variable        Estimate       S.E.  Est./S.E.    P-Value

    JOBPERF            0.260      0.040      6.568      0.000
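In a linear regression, R-square equals the sum of each beta weight multiplied by the predictor's correlation with the outcome. The Python sketch below applies that identity to the rounded, pooled values printed in the output, so it reproduces the reported estimate only approximately. The variable names are hypothetical.

```python
# name: (beta weight, correlation with JOBPERF), both from the pooled output
TERMS = {
    "WBEING": (0.390, 0.450),
    "JOBSAT": (0.009, 0.184),
    "TENURE": (-0.043, 0.016),
    "TURNOVER": (-0.238, -0.346),
}

# R-square = sum over predictors of beta_j * r_(y, x_j)
r_square = sum(beta * r for beta, r in TERMS.values())
print(round(r_square, 3))
```

The result is close to the printed value of 0.260. The small gap reflects rounding and the fact that Mplus pools R-square across the 50 imputed data sets.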


• Prior to performing the analyses, we used multiple imputation to deal with the missing data. Briefly, multiple imputation uses a regression-based procedure to generate multiple copies of the data set, each of which contains different estimates of the missing values. We used the fully conditional specification algorithm in the SPSS multiple imputation procedure to generate 50 imputed data sets. An exploratory analysis suggested that the data sets should be separated by at least 100 iterations, so we took a conservative approach of saving a data set after every 300th computational cycle. The imputation model included the five regression model variables and IQ scores.


• After creating the complete data sets, we estimated the multiple regression model on each filled-in data set and subsequently used Rubin's (1987) formulas to combine the parameter estimates and standard errors into a single set of results. Note that methodologists currently regard multiple imputation as a "state of the art" missing data technique (Schafer & Graham, 2002) because it requires less strict assumptions about the mechanism that led to missing data and generally produces more accurate estimates than traditional missing data handling techniques (e.g., discarding cases).
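For readers unfamiliar with Rubin's pooling formulas, the following Python sketch shows the core computation for a single parameter: the pooled estimate is the average across imputations, and the pooled variance adds the average squared standard error to an inflated between-imputation variance. The function name and the three input values are hypothetical illustrations, not results from the employee data.

```python
from statistics import mean, variance


def pool_rubin(estimates, standard_errors):
    """Pool one parameter across m imputed data sets using Rubin's rules."""
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean([se ** 2 for se in standard_errors])  # average sampling variance
    between = variance(estimates)                       # variance of estimates across imputations
    total = within + (1 + 1 / m) * between              # total sampling variance
    return pooled_estimate, total ** 0.5


# Hypothetical estimates and standard errors from m = 3 imputed data sets.
estimate, standard_error = pool_rubin([0.40, 0.42, 0.44], [0.05, 0.05, 0.05])
print(round(estimate, 3), round(standard_error, 4))
```

The pooled standard error exceeds the per-data-set standard error of 0.05 because it also carries the uncertainty due to the missing values.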


• Considered as a set, the four predictors explained approximately 26% of the variability in job performance scores, R² = .26. Table 1 gives the regression coefficients from the analysis. Because the continuous predictors were centered at their means, the intercept quantifies the average job performance rating for employees that intend to stay in their current position. As seen in the table, psychological well-being was a significant predictor of job performance, such that a one-point increase in well-being scores was associated with a .417 increase in job performance, controlling for other predictors, z = 7.366, p < .001. Turnover intentions was also a significant unique predictor, such that employees with intentions to quit had a .64 lower job performance average after controlling for other predictors, z = -5.522, p < .001.


• Tabular presentations of missing data analyses are identical to those from a complete-data analysis

Table 1
Multiple Regression Parameter Estimates

Effect                 Est.     Beta     SE      z          p
Intercept              6.226    N/A      .062    100.981    < .001
Well-Being             .417     .390     .057    7.366      < .001
Job Satisfaction       .009     .009     .053    .170       .865
Job Tenure             -.017    -.043    .017    -1.034     .301
Turnover Intentions    -.640    -.238    .116    -5.522     < .001
