large to pass on, or too confidential to let the user see everything, or all of the above! Examples range from microarrays to astrophysics; see Blocker and Meng (2013).

"So what?" Some may argue that all this can be captured by our model Pr(Y | θ), at least in theory, if we have made enough effort to learn about the entire process. Putting aside the impossibility of learning about everything in practice (Blocker and Meng, 2013), we will see that the single-model formulation is simply not rich enough to capture reality, even if we assume that every pre-processor and analyst has done everything correctly. The trouble here is that pre-processors and analysts have different goals, have access to different data resources, and make different assumptions. They typically do not and cannot communicate with each other, resulting in separate (model) assumptions that no single probabilistic model can coherently encapsulate. We need a multiplicity of models to capture a multiplicity of incompatible assumptions.

45.3.1 Multiple imputation and uncongeniality

I learned about these complications during my study of the multiple imputation (MI) method (Rubin, 1987), where the pre-processor is the imputer. The imputer's goal was to preserve as much as possible in the imputed data the joint distributional properties of the original complete data (assuming, of course, that the original complete-data samples were scientifically designed so that their properties are worthy of preservation). For that purpose, the imputer should and will use anything that can help, including confidential information, as well as powerful predictive models that may not capture the correct causal relations.

In addition, because the imputed data typically will be used for many purposes, most of which cannot be anticipated at the time of imputation, the imputation model needs to include as many predictors as possible and be as saturated as the data and resources permit; see Meng (1994) and Rubin (1996). In contrast, an analysis model, or rather an approach (e.g., given by software), often focuses on specific questions and may involve only a (small) subset of the variables used by the imputer. Consequently, the imputer's model and the user's procedure may be uncongenial to each other, meaning that no model can be compatible with both the imputer's model and the user's procedure. The technical definitions of congeniality are given in Meng (1994) and Xie and Meng (2013), which involve embedding an analyst's procedure (often of a frequentist nature) into an imputation model (typically with a Bayesian flavor). For the purposes of the following discussion, two models are "congenial" if their implied imputation and analysis procedures are the same. That is, they are operationally, though perhaps not theoretically, equivalent.

Ironically, the original motivation of MI (Rubin, 1987) was a separation of labor, asking those who have more knowledge and resources (e.g., the US Census Bureau) to fix/impute the missing observations, with the hope that
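To make the notion of uncongeniality concrete, here is a minimal sketch, not taken from the chapter, of an MI analysis in which the imputer's model uses predictors X and Z while the analyst regresses Y on X alone. The simulated data, the simplified (improper) imputation step, and the helper name ols are illustrative assumptions; the per-imputation estimates are combined with Rubin's (1987) rules.

# Illustrative sketch of an uncongenial setup: the imputer conditions on
# X and Z, the analyst's procedure uses X only. Improper (fixed-parameter)
# imputation is used purely to keep the example short.
import numpy as np

rng = np.random.default_rng(0)
n, m = 500, 20                      # sample size, number of imputations

# Complete data: Y depends on both X and Z.
X = rng.normal(size=n)
Z = rng.normal(size=n)
Y = 1.0 + 2.0 * X + 1.5 * Z + rng.normal(size=n)

# Make some Y values missing (completely at random here).
miss = rng.random(n) < 0.3
Y_obs = Y.copy()
Y_obs[miss] = np.nan

def ols(design, y):
    """Return OLS coefficients, their covariance, and the residual variance."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (len(y) - design.shape[1])
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta, cov, sigma2

# Imputer: rich model Y ~ 1 + X + Z, fitted on the observed cases.
D_imp = np.column_stack([np.ones(n), X, Z])
beta_imp, _, s2_imp = ols(D_imp[~miss], Y_obs[~miss])

estimates, variances = [], []
for _ in range(m):
    # Impute from the imputer's predictive distribution (simplified:
    # parameters held fixed, noise added to the predictions).
    Y_imp = Y_obs.copy()
    Y_imp[miss] = D_imp[miss] @ beta_imp + rng.normal(scale=np.sqrt(s2_imp),
                                                      size=miss.sum())
    # Analyst: smaller model Y ~ 1 + X, ignoring Z.
    D_ana = np.column_stack([np.ones(n), X])
    beta_ana, cov_ana, _ = ols(D_ana, Y_imp)
    estimates.append(beta_ana[1])        # slope on X
    variances.append(cov_ana[1, 1])

# Rubin's combining rules: point estimate, within- and between-imputation
# variances, and total variance T = U_bar + (1 + 1/m) * B.
q_bar = np.mean(estimates)
u_bar = np.mean(variances)
b = np.var(estimates, ddof=1)
t = u_bar + (1 + 1 / m) * b
print(f"MI estimate of the X slope: {q_bar:.3f}  (total variance {t:.4f})")

Because the imputation model conditions on Z while the analysis procedure does not, the two are uncongenial in the sense defined above, and Rubin's variance formula need not behave as advertised; that discrepancy is exactly what the results in Meng (1994) and Xie and Meng (2013) address.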
