11.07.2015 Views

2DkcTXceO

2DkcTXceO

2DkcTXceO

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

468 Targeted learningdenote the observed random variable on a unit with O and let P 0 be the probabilitydistribution of O. In addition, let’s assume that the observed data is arealization of n independent and identically distributed copies O 1 ,...,O n ofO ∼ P 0 . Formally, a statistical model is the collection of possible probabilitydistributions of O, and we denote this set with M.Contrary to most current practice, a statistical model should contain thetrue P 0 ,sothattheresultingestimationproblemisacorrectformulation,and not a biased approximation of the true estimation problem. The famousquote that all statistical models are wrong represents a false statement, sinceit is not hard to formulate truthful statistical models that only incorporatetrue knowledge, such as the nonparametric statistical model that makes noassumptions at all. Of course, we already made the key statistical assumptionthat the n random variables O 1 ,...,O n are independent and identicallydistributed, and, that assumption itself might need to be weakened to a statisticalmodel for (O 1 ,...,O n ) ∼ P0n that contains the true distribution P0 n . Fora historical and philosophical perspective on “models, inference, and truth,”we refer to Starmans (2012).40.2.2 The model encoding both statistical and non-testableassumptionsA statistical model could be represented as M = {P θ : θ ∈ Θ} for some mappingθ ↦→ P θ defined on an infinite-dimensional parameter space Θ. We referto this mapping θ ↦→ P θ as a model, and it implies the statistical model for thetrue data distribution P 0 = P θ0 . There will always exist many models that arecompatible with a particular statistical model. It is important to note that thestatistical model is the only relevant information for the statistical estimationproblem. Examples of models are censored data and causal inference modelsin which case the observed data structure O =Φ(C, X) is represented as amany to one mapping Φ from the full-data X and censoring variable C toO, in which case the observed data distribution is indexed by the full-datadistribution P X and censoring mechanism P C|X .SointhiscaseΘrepresentsthe set of possible (P X ,P C|X ), and P θ is the distribution of Φ(C, X) impliedby the distribution θ of (C, X). Different models for (P X ,P C|X )mightimplythe same statistical model for the data distribution of O. Wenotethata model encodes assumptions beyond the statistical model, and we refer tothese additional assumptions as non-testable assumptions since they put norestrictions on the distribution of the data. Assumptions such as O =Φ(C, X),and the coarsening or missing at random assumption on the conditional distributionP C|X are examples of non-testable assumptions that do not affectthe statistical model.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!