IMS Lecture Notes–Monograph Series
2nd Lehmann Symposium – Optimality
Vol. 49 (2006) 98–119
© Institute of Mathematical Statistics, 2006
DOI: 10.1214/074921706000000419

Where do statistical models come from?
Revisiting the problem of specification

Aris Spanos∗1
Virginia Polytechnic Institute and State University
Abstract: R. A. Fisher founded modern statistical inference in 1922 and identified its fundamental problems to be: specification, estimation and distribution. Since then the problem of statistical model specification has received scant attention in the statistics literature. The paper traces the history of statistical model specification, focusing primarily on pioneers like Fisher, Neyman, and more recently Lehmann and Cox, and attempts a synthesis of their views in the context of the Probabilistic Reduction (PR) approach. As argued by Lehmann [11], a major stumbling block for a general approach to statistical model specification has been the delineation of the appropriate role for substantive subject matter information. The PR approach demarcates the interrelated but complementary roles of substantive and statistical information, summarized ab initio in the form of a structural and a statistical model, respectively. In an attempt to preserve the integrity of both sources of information, as well as to ensure the reliability of their fusing, a purely probabilistic construal of statistical models is advocated. This probabilistic construal is then used to shed light on a number of issues relating to specification, including the role of preliminary data analysis, structural vs. statistical models, model specification vs. model selection, statistical vs. substantive adequacy and model validation.
1. Introduction
The current approach to statistics, interpreted broadly as ‘probability-based data modeling and inference’, has its roots going back to the early 19th century, but it was given its current formulation by R. A. Fisher [5]. He identified the fundamental problems of statistics to be: specification, estimation and distribution. Despite its importance, the question of specification, ‘where do statistical models come from?’, has received only scant attention in the statistics literature; see Lehmann [11].

The cornerstone of modern statistics is the notion of a statistical model, whose meaning and role have changed and evolved along with that of statistical modeling itself over the last two centuries. Adopting a retrospective view, a statistical model is defined to be an internally consistent set of probabilistic assumptions aiming to provide an ‘idealized’ probabilistic description of the stochastic mechanism that gave rise to the observed data x := (x_1, x_2, . . . , x_n). The quintessential statistical model is the simple Normal model, comprising a statistical Generating Mechanism (GM):
(1.1)   X_k = µ + u_k,   k ∈ N := {1, 2, . . . , n, . . .}
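As a concrete illustration of the GM in (1.1), the following minimal sketch simulates an observed data set x := (x_1, . . . , x_n) from the simple Normal model, assuming the standard error specification u_k ~ NIID(0, σ²) (equivalently, X_k ~ NIID(µ, σ²)); the parameter values µ = 1, σ = 0.5 and the function name simulate_simple_normal are illustrative choices, not part of the paper.

import numpy as np

def simulate_simple_normal(n, mu=1.0, sigma=0.5, seed=0):
    """Simulate n observations from the GM X_k = mu + u_k,
    assuming u_k ~ NIID(0, sigma^2), i.e. X_k ~ NIID(mu, sigma^2)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(loc=0.0, scale=sigma, size=n)  # NIID error process u_k
    return mu + u                                 # observable process X_k

# One realization x := (x_1, ..., x_n) that the statistical model idealizes
x = simulate_simple_normal(n=100)
print(x.mean(), x.std(ddof=1))  # sample estimates of mu and sigma

The point of the sketch is that the statistical model describes the stochastic mechanism generating such data purely in probabilistic terms, without reference to any substantive subject matter information.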
∗I am most grateful to Erich Lehmann, Deborah G. Mayo, Javier Rojo and an anonymous referee for valuable suggestions and comments on an earlier draft of the paper.
1Department of Economics, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, e-mail: aris@vt.edu
AMS 2000 subject classifications: 62N-03, 62A01, 62J20, 60J65.
Keywords and phrases: specification, statistical induction, misspecification testing, respecification, statistical adequacy, model validation, substantive vs. statistical information, structural vs. statistical models.