Optimality
Optimality
Optimality
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
90 D. G. Mayo and D. R. Cox<br />
some theory being contemplated for general applicability, consistency with the null<br />
hypothesis may be regarded as some additional evidence for the theory, especially<br />
if the test and data are sufficiently sensitive to exclude major departures from the<br />
theory. An aspect is encapsulated in Fisher’s aphorism (Cochran [3]) that to help<br />
make observational studies more nearly bear a causal interpretation, one should<br />
make one’s theories elaborate, by which he meant one should plan a variety of<br />
tests of different consequences of a theory, to obtain a comprehensive check of its<br />
implications. The limited result that one set of data accords with the theory adds<br />
one piece to the evidence whose weight stems from accumulating an ability to refute<br />
alternative explanations.<br />
In the first type of example under this rubric, there may be apparently anomalous<br />
results for a theory or hypothesisT , whereT has successfully passed appreciable<br />
theoretical and/or empirical scrutiny. Were the apparently anomalous results forT<br />
genuine, it is expected that H0 will be rejected, so that when it is not, the results<br />
are positive evidence against the reality of the anomaly. In a second type of case,<br />
one again has a well-tested theoryT,and a rival theoryT ∗ is determined to conflict<br />
withT in a thus far untested domain, with respect to an effect. By identifying the<br />
null with the prediction fromT , any discrepancies in the direction ofT ∗ are given<br />
a very good chance to be detected, such that, if no significant departure is found,<br />
this constitutes evidence forT in the respect tested.<br />
Although the general theory of relativity, GTR, was not facing anomalies in the<br />
1960s, rivals to the GTR predicted a breakdown of the Weak Equivalence Principle<br />
for massive self-gravitating bodies, e.g., the earth-moon system: this effect, called<br />
the Nordvedt effect would be 0 for GTR (identified with the null hypothesis) and<br />
non-0 for rivals. Measurements of the round trip travel times between the earth and<br />
moon (between 1969 and 1975) enabled the existence of such an anomaly for GTR<br />
to be probed. Finding no evidence against the null hypothesis set upper bounds to<br />
the possible violation of the WEP, and because the tests were sufficiently sensitive,<br />
these measurements provided good evidence that the Nordvedt effect is absent, and<br />
thus evidence for the null hypothesis (Will [31]). Note that such a negative result<br />
does not provide evidence for all of GTR (in all its areas of prediction), but it does<br />
provide evidence for its correctness with respect to this effect. The logic is this:<br />
theoryT predicts H0 is at least a very close approximation to the true situation;<br />
rival theoryT ∗ predicts a specified discrepancy from H0, and the test has high<br />
probability of detecting such a discrepancy fromT wereT ∗ correct. Detecting no<br />
discrepancy is thus evidence for its absence.<br />
3.6. Confidence intervals<br />
As noted above in many problems the provision of confidence intervals, in principle<br />
at a range of probability levels, gives the most productive frequentist analysis. If<br />
so, then confidence interval analysis should also fall under our general frequentist<br />
principle. It does. In one sided testing of µ = µ0 against µ > µ0, a small p-value<br />
corresponds to µ0 being (just) excluded from the corresponding (1−2p) (two-sided)<br />
confidence interval (or 1−p for the one-sided interval). Were µ = µL, the lower<br />
confidence bound, then a less discordant result would occur with high probability<br />
(1−p). Thus FEV licenses taking this as evidence of inconsistency with µ = µL (in<br />
the positive direction). Moreover, this reasoning shows the advantage of considering<br />
several confidence intervals at a range of levels, rather than just reporting whether<br />
or not a given parameter value is within the interval at a fixed confidence level.