Optimality
Massive multiple hypotheses testing 63<br />
The factor α0 serves asymptotically as a calibrator of the adaptive significance<br />
threshold to the Bonferroni threshold in the least favorable scenario π0 = 1, i.e., all<br />
null hypotheses are true. Analysis of the asymptotic ERR of the HT(α*_cal) procedure<br />
suggests a few choices of α0 in practice.<br />
4.2. Asymptotic ERR of HT(α*_cal)<br />
Recall from (2.7) that<br />
\[
\mathrm{ERR}(\alpha) = \frac{\pi_0\,\alpha}{F_m(\alpha)}\,\Pr(P_{1:m}\le\alpha).
\]<br />
The probability Pr(P1:m ≤ α) is not tractable in general, but an upper bound<br />
can be obtained under a reasonable assumption on the set Pm of the m P values.<br />
Massive multiple tests are mostly applied in exploratory studies to produce<br />
“inference-guided discoveries” that are either subject to further confirmation and<br />
validation, or helpful for developing new research hypotheses. For this reason often<br />
all the alternative hypotheses are two-sided, and hence so are the tests. It is instructive<br />
to first consider the case of m two-sample t tests. Conceptually the data<br />
consist of n1 i.i.d. observations on R^m, Xi = [Xi1, Xi2, . . . , Xim], i = 1, . . . , n1, in<br />
the first group, and n2 i.i.d. observations Yi = [Yi1, Yi2, . . . , Yim], i = 1, . . . , n2, in<br />
the second group. The hypothesis pair (H0k, HAk) is tested by the two-sided two-sample<br />
t statistic Tk = |T(Xk, Yk, n1, n2)| based on the data Xk = {X1k, . . . , Xn1k}<br />
and Yk = {Y1k, . . . , Yn2k}. Often in biological applications that study gene signaling<br />
pathways (see e.g., Kuo et al. [18], and the simulation model in Section<br />
5), Xik and Xik′ (i = 1, . . . , n1) are either positively or negatively correlated<br />
for certain k ≠ k′, and the same holds for Yik and Yik′ (i = 1, . . . , n2). Such<br />
dependence in the data induces positive association between the two-sided test statistics<br />
Tk and Tk′ so that Pr(Tk ≤ t | Tk′ ≤ t) ≥ Pr(Tk ≤ t), implying Pr(Tk ≤ t, Tk′ ≤ t) ≥<br />
Pr(Tk ≤ t) Pr(Tk′ ≤ t), t ≥ 0. Then the P values in turn satisfy<br />
Pr(Pk > α, Pk′ > α) ≥ Pr(Pk > α) Pr(Pk′ > α), α ∈ [0, 1]. It is straightforward to<br />
generalize this type of dependency to more than two tests. Alternatively, a direct<br />
model for the P values can be constructed.<br />
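The positive association between two-sided statistics can be checked numerically. The following is a minimal Monte Carlo sketch, not the setup of the paper: it assumes, purely for illustration, that the two-sided statistics are T1 = |Z1| and T2 = |Z2| with (Z1, Z2) standard bivariate normal with correlation `rho`, in place of actual two-sample t statistics. The function name `joint_and_product` and all parameter values are hypothetical.

```python
import math
import random


def joint_and_product(rho, t, n_sim=20000, seed=0):
    """Estimate Pr(T1 <= t, T2 <= t) and Pr(T1 <= t) * Pr(T2 <= t).

    Assumption for illustration only: T1 = |Z1| and T2 = |Z2|, where
    (Z1, Z2) is standard bivariate normal with correlation rho. This is
    a stand-in for the two-sided t statistics discussed in the text.
    """
    rng = random.Random(seed)
    n_first = n_second = n_both = 0
    scale = math.sqrt(1.0 - rho * rho)
    for _ in range(n_sim):
        z1 = rng.gauss(0.0, 1.0)
        # Construct z2 so that corr(z1, z2) = rho.
        z2 = rho * z1 + scale * rng.gauss(0.0, 1.0)
        below1 = abs(z1) <= t
        below2 = abs(z2) <= t
        n_first += below1
        n_second += below2
        n_both += below1 and below2
    joint = n_both / n_sim
    product = (n_first / n_sim) * (n_second / n_sim)
    return joint, product


if __name__ == "__main__":
    # Both positive and negative correlation in the data are tried,
    # since the text allows either sign.
    for rho in (0.8, -0.8):
        joint, product = joint_and_product(rho, t=1.0)
        print(f"rho={rho:+.1f}: joint={joint:.3f}, product={product:.3f}")
```

Under these assumptions the estimated joint probability should exceed the product of the marginal estimates for either sign of `rho`, because taking absolute values removes the sign of the correlation.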
Example 4.1. Let J ⊆ {1, . . . , m} be a nonempty set of indices. Assume Pj = P0^{Xj},<br />
j ∈ J, where P0 follows a distribution F0 on [0, 1], and Xj’s are i.i.d. continuous<br />
random variables following a distribution H on [0, ∞), and are independent<br />
of the P values. Assume that the Pi’s for i ∉ J are either independent or related to<br />
each other in the same fashion. This model mimics the effect of an activated gene<br />
signaling pathway that results in gene differential expression as reflected by the P<br />
values: the set J represents the genes involved in the pathway, P0 represents the<br />
underlying activation mechanism, and Xj represents the noisy response of gene j<br />
resulting in Pj. Because Pj > α if and only if Xj < log α / log P0, direct calculations<br />
using independence of the Xj’s show that<br />
\[
\Pr\Big(\bigcap_{j\in J}\{P_j>\alpha\}\Big)
= \int_0^1 \Pr\Big(\bigcap_{j\in J}\Big\{X_j<\frac{\log\alpha}{\log t}\Big\}\Big)\,dF_0(t)
= E\Big[H\Big(\frac{\log\alpha}{\log P_0}\Big)^{|J|}\Big],
\]
where |J| is the cardinality of J. Next
\[
\prod_{j\in J}\Pr(P_j>\alpha)
= \prod_{j\in J}\int_0^1 H\Big(\frac{\log\alpha}{\log t}\Big)\,dF_0(t)
= \Big(E\Big[H\Big(\frac{\log\alpha}{\log P_0}\Big)\Big]\Big)^{|J|}.
\]
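The two quantities in Example 4.1 can be compared by simulation. The sketch below makes choices the example leaves open, so they are assumptions for illustration only: F0 is taken to be uniform on (0, 1], H is taken to be the exponential distribution with rate 1, and |J| = 5. The function name `simulate_pathway` is hypothetical. Since x ↦ x^{|J|} is convex on [0, 1], Jensen's inequality implies that the joint probability is at least the product of the marginals, which is what the estimates should reflect.

```python
import random


def simulate_pathway(alpha, size_j, n_sim=20000, seed=1):
    """Monte Carlo estimates for the model Pj = P0 ** Xj, j in J.

    Assumptions for illustration only: P0 ~ Uniform(0, 1] plays the
    role of F0, and the Xj are i.i.d. Exponential(1), playing the role
    of H. Returns (joint, product), where joint estimates
    Pr(Pj > alpha for all j in J) and product estimates the product
    over j in J of Pr(Pj > alpha).
    """
    rng = random.Random(seed)
    n_all = 0
    n_marginal = [0] * size_j
    for _ in range(n_sim):
        # Shared activation level P0, drawn once per replicate.
        p0 = 1.0 - rng.random()  # uniform on (0, 1]
        # Gene-specific P values Pj = P0 ** Xj with independent Xj.
        exceeds = [p0 ** rng.expovariate(1.0) > alpha for _ in range(size_j)]
        for j, flag in enumerate(exceeds):
            n_marginal[j] += flag
        n_all += all(exceeds)
    joint = n_all / n_sim
    product = 1.0
    for count in n_marginal:
        product *= count / n_sim
    return joint, product


if __name__ == "__main__":
    joint, product = simulate_pathway(alpha=0.05, size_j=5)
    print(f"joint={joint:.3f}, product of marginals={product:.3f}")
```

The shared factor P0 is what couples the P values: conditional on P0 the events {Pj > α} are independent, which is the step that turns the joint probability into an expectation of a power of H.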