Hypothesis testing in mixture regression models - Columbia University

4 H.-T. Zhu and H. Zhang 

in model (1) are very important in genetic studies for assessing potential gene–environment 

interactions. 

Beyond this example, finite mixtures of Bernoulli distributions such as model (1) have received 

much attention in the last five decades. See Teicher (1963) for an early example. More recently, 

Wang and Puterman (1998) among others generalized binomial finite mixtures to mixture 

logistic regression models, and Zhang et al. (2003) applied mixture cumulative logistic models 

to analyse correlated ordinal responses. 

1.2. Example 2: mixture of non-linear hierarchical models 

Longitudinal and genetic studies commonly involve a continuous response $\{Y_{ij}\}$, also referred 

to as a quantitative trait. See, for example, Diggle et al. (2002), Haseman and Elston (1972) and 

Risch and Zhang (1995). Pauler and Laird (2000) used general finite mixture non-linear hierarchical 

models to analyse longitudinal data from heterogeneous subpopulations. Specifically, 

when there are only two subgroups, the model is of the form 

$$Y_{ij} = g\{x_{ij}, \beta, U_i z_{ij} \mu_1 + (1 - U_i) z_{ij} \mu_2\} + \varepsilon_{i,j},$$

where the $\varepsilon_{i,j}$s are independent and identically distributed according to $N(0, \sigma^2)$ and $g(\cdot)$ is a prespecified function. Here, the known covariates $x_{ij}$ may contain observed time points to reflect a time course in longitudinal data. 
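A minimal simulation sketch of this two-subgroup model may help fix the notation. Everything specific in it is an assumption made for illustration: the mean curve `g` is a hypothetical exponential decay, the covariate $z_{ij}$ is set to 1, $\beta$, $\mu_1$ and $\mu_2$ are scalars, and `p_group1` is a hypothetical name for $P(U_i = 1)$.

```python
import math
import random

def g(x, beta, m):
    """Hypothetical mean curve (an assumption, not from the paper):
    exponential decay in time x, starting from level m."""
    return m * math.exp(-beta * x)

def simulate_subject(times, beta, mu1, mu2, sigma, p_group1, rng):
    """Simulate one subject's responses Y_ij at the given time points.

    U_i is the latent subgroup indicator; with z_ij = 1 the group-specific
    term is U_i * mu1 + (1 - U_i) * mu2.
    """
    u = 1 if rng.random() < p_group1 else 0
    level = u * mu1 + (1 - u) * mu2
    # Add independent N(0, sigma^2) errors to the mean curve.
    ys = [g(t, beta, level) + rng.gauss(0.0, sigma) for t in times]
    return u, ys

rng = random.Random(1)
u, ys = simulate_subject([0.0, 1.0, 2.0], beta=0.5, mu1=10.0, mu2=4.0,
                         sigma=0.3, p_group1=0.5, rng=rng)
```

Each subject keeps a single value of $U_i$ across all of its time points, which is what makes the subgroup effect shared within a subject.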

1.3. Example 3: a finite mixture of Poisson regression models 

The Poisson distribution and Poisson regression have been widely used to analyse count data 

(McCullagh and Nelder, 1989), but observed count data often exhibit overdispersion relative 

to the Poisson model, i.e. a variance exceeding the mean. Finite mixture Poisson regression models (Wang et al., 1996) provide a plausible explanation 

for overdispersion. Specifically, conditionally on all the $U_i$s, the $Y_{ij}$s are independent and follow the Poisson regression model 

$$p(Y_{ij} = y_{ij} \mid x_{ij}, U_i) = \frac{1}{y_{ij}!} \lambda_{ij}^{y_{ij}} \exp(-\lambda_{ij}), \tag{2}$$

where $\lambda_{ij} = \exp\{x_{ij} \beta + U_i z_{ij} \mu_1 + (1 - U_i) z_{ij} \mu_2\}$. 
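The link between model (2) and overdispersion can be illustrated with a small simulation. The sketch below assumes an intercept-only setting ($x_{ij} = z_{ij} = 1$, one count per subject, scalar parameters) and uses `p_group1` as a hypothetical name for $P(U_i = 1)$; the sampler is Knuth's multiplication method, chosen only to keep the example free of third-party dependencies.

```python
import math
import random
from statistics import mean, variance

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method
    (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_mixture_counts(n, p_group1, beta, mu1, mu2, seed=0):
    """Simulate n counts from a two-component Poisson mixture with
    lambda = exp(beta + U * mu1 + (1 - U) * mu2)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        u = 1 if rng.random() < p_group1 else 0
        lam = math.exp(beta + u * mu1 + (1 - u) * mu2)
        counts.append(sample_poisson(lam, rng))
    return counts

# Component means are exp(0) = 1 and exp(log 5) = 5.
counts = simulate_mixture_counts(5000, p_group1=0.5, beta=0.0,
                                 mu1=0.0, mu2=math.log(5.0))
dispersion_ratio = variance(counts) / mean(counts)
```

For a single Poisson distribution the ratio of variance to mean is 1. With two components of different means, the marginal variance picks up the extra between-group term, so `dispersion_ratio` comes out well above 1.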

To summarize the models presented above, we consider a random sample of $n$ independent observations $\{y_i, X_i\}_1^n$ with the density function 

$$p_i(y_i, x_i; \omega) = \{(1 - \alpha) f_i(y_i, x_i; \beta, \mu_1) + \alpha f_i(y_i, x_i; \beta, \mu_2)\}\, g_i(x_i), \tag{3}$$

where $g_i(x_i)$ is the distribution function of $X_i$. Further, $\omega = (\alpha, \beta, \mu_1, \mu_2)$ is the unknown parameter vector, in which $\beta$ ($q_1 \times 1$) measures the strength of association that is contributed by the covariate terms and the two $q_2 \times 1$ vectors, $\mu_1$ and $\mu_2$, represent the different contributions from two different groups. 
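As an illustration of how density (3) is evaluated, the following sketch computes the log-likelihood in $\omega$ for one concrete choice of $f_i$. It assumes a Poisson component with a log link, scalar covariates and parameters ($q_1 = q_2 = 1$) and one observation per unit; the function names are hypothetical. The factor $g_i(x_i)$ is left out because it does not involve $\omega$.

```python
import math

def poisson_logpmf(y, lam):
    """Log of the Poisson probability mass function at count y."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)

def mixture_loglik(ys, xs, zs, alpha, beta, mu1, mu2):
    """Log-likelihood of the two-component mixture, requiring 0 < alpha < 1.

    Component 1 (weight 1 - alpha) uses mu1; component 2 (weight alpha)
    uses mu2. Both share the regression coefficient beta.
    """
    total = 0.0
    for y, x, z in zip(ys, xs, zs):
        l1 = math.log1p(-alpha) + poisson_logpmf(y, math.exp(x * beta + z * mu1))
        l2 = math.log(alpha) + poisson_logpmf(y, math.exp(x * beta + z * mu2))
        # Log-sum-exp keeps the sum of the two weighted densities stable.
        m = max(l1, l2)
        total += m + math.log(math.exp(l1 - m) + math.exp(l2 - m))
    return total

ys = [0, 2, 7, 1, 5]
xs = [0.1, 0.4, 0.9, 0.2, 0.7]
zs = [1.0, 1.0, 1.0, 1.0, 1.0]
loglik = mixture_loglik(ys, xs, zs, alpha=0.3, beta=0.5, mu1=0.0, mu2=1.5)
```

When `mu1` equals `mu2` the two components coincide, and the returned value no longer depends on `alpha`.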

Equivalently, if we consider $P(U_i = 0) = 1 - P(U_i = 1) = \alpha$, and assume that the conditional density of $y_i$ given $U_i$ is $p_i(y_i \mid U_i) = f_i\{y_i, x_i; \beta, \mu_2 (1 - U_i) + \mu_1 U_i\}$, then model (3) is the marginal density of $y_i$. In fact, McCullagh and Nelder (1989) considered a special case in which $f_i$ is from an exponential family distribution, i.e. 

$$f_i\{y_i, x_i; \beta, \mu_2 (1 - U_i) + \mu_1 U_i\} = \prod_{j=1}^{n_i} \exp[\phi\{y_{ij} \theta_{ij} - a(\theta_{ij})\} + c(y_{ij}, \phi)], \tag{4}$$

where $\theta_{ij} = h\{x_{ij}, \beta, U_i \mu_1 + (1 - U_i) \mu_2\}$, $h(\cdot)$ is a link function and $\phi$ is a dispersion parameter. 

This family of mixture regression models is very useful in practice.
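As a worked check of form (4), the Poisson model (2) can be written in this way. The identification below is the standard exponential-family calculation rather than something stated explicitly in the text. It takes $\phi = 1$ and uses the linear predictor of model (2) as the link $h$, which is an assumption about how $z_{ij}$ enters $h$.

```latex
\begin{align*}
\frac{1}{y_{ij}!}\,\lambda_{ij}^{y_{ij}} \exp(-\lambda_{ij})
  &= \exp\{ y_{ij} \log \lambda_{ij} - \lambda_{ij} - \log(y_{ij}!) \} \\
  &= \exp[\, \phi \{ y_{ij}\theta_{ij} - a(\theta_{ij}) \} + c(y_{ij}, \phi) \,],
\end{align*}
% with the identifications
\begin{align*}
\phi &= 1, &
\theta_{ij} &= \log \lambda_{ij}
  = x_{ij}\beta + U_i z_{ij}\mu_1 + (1 - U_i) z_{ij}\mu_2, \\
a(\theta) &= \exp(\theta), &
c(y, \phi) &= -\log(y!).
\end{align*}
```

Taking the product over $j = 1, \ldots, n_i$ then gives the right-hand side of (4) for the counts of unit $i$.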
