11.01.2015 Views

Confounding, interaction, and mediation in multivariable/multivariate ...

Confounding, interaction, and mediation in multivariable/multivariate ...

Confounding, interaction, and mediation in multivariable/multivariate ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

<strong>Confound<strong>in</strong>g</strong>, <strong><strong>in</strong>teraction</strong>, <strong>and</strong> <strong>mediation</strong> <strong>in</strong><br />

<strong>multivariable</strong>/<strong>multivariate</strong> regression model<strong>in</strong>g<br />

William Wu<br />

Department of Biostatistics<br />

Cancer Biostatistics Center, V<strong>and</strong>erbilt-Ingram Cancer Center<br />

May 21, 2010<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Outl<strong>in</strong>e<br />

1 Mediation<br />

3 examples<br />

Def<strong>in</strong>ition <strong>and</strong> identification<br />

An application<br />

2 <strong>Confound<strong>in</strong>g</strong><br />

Def<strong>in</strong>ition <strong>and</strong> determ<strong>in</strong>ation<br />

Methods for confound<strong>in</strong>g effect<br />

Unobserved confound<strong>in</strong>g<br />

Difference from <strong>mediation</strong><br />

3 Interaction<br />

Def<strong>in</strong>ition<br />

Determ<strong>in</strong>ation of <strong><strong>in</strong>teraction</strong><br />

Difference from <strong>mediation</strong> <strong>and</strong> confound<strong>in</strong>g<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

3 examples<br />

Def<strong>in</strong>ition <strong>and</strong> identification<br />

An application<br />

MEDIATION<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

3 examples<br />

Def<strong>in</strong>ition <strong>and</strong> identification<br />

An application<br />

P<strong>in</strong>gsheng’s study<br />

Study question:<br />

Whether <strong>and</strong> how childhood asthma was associated with maternal smok<strong>in</strong>g <strong>and</strong><br />

<strong>in</strong>fancy bronchiolitis.<br />

Prelim<strong>in</strong>ary model<strong>in</strong>g f<strong>in</strong>d<strong>in</strong>g:<br />

The significant association between maternal smok<strong>in</strong>g <strong>and</strong> asthma was found, but the<br />

association was gone after adjust<strong>in</strong>g for bronchiolitis <strong>in</strong> <strong>multivariable</strong> model<strong>in</strong>g.<br />

”We hypothesize that one mechanism through which maternal smok<strong>in</strong>g dur<strong>in</strong>g<br />

pregnancy contributes to the known <strong>in</strong>creased risk of develop<strong>in</strong>g childhood asthma is<br />

through <strong>in</strong>creas<strong>in</strong>g the risk of an important <strong>in</strong>termediate event, bronchiolitis dur<strong>in</strong>g<br />

<strong>in</strong>fancy.”<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

3 examples<br />

Def<strong>in</strong>ition <strong>and</strong> identification<br />

An application<br />

Dan Weeks’ talk<br />

In the paper:<br />

ˆ ’Interpretation of Genetic Association Studies: Markers with Replicated Highly<br />

Significant Odds Ratios May Be Poor Classifiers.’<br />

ˆ ’Although a set of SNPs can be strongly associated with disease risk with<br />

extremely small P-values, the same set of SNPs may not necessarily have high<br />

discrim<strong>in</strong>ation ability or may not dramatically improve the discrim<strong>in</strong>ation ability<br />

of a classification model constructed us<strong>in</strong>g ’convent<strong>in</strong>al’ non-genetic risk factors<br />

without the SNPs.’<br />

PLoS genetics 2009;5(2)<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

3 examples<br />

Def<strong>in</strong>ition <strong>and</strong> identification<br />

An application<br />

Adriana Gonzales’ study<br />

Study question:<br />

How biomarkers EGFR, AKT, <strong>and</strong> Ki-67 were correlated among 69 osteosarcoma<br />

patients.<br />

H Wu et al. Biomarker Insights 2007;2:469-76<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

3 examples<br />

Def<strong>in</strong>ition <strong>and</strong> identification<br />

An application<br />

EGFR pathway<br />

PTEN<br />

PI3-K<br />

AKT<br />

R R<br />

pY<br />

K K<br />

pY<br />

pY<br />

STAT<br />

RAS RAF<br />

SOS<br />

GRB2<br />

MEK<br />

MAPK<br />

Gene transcription<br />

Cell cycle progression<br />

PP<br />

myc<br />

cycl<strong>in</strong> D1<br />

DNA<br />

JunFos<br />

Myc<br />

Cycl<strong>in</strong> D1<br />

Proliferation/<br />

maturation<br />

Survival<br />

(anti-apoptosis)<br />

Angiogenesis<br />

Signal<strong>in</strong>g events are ordered both spatially <strong>and</strong> temporally<br />

Metastasis<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

3 examples<br />

Def<strong>in</strong>ition <strong>and</strong> identification<br />

An application<br />

Possible approaches to Adriana’s data<br />

ˆ Correlation analysis of the 3 biomarkers<br />

ˆ Regression of Ki-67 on other 2 biomarkers<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

3 examples<br />

Def<strong>in</strong>ition <strong>and</strong> identification<br />

An application<br />

What is <strong>mediation</strong><br />

A <strong>mediation</strong> effect occurs when the third variable (mediator, M) carries the <strong>in</strong>fluence<br />

of a given <strong>in</strong>dependent variable (X) to a given dependent variable (Y).<br />

Mediation models expla<strong>in</strong> how an effect occurred by hypothesiz<strong>in</strong>g a causal sequence.<br />

Mediator<br />

a<br />

b<br />

X<br />

c<br />

Y<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

3 examples<br />

Def<strong>in</strong>ition <strong>and</strong> identification<br />

An application<br />

Approaches to identification<br />

ˆ Approach 1: Causal steps<br />

ˆ Approach 2: Statistical test<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

3 examples<br />

Def<strong>in</strong>ition <strong>and</strong> identification<br />

An application<br />

Approach 1: causual steps<br />

Model 1<br />

(1)<br />

X<br />

<br />

Y<br />

Y = 0(1) + X + (1) (1)<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

3 examples<br />

Def<strong>in</strong>ition <strong>and</strong> identification<br />

An application<br />

Causual steps<br />

Model 2 <strong>and</strong> Model 3<br />

(3)<br />

<br />

M<br />

<br />

(2)<br />

X<br />

Y<br />

’<br />

Y = 0(2) + ’ X + M + (2) (2)<br />

M = 0(3) + X + (3) (3)<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

3 examples<br />

Def<strong>in</strong>ition <strong>and</strong> identification<br />

An application<br />

A significant <strong>mediation</strong> effect should be:<br />

ˆ τ <strong>in</strong> Model 1<br />

The total effect of the <strong>in</strong>dependent variable X on the dependent variable Y must<br />

be significant.<br />

ˆ α <strong>in</strong> Model 3<br />

The path from X to M must be significant.<br />

ˆ β <strong>in</strong> Model 2<br />

The path from M to Y must be significant.<br />

ˆ τ ′ <strong>in</strong> Model 2<br />

Evidence for <strong>mediation</strong> when τ ′ becomes <strong>in</strong>significant when the M is <strong>in</strong>cluded<br />

(effect of X on Y is zero). This would be complete <strong>mediation</strong><br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

3 examples<br />

Def<strong>in</strong>ition <strong>and</strong> identification<br />

An application<br />

Approach 2: statistical test of <strong>mediation</strong><br />

Sobel test:<br />

to test the products√<br />

of coefficients of the two paths a <strong>and</strong> b.<br />

z − value = α ∗ β/ α 2 σβ 2 + β2 σα<br />

2<br />

The null hypothesis is a test of α ∗ β = 0.<br />

MacK<strong>in</strong>non <strong>and</strong> Dwyer (1994) <strong>and</strong> MacK<strong>in</strong>non, Warsi, <strong>and</strong> Dwyer (1995)<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

3 examples<br />

Def<strong>in</strong>ition <strong>and</strong> identification<br />

An application<br />

Adriana’s study<br />

ˆ Initial variable EGFR <strong>and</strong> mediat<strong>in</strong>g variable Akt were<br />

immunosta<strong>in</strong><strong>in</strong>g <strong>in</strong>dex.<br />

ˆ Outcome variable Ki-67 was a cancer cell proliferation <strong>in</strong>dex<br />

<strong>and</strong> also an immunosta<strong>in</strong><strong>in</strong>g <strong>in</strong>dex.<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

3 examples<br />

Def<strong>in</strong>ition <strong>and</strong> identification<br />

An application<br />

Approach 1: the 3 models<br />

ˆ Model1 ← ols(Ki67 ∼ EGFR + age + sex)<br />

ˆ Model2 ← ols(Ki67 ∼ EGFR + AKT + age + sex)<br />

ˆ Model3 ← ols(AKT ∼ EGFR + age + sex)<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

3 examples<br />

Def<strong>in</strong>ition <strong>and</strong> identification<br />

An application<br />

Jo<strong>in</strong>t model<strong>in</strong>g (SEM)<br />

0, 1<br />

e2<br />

Age<br />

AKT<br />

Gender<br />

EGFR<br />

0, 1<br />

e1<br />

Ki-67<br />

0, 1<br />

d1<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

3 examples<br />

Def<strong>in</strong>ition <strong>and</strong> identification<br />

An application<br />

Approach 1: results<br />

Model<br />

Coefficient<br />

SE<br />

F<br />

p<br />

ols (Ki67~ EGFR+)<br />

: 0.0003069<br />

0.0001400<br />

4.80<br />

0.0340<br />

ols (AKT~ EGFR+)<br />

: 0.6584<br />

0.1996<br />

10.89<br />

0.0020<br />

ols (Ki67~ EGFR+AKT+)<br />

: 0.0003019<br />

0.0000993<br />

9.24<br />

0.0042<br />

ols (Ki67~ EGFR+AKT+)<br />

’: 0.00009687<br />

0.0001444<br />

0.45<br />

0.5061<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

3 examples<br />

Def<strong>in</strong>ition <strong>and</strong> identification<br />

An application<br />

Components of <strong>mediation</strong> model<br />

ˆ Total effect=<br />

α ∗ β + τ ′ = 0.6584x0.0003019 + 0.00009687 = 0.0002957(∼ = τ = 0.0003069)<br />

ˆ Direct effect=<br />

ˆ Mediated effect=<br />

τ ′ = 0.00009687<br />

α ∗ β = 0.6584x0.0003019 = 0.0001988 =<br />

(total - direct = τ − τ ′ = 0.0002957 − 0.00009687 = 0.0001988)<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

3 examples<br />

Def<strong>in</strong>ition <strong>and</strong> identification<br />

An application<br />

Approach 2: results<br />

Test<br />

p value<br />

Sobel 0.00000152<br />

Goodman (I) 0.00000172<br />

Goodman (II) 0.00000133<br />

A significant mediat<strong>in</strong>g effect for Akt was found with the tests.<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition <strong>and</strong> determ<strong>in</strong>ation<br />

Methods for confound<strong>in</strong>g effect<br />

Unobserved confound<strong>in</strong>g<br />

Difference from <strong>mediation</strong><br />

CONFOUNDING<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition <strong>and</strong> determ<strong>in</strong>ation<br />

Methods for confound<strong>in</strong>g effect<br />

Unobserved confound<strong>in</strong>g<br />

Difference from <strong>mediation</strong><br />

What is a confounder<br />

Criteria for a confounder<br />

1 It is a risk factor for the disease, <strong>in</strong>dependent of the putative<br />

risk factor (exposure variable or X).<br />

2 It is associated with putative risk factor (exposure).<br />

3 It is not <strong>in</strong> the causal pathway between exposure <strong>and</strong> disease.<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition <strong>and</strong> determ<strong>in</strong>ation<br />

Methods for confound<strong>in</strong>g effect<br />

Unobserved confound<strong>in</strong>g<br />

Difference from <strong>mediation</strong><br />

<strong>Confound<strong>in</strong>g</strong> model<br />

The association between X (exposure) <strong>and</strong> Y (outcome) is<br />

distorted by the presence of another variable C (confounder)<br />

α<br />

X<br />

C<br />

β<br />

Y<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition <strong>and</strong> determ<strong>in</strong>ation<br />

Methods for confound<strong>in</strong>g effect<br />

Unobserved confound<strong>in</strong>g<br />

Difference from <strong>mediation</strong><br />

An example<br />

ˆ Age may confound the positive relationship between annual<br />

<strong>in</strong>come <strong>and</strong> cancer <strong>in</strong>cidence <strong>in</strong> the US.<br />

1 Older <strong>in</strong>dividuals are also more likely to get cancer.<br />

2 Older <strong>in</strong>dividuals are likely to earn more money than younger<br />

ones who have not spent as much time <strong>in</strong> the work force.<br />

3 Income does not cause age, which then causes cancer.<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition <strong>and</strong> determ<strong>in</strong>ation<br />

Methods for confound<strong>in</strong>g effect<br />

Unobserved confound<strong>in</strong>g<br />

Difference from <strong>mediation</strong><br />

<strong>Confound<strong>in</strong>g</strong><br />

ˆ Not an issue <strong>in</strong> r<strong>and</strong>omized study<br />

R<strong>and</strong>omization process will elim<strong>in</strong>ate the correlation between confounder <strong>and</strong><br />

exposure, e.g., age <strong>and</strong> annual <strong>in</strong>come (i.e. we should have roughly equal<br />

numbers of age category <strong>in</strong> each annual <strong>in</strong>come group).<br />

ˆ An issue <strong>in</strong> observational study.<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition <strong>and</strong> determ<strong>in</strong>ation<br />

Methods for confound<strong>in</strong>g effect<br />

Unobserved confound<strong>in</strong>g<br />

Difference from <strong>mediation</strong><br />

Consequence of confound<strong>in</strong>g<br />

ˆ Bias effect estimate.<br />

ˆ Widen confidence <strong>in</strong>terval.<br />

ˆ Inclusion of additional potential confounders may only widen<br />

the CI <strong>and</strong> have no impact of effect estimate.<br />

ˆ <strong>Confound<strong>in</strong>g</strong> is the mask<strong>in</strong>g of the true effect of a risk factor<br />

on a disease or outcome by the presence of another variable.<br />

ˆ The presence of confound<strong>in</strong>g effect leads to a spurious<br />

association between exposure <strong>and</strong> outcome.<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition <strong>and</strong> determ<strong>in</strong>ation<br />

Methods for confound<strong>in</strong>g effect<br />

Unobserved confound<strong>in</strong>g<br />

Difference from <strong>mediation</strong><br />

How to pick potential confounders<br />

philosophically<br />

ˆ Not a simple question<br />

ˆ Your knowledge<br />

ˆ Prior experience with data<br />

ˆ The three criteria for confounders<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition <strong>and</strong> determ<strong>in</strong>ation<br />

Methods for confound<strong>in</strong>g effect<br />

Unobserved confound<strong>in</strong>g<br />

Difference from <strong>mediation</strong><br />

How to pick potential confounders<br />

statistically<br />

ˆ When you get to do<strong>in</strong>g <strong>multivariable</strong> logistic regression, for example, one rule of<br />

thumb is that if the odds ratio changes by 10% or more then this is reason to<br />

<strong>in</strong>clude the potential confounder <strong>in</strong> your multi-variable model. We don’t tend to<br />

look at just whether it is statistically significant, but <strong>in</strong>stead, how much does it<br />

change with this effect. This change is what we want to measure. If it changes<br />

the effect by 10% or more, then we consider it a confounder <strong>and</strong> leave it <strong>in</strong> the<br />

model.<br />

ˆ P values will not tell confound<strong>in</strong>g effect. Rather, only, change <strong>in</strong> β between with<br />

adjustment <strong>and</strong> w/o adjustment can tell if the confound<strong>in</strong>g is work<strong>in</strong>g.<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition <strong>and</strong> determ<strong>in</strong>ation<br />

Methods for confound<strong>in</strong>g effect<br />

Unobserved confound<strong>in</strong>g<br />

Difference from <strong>mediation</strong><br />

Design stage<br />

1 R<strong>and</strong>omization<br />

2 Restriction (of the study population to a category of a<br />

confounder, but will limit generalizability)<br />

3 Match<strong>in</strong>g (Not feasible match<strong>in</strong>g all, residual confound<strong>in</strong>g will<br />

still bias estimate)<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition <strong>and</strong> determ<strong>in</strong>ation<br />

Methods for confound<strong>in</strong>g effect<br />

Unobserved confound<strong>in</strong>g<br />

Difference from <strong>mediation</strong><br />

Data analysis stage<br />

1 Adjustment (a few to dozen)<br />

2 propensity score (all)<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition <strong>and</strong> determ<strong>in</strong>ation<br />

Methods for confound<strong>in</strong>g effect<br />

Unobserved confound<strong>in</strong>g<br />

Difference from <strong>mediation</strong><br />

What is Unobserved confound<strong>in</strong>g<br />

<strong>Confound<strong>in</strong>g</strong> that rema<strong>in</strong>s after adjustment for observed<br />

confounders are referred as residual confound<strong>in</strong>g. This <strong>in</strong>volves<br />

unmeasured confound<strong>in</strong>g as well as <strong>in</strong>accurately measured<br />

confound<strong>in</strong>g.<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition <strong>and</strong> determ<strong>in</strong>ation<br />

Methods for confound<strong>in</strong>g effect<br />

Unobserved confound<strong>in</strong>g<br />

Difference from <strong>mediation</strong><br />

Methods for unobserved confound<strong>in</strong>g<br />

ˆ A challenge<br />

can not adjust for<br />

only r<strong>and</strong>omization<br />

ˆ But you can<br />

adjust for as many as allowed<br />

improve the measurement (Measurement error <strong>in</strong> confounders<br />

will lead to residual confound<strong>in</strong>g).<br />

use more appropriate scal<strong>in</strong>g of the measurement<br />

more ...<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition <strong>and</strong> determ<strong>in</strong>ation<br />

Methods for confound<strong>in</strong>g effect<br />

Unobserved confound<strong>in</strong>g<br />

Difference from <strong>mediation</strong><br />

Other proposed approaches for unobserved confound<strong>in</strong>g<br />

BIOMETRICS 56, 915-921 September 2000<br />

When Should Epidemiologic Regressions Use R<strong>and</strong>om Coefficients<br />

S<strong>and</strong>er Greenl<strong>and</strong><br />

Department of Epidemiology, UCLA School of Public Health,<br />

Los Angeles, California 90095-1772, U.S.A.<br />

SUMMARY. Regression models with r<strong>and</strong>om coefficients arise naturally <strong>in</strong> both frequentist <strong>and</strong> Bayesian approaches<br />

to estimation problems. They are becom<strong>in</strong>g widely available <strong>in</strong> st<strong>and</strong>ard computer packages under the head<strong>in</strong>gs of<br />

generalized l<strong>in</strong>ear mixed models, hierarchical models, <strong>and</strong> multilevel models. I here argue that such models offer a<br />

more scientifically defensible framework for epidemiologic analysis than the fixed-effects models now prevalent <strong>in</strong><br />

epidemiology. The argument <strong>in</strong>vokes an antiparsimony pr<strong>in</strong>ciple attributed to L. J. Savage, which is that models should<br />

be rich enough to reflect the complexity of the relations under study. It also <strong>in</strong>vokes the countervail<strong>in</strong>g pr<strong>in</strong>ciple that<br />

you cannot estimate anyth<strong>in</strong>g if you try to estimate everyth<strong>in</strong>g (often used to justify parsimony). Regression with<br />

r<strong>and</strong>om coefficients offers a rational compromise between these pr<strong>in</strong>ciples as well as an alternative to analyses based<br />

on st<strong>and</strong>ard variable-selection algorithms <strong>and</strong> their attendant distortion of uncerta<strong>in</strong>ty assessments. These po<strong>in</strong>ts are<br />

illustrated with an analysis of data on diet, nutrition, <strong>and</strong> breast cancer.<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition <strong>and</strong> determ<strong>in</strong>ation<br />

Methods for confound<strong>in</strong>g effect<br />

Unobserved confound<strong>in</strong>g<br />

Difference from <strong>mediation</strong><br />

<strong>Confound<strong>in</strong>g</strong> is different from <strong>mediation</strong> <strong>in</strong>:<br />

ˆ Temorality (Exposure occurs first <strong>and</strong> then M <strong>and</strong> outcome,<br />

<strong>and</strong> conceptually follows an experimental design)<br />

ˆ Directionality<br />

ˆ Causality<br />

ˆ Confounders often demographic variables that typically cannot<br />

be changed <strong>in</strong> an experimental design. Mediators are by<br />

def<strong>in</strong>ition cable of be<strong>in</strong>g changed <strong>and</strong> are often selected based<br />

on malleability.<br />

ˆ statistical test<br />

logo<br />

William Wu<br />

Cancer Biostatistics


¢ ¤ ¡ ¦ ¨ ¦ ¤ ¤ ¤ ¤ ¦ ¤ ¨<br />

¡<br />

! " ¡ ¢ $ ¢ ¤ ¡ % ¤ ¡ ¦ ¨ ¦ "<br />

<br />

. 0 2 4 6 2 4 8 : -<br />

< > @ B C B F G H J K C G O K R S U < X S C < X J<br />

;<br />

< O \ ] C G X H a B G @ a G f G \ S X C O G @ C<br />

;<br />

@ B F G X K B C J < l ; S n B l < X @ B S q R < K t @ > G n G K q ; t<br />

i<br />

’ œ ” ª £ ¥ ° ¥ ¥ ° £ œ œ ‘ ¤ ˜ ¥ ˜ ‘ ’ ˜ œ ’ ‘ ¥ ¡ ª ‘ — ” ¤ £ — ³ » “ ¤ ¥ ‘ ª œ ‘ ¥ ° £ ª ¥ ° ’ ¥ ° £ £ ± £ ¤ ¥ ” ’ — £ ª œ ¥ ” — » ¦ É ° £<br />

£<br />

ª £ œ £ ’ ¤ £ ‘ “ œ ¡ ” ª ˜ ‘ ” œ œ œ ‘ ¤ ˜ ¥ ˜ ‘ ’ Ä — ” £ “ ‘ ª £ ½ Ÿ ¡ ¢ £ ¥ ‘ ¥ ° £ ˜ ’ Î ” £ ’ ¤ £ ‘ “ £ ½ ¥ ª ’ £ ‘ ” œ ² ª ˜ ³ ¢ £ œ Ä ˜ œ<br />

¡<br />

¢ ¢ £ — Ï Ð Ñ Ò Ð Ó Ñ Ô Õ Ñ Ö œ ˜ ¥ ¥ £ ’ — œ ¥ ‘ ¤ ‘ ’ “ ‘ ” ’ — ‘ ” ª ª £ — ˜ ’ š ’ — ¥ ‘ ³ ˜ œ ‘ ” ª £ œ ¥ ˜ Ÿ ¥ £ ‘ “ ¥ ° £ £ ± £ ¤ ¥<br />

¤<br />

¥ ” — ˜ £ — ¦ ‘ ’ ¤ £ ¡ ¥ ” ¢ ¢ » Ä ¥ ° £ ª £ “ ‘ ª £ Ä © £ ¤ ’ œ » ¥ ° ¥ µ ’ — ¹ ª £ ¤ ‘ ’ “ ‘ ” ’ — £ — © ° £ ’ ¥ ° £ ª £ ˜ œ<br />

œ<br />

° ˜ ª — ² ª ˜ ³ ¢ £ Û ¥ ° ¥ ˜ ’ Î ” £ ’ ¤ £ œ ³ ‘ ¥ ° µ ’ — ¹ Ü œ ” ¤ ° ² ª ˜ ³ ¢ £ ˜ œ ¥ ° £ ’ ¤ ¢ ¢ £ — Ý ¤ ‘ ’ “ ‘ ” ’ — £ ª Þ<br />

¥<br />

“ µ ’ — ¹ ¦ ‘<br />

œ œ ˜ Ÿ ¡ ¢ £ œ ¥ ° ˜ œ ¤ ‘ ’ ¤ £ ¡ ¥ ˜ œ Ä ˜ ¥ ° œ ª £ œ ˜ œ ¥ £ — “ ‘ ª Ÿ ¢ ¥ ª £ ¥ Ÿ £ ’ ¥ “ ‘ ª œ £ ² £ ª ¢ — £ ¤ — £ œ Ä ’ —<br />

ß<br />

‘ ª š ‘ ‘ — ª £ œ ‘ ’ â É ° £ ² £ ª » ’ ‘ ¥ ˜ ‘ ’ œ ‘ “ Ý £ ± £ ¤ ¥ Þ ’ — Ý ˜ ’ Î ” £ ’ ¤ £ Þ Ä ª £ ¢ ¥ ˜ ² £ ¥ ‘ © ° ˜ ¤ ° Ý œ ¡ ” ä<br />

“<br />

˜ ‘ ” œ œ œ ‘ ¤ ˜ ¥ ˜ ‘ ’ Þ ˜ œ ¥ ‘ ³ £ — £ æ ’ £ — Ä ° œ ª £ œ ˜ œ ¥ £ — Ÿ ¥ ° £ Ÿ ¥ ˜ ¤ ¢ “ ‘ ª Ÿ ” ¢ ¥ ˜ ‘ ’ ¦ É ° £ £ Ÿ ¡ ˜ ª ˜ ¤ ¢<br />

ª<br />

£ æ ’ ˜ ¥ ˜ ‘ ’ ‘ “ £ ± £ ¤ ¥ œ ’ œ œ ‘ ¤ ˜ ¥ ˜ ‘ ’ ¥ ° ¥ © ‘ ” ¢ — ¡ ª £ ² ˜ ¢ ˜ ’ ¤ ‘ ’ ¥ ª ‘ ¢ ¢ £ — ª ’ — ‘ Ÿ ˜ è £ — £ ½ ¡ £ ª ä<br />

—<br />

£ ª £ ¥ ‘ ¤ ° ’ š £ Ä œ » “ ª ‘ Ÿ ‘ ³ œ £ ª ² ¥ ˜ ‘ ’ ¢ ¥ ‘ ¤ ‘ ’ ¥ ª ‘ ¢ ¢ £ — œ ¥ ” — ˜ £ œ ¦ ë ” ¤ ° ¡ ª £ — ˜ ¤ ¥ ˜ ‘ ’ œ ª £ ì ” ˜ ª £ £ ½ ¥ ª<br />

©<br />

’ “ ‘ ª Ÿ ¥ ˜ ‘ ’ Ä ˜ ’ ¥ ° £ “ ‘ ª Ÿ ‘ “ ¤ ” œ ¢ ‘ ª ¤ ‘ ” ’ ¥ £ ª “ ¤ ¥ ” ¢ œ œ ” Ÿ ¡ ¥ ˜ ‘ ’ œ î ï ª £ £ ’ ¢ ’ — ’ — ñ ‘ ³ ˜ ’ œ<br />

˜<br />

ó ô õ Ü ö ˜ ¤ ® ª Ÿ ª ¥ ’ £ ’ — ÷ ‘ ¢ “ ‘ ª — ò ó ô ø ù Ä © ° ˜ ¤ ° ˜ œ ’ ‘ ¥ — ˜ œ ¤ £ ª ’ ˜ ³ ¢ £ “ ª ‘ Ÿ — £ ’ œ ˜ ¥ » “ ” ’ ¤ ¥ ˜ ‘ ’ œ ¦<br />

ò<br />

° £ œ £ — ˜ û ¤ ” ¢ ¥ ˜ £ œ ’ ‘ ¥ © ˜ ¥ ° œ ¥ ’ — ˜ ’ š Ä £ ¡ ˜ — £ Ÿ ˜ ‘ ¢ ‘ š ˜ œ ¥ œ Ä ³ ˜ ‘ œ ¥ ¥ ˜ œ ¥ ˜ ¤ ˜ ’ œ Ä œ ‘ ¤ ˜ ¢ œ ¤ ˜ £ ’ ¥ ˜ œ ¥ œ ’ —<br />

É<br />

Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition <strong>and</strong> determ<strong>in</strong>ation<br />

Methods for confound<strong>in</strong>g effect<br />

Unobserved confound<strong>in</strong>g<br />

Difference from <strong>mediation</strong><br />

statistical test for confound<strong>in</strong>g<br />

TECHNICAL REPORT<br />

R-256<br />

January 1998<br />

¡ ¢ ¤ ¡ ¦ ¢ ¨ ¦ $ ¤ ¨ ! ¡ ¤ ,<br />

v w w x y<br />

z { | } ~ € ‚ { € ƒ ~ ‚ } | {<br />

„ … † ‡ ˆ Š ‹ Œ ‡ … Š †<br />

’ “ ‘ ” ’ — ˜ ’ š ˜ œ œ ˜ Ÿ ¡ ¢ £ ¤ ‘ ’ ¤ £ ¡ ¥ ¦ § “ © £ ” ’ — £ ª ¥ ® £ ¥ ‘ £ œ ¥ ˜ Ÿ ¥ £ ¥ ° £ £ ± £ ¤ ¥ ‘ “ ‘ ’ £ ² ª ˜ ³ ¢ £<br />

‘<br />

µ · ‘ ’ ’ ‘ ¥ ° £ ª ´ ¹ · ³ » £ ½ Ÿ ˜ ’ ˜ ’ š ¥ ° £ œ ¥ ¥ ˜ œ ¥ ˜ ¤ ¢ œ œ ‘ ¤ ˜ ¥ ˜ ‘ ’ ³ £ ¥ © £ £ ’ ¥ ° £ ¥ © ‘ Ä © £ ‘ ” š ° ¥ ¥ ‘<br />

´<br />

Ÿ £ ’ ¥ Ä ¤ ’ ’ ‘ ¥ £ œ ˜ ¢ » ³ £ £ ½ ¡ ª £ œ œ £ — ˜ ’ ¥ ° £ œ ¥ ’ — ª — ¢ ’ š ” š £ ‘ “ ¡ ª ‘ ³ ³ ˜ ¢ ˜ ¥ » ¥ ° £ ‘ ª » Ä ³ £ ¤ ” œ £<br />

˜<br />

° ¥ ¥ ° £ ‘ ª » — £ ¢ œ © ˜ ¥ ° œ ¥ ¥ ˜ ¤ ¤ ‘ ’ — ˜ ¥ ˜ ‘ ’ œ Ä ’ — — ‘ £ œ ’ ‘ ¥ ¡ £ ª Ÿ ˜ ¥ ” œ ¥ ‘ ¡ ª £ — ˜ ¤ ¥ Ä £ ² £ ’ “ ª ‘ Ÿ “ ” ¢ ¢<br />

¥<br />

logo<br />

œ ¡ £ ¤ ˜ æ ¤ ¥ ˜ ‘ ’ ‘ “<br />

¡ ‘ ¡ ” ¢ ¥ ˜ ‘ ’ — £ ’ œ ˜ ¥ » “ ” ’ ¤ ¥ ˜ ‘ ’ Ä © ° ¥ ª £ ¢ ¥ ˜ ‘ ’ œ ° ˜ ¡ œ © ‘ ” ¢ — ¡ ª £ ² ˜ ¢ ˜ “ ¤ ‘ ’ — ˜ ¥ ˜ ‘ ’ œ<br />

¤ ‘ ’ ‘ Ÿ ˜ œ ¥ œ ü ° ² £ Ÿ — £ ’ ” Ÿ £ ª ‘ ” œ ¥ ¥ £ Ÿ ¡ ¥ œ ¥ ‘ £ ½ ¡ ª £ œ œ ¤ ‘ ’ “ ‘ ” ’ — ˜ ’ š ˜ ’ œ ¥ ¥ ˜ œ ¥ ˜ ¤ ¢ ¥ £ ª Ÿ œ Ä ¡ ª ¥ ¢ »<br />

£<br />

William Wu Cancer Biostatistics<br />

ý þ ¢ ¤ ¦ ¨ ¢ ! " # % % & ( ) - ¢ " 1 ¢ 5 5 ¦ 7 8 ¢ 5 (<br />

= > ¦ ¢ ¦ 5 ¨ ¢ 7 ¤ C 5 ¨ 7 5 ¨ F C " F ¢ F 7 ¦ I F ¨ K ¢ ! ¨ 7 ¨ ¦ F M ¦ ¨ ¦ I N P I ¦ ¨ " S T V ¦ I


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition<br />

Determ<strong>in</strong>ation of <strong><strong>in</strong>teraction</strong><br />

Difference from <strong>mediation</strong> <strong>and</strong> confound<strong>in</strong>g<br />

INTERACTION<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition<br />

Determ<strong>in</strong>ation of <strong><strong>in</strong>teraction</strong><br />

Difference from <strong>mediation</strong> <strong>and</strong> confound<strong>in</strong>g<br />

What is <strong><strong>in</strong>teraction</strong><br />

ˆ An <strong><strong>in</strong>teraction</strong> means that the effect of X on Y depends on<br />

the level of a third variable.<br />

ˆ No causal sequence is implied by <strong><strong>in</strong>teraction</strong>.<br />

ˆ Also known as modification or moderation<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition<br />

Determ<strong>in</strong>ation of <strong><strong>in</strong>teraction</strong><br />

Difference from <strong>mediation</strong> <strong>and</strong> confound<strong>in</strong>g<br />

Underst<strong>and</strong><strong>in</strong>g <strong><strong>in</strong>teraction</strong> effect<br />

ˆ We have the follow<strong>in</strong>g regression on x 1 <strong>and</strong> x 2 :<br />

y = α + β 1 x 1 + β 2 x 2 + β 3 (x 1 ∗ x 2 ) + ε<br />

ˆ The null hypothesis is H 0 : β 3 = 0, or product of the two<br />

variables, x 1 <strong>and</strong> x 2 , has no effect on Y.<br />

ˆ The test of H 0 : β 3 = 0 is a test for parallellism of the two<br />

slopes (if x 2 has two levels).<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition<br />

Determ<strong>in</strong>ation of <strong><strong>in</strong>teraction</strong><br />

Difference from <strong>mediation</strong> <strong>and</strong> confound<strong>in</strong>g<br />

Underst<strong>and</strong><strong>in</strong>g <strong><strong>in</strong>teraction</strong> effect<br />

ˆ Given:<br />

y = α + β 1 x 1 + β 2 x 2 + β 3 (x 1 ∗ x 2 ) + ε<br />

ˆ Without <strong><strong>in</strong>teraction</strong>, effect x 1 on y is measured by β 1 .<br />

ˆ With <strong><strong>in</strong>teraction</strong> term, effect of x 1 on y is measured by<br />

β 1 + β 3 x 2 .<br />

effect changes as x 2 <strong>in</strong>creases.<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition<br />

Determ<strong>in</strong>ation of <strong><strong>in</strong>teraction</strong><br />

Difference from <strong>mediation</strong> <strong>and</strong> confound<strong>in</strong>g<br />

When x 2 has two levels<br />

Effect (slope) of X1 on Y does depend on X2 value.<br />

12<br />

8<br />

Y<br />

Y = 1 + 2X2<br />

1 + 3X3<br />

2 + 4X4<br />

1 X 2<br />

Y = 1 + 2X2<br />

+ 3(1) 4<br />

1 ) + 4X 1 (1)) = 4 + 6X 1<br />

Y = 1 + 2X2<br />

+ 3(0) 4<br />

1 ) + 4X 1 (0)) = 1 + 2X 1<br />

4<br />

0<br />

0 0.5 1 1.5<br />

X 1<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition<br />

Determ<strong>in</strong>ation of <strong><strong>in</strong>teraction</strong><br />

Difference from <strong>mediation</strong> <strong>and</strong> confound<strong>in</strong>g<br />

How to determ<strong>in</strong>e <strong><strong>in</strong>teraction</strong><br />

philosophically<br />

ˆ Interaction can be an <strong>in</strong>terest of the study<br />

ˆ Interaction is usually pre-specified<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition<br />

Determ<strong>in</strong>ation of <strong><strong>in</strong>teraction</strong><br />

Difference from <strong>mediation</strong> <strong>and</strong> confound<strong>in</strong>g<br />

An example about the philosophy<br />

For example, imag<strong>in</strong>e a study that tests the effects of a treatment<br />

of an outcome measure. The treatment variable is composed of<br />

two groups, e.g., treatment <strong>and</strong> control. The results are that the<br />

mean of the treatment group is higher than the mean for the<br />

control group. But what if the research is also <strong>in</strong>terested <strong>in</strong><br />

whether the treatment is equally effective for females <strong>and</strong> males.<br />

That is a difference <strong>in</strong> treatment depend<strong>in</strong>g on gender group. This<br />

is a question of <strong><strong>in</strong>teraction</strong>.<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition<br />

Determ<strong>in</strong>ation of <strong><strong>in</strong>teraction</strong><br />

Difference from <strong>mediation</strong> <strong>and</strong> confound<strong>in</strong>g<br />

How to determ<strong>in</strong>e <strong><strong>in</strong>teraction</strong><br />

statistically<br />

ˆ Likelihood ratio test can be applied to test the <strong><strong>in</strong>teraction</strong>.<br />

ˆ Interaction terms can be excluded from the model if they are<br />

as a whole <strong>in</strong>significant.<br />

ˆ Ma<strong>in</strong> effect may turn to be <strong>in</strong>significant when <strong><strong>in</strong>teraction</strong> is<br />

<strong>in</strong>cluded <strong>in</strong> model.<br />

ˆ Ma<strong>in</strong> effect won’t tell the whole story <strong>in</strong> the presence of<br />

significant <strong><strong>in</strong>teraction</strong>.<br />

ˆ Stratified estimates are to be reported if the <strong><strong>in</strong>teraction</strong> is<br />

tested significant.<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition<br />

Determ<strong>in</strong>ation of <strong><strong>in</strong>teraction</strong><br />

Difference from <strong>mediation</strong> <strong>and</strong> confound<strong>in</strong>g<br />

A note<br />

The statistical power to test the significant <strong><strong>in</strong>teraction</strong> is 5-times<br />

lower that to test ma<strong>in</strong> effect. So, p = 0.10 could be considered<br />

significant. Keep <strong>in</strong> m<strong>in</strong>d we do not want to miss any important<br />

<strong><strong>in</strong>teraction</strong>.<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition<br />

Determ<strong>in</strong>ation of <strong><strong>in</strong>teraction</strong><br />

Difference from <strong>mediation</strong> <strong>and</strong> confound<strong>in</strong>g<br />

Interaction is different from <strong>mediation</strong> <strong>in</strong>:<br />

ˆ No causality<br />

ˆ Test for product of two measurements (test for product of two<br />

coefficients for <strong>mediation</strong>)<br />

ˆ Can be tested (confound<strong>in</strong>g can not)<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition<br />

Determ<strong>in</strong>ation of <strong><strong>in</strong>teraction</strong><br />

Difference from <strong>mediation</strong> <strong>and</strong> confound<strong>in</strong>g<br />

Acknowledgements<br />

P<strong>in</strong>gsheng Wu, Ph.D.<br />

Adriana Gonzalez, M.D., Ph.D.<br />

Debra Friedman, M.D., Ph.D.<br />

logo<br />

William Wu<br />

Cancer Biostatistics


Mediation<br />

<strong>Confound<strong>in</strong>g</strong><br />

Interaction<br />

Def<strong>in</strong>ition<br />

Determ<strong>in</strong>ation of <strong><strong>in</strong>teraction</strong><br />

Difference from <strong>mediation</strong> <strong>and</strong> confound<strong>in</strong>g<br />

Thank you!<br />

logo<br />

William Wu<br />

Cancer Biostatistics

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!