Complete Identification Methods for the Causal Hierarchy - ClopiNet

Complete Identification Methods
for the Causal Hierarchy

Ilya Shpitser
(joint work with Judea Pearl)

Computer Science Department
UCLA
ilyas@cs.ucla.edu


Problems We Address<br />

How to <strong>for</strong>malize causal questions people<br />

ask?<br />

What sorts of assumptions are necessary to<br />

answer causal questions people care about?<br />

What specific assumptions are needed to<br />

answer a given question?<br />

Is it possible to develop a general method <strong>for</strong><br />

answering causal questions? If so, can<br />

anything be proven about <strong>the</strong> generality of<br />

this method?


Formalized Notions

Causal models (our domains)

Interventions (changes in the domain)

Causal queries:
  Causal effects (responses to changes)
  Counterfactuals (responses to multiple hypothetical changes)

Answering queries from premises (causal deduction or identification)


Graphical Causal Models

Definition: A graphical causal model is a 4-tuple ⟨V, U, F, P(u)⟩, where

V = {V_1, ..., V_n} are observable variables

U = {U_1, ..., U_m} are background variables

F = {f_1, ..., f_n} are functions determining each V_i in terms of a subset of U ∪ V

P(u) is a distribution over U

P(u) and F induce a distribution P(v) over the observable variables
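As a concrete sketch (not part of the slides), the way P(u) and F induce P(v) can be shown by enumerating a toy model in the spirit of the smoking example; all function forms and probabilities below are invented for illustration.

```python
from itertools import product

# F: each observable variable is a deterministic function of U and
# earlier observables (toy mechanisms, assumed for illustration).
def f_smoking(u):
    return u["U1"]                       # predisposition drives smoking

def f_tar(v, u):
    return v["Smoking"]                  # smoking deposits tar

def f_cancer(v, u):
    return int(v["Tar"] or u["U2"])      # tar or a second hidden cause

def induced_pv(p_u):
    """P(u) and F induce P(v): sum P(u) over the settings of U that F
    maps to each configuration (smoking, tar, cancer) of V."""
    pv = {}
    for (u1, pu1), (u2, pu2) in product(p_u["U1"].items(), p_u["U2"].items()):
        u = {"U1": u1, "U2": u2}
        v = {"Smoking": f_smoking(u)}
        v["Tar"] = f_tar(v, u)
        v["Cancer"] = f_cancer(v, u)
        key = (v["Smoking"], v["Tar"], v["Cancer"])
        pv[key] = pv.get(key, 0.0) + pu1 * pu2
    return pv

# Assumed P(u): two independent binary background variables.
p_u = {"U1": {0: 0.7, 1: 0.3}, "U2": {0: 0.9, 1: 0.1}}
```

Calling `induced_pv(p_u)` returns the full observational distribution P(v) as a dictionary over (Smoking, Tar, Cancer) configurations.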


Models Induce Causal Graphs

For a model M, draw a causal graph as follows:

a node for each V_i and U_i

X → V_i if X ∈ U ∪ V is an argument of f_i

Absence of ↔ arcs between disjoint W, Z ⊆ U implies P(W, Z) = P(W)P(Z)

Optional: drop the U nodes

[Figure: two example graphs. Left: Smoking → Cancer with a hidden Genetic predisposition (background variables u1, u2). Right: Smoking → Tar → Cancer (background variables u1, u2, u3).]




Graphs and d-separation

Probabilistic independence among variables can be read off the graph using the notion of d-separation:

A path p is d-separated given Z if:

p contains X → W → Y, X ← W → Y, or X ↔ W → Y, and W ∈ Z, or

p contains X → W ← Y, X ↔ W ← Y, or X ↔ W ↔ Y, and De(W) ∩ Z = ∅

X is d-separated from Y by Z (X ⊥ Y | Z) if all paths from X to Y are d-separated by Z
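The path-blocking definition above can be written as a small checker. This is an illustrative sketch, not the authors' code: it handles directed edges only, so a bidirected arc X ↔ Y is assumed to be emulated by adding an explicit hidden parent U with U → X and U → Y.

```python
def descendants(dag, x):
    """Nodes reachable from x via directed edges, including x itself."""
    seen, stack = {x}, [x]
    while stack:
        for child in dag[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def d_separated(dag, x, y, z):
    """True iff every path between x and y is blocked given the set z.

    `dag` maps every node to its list of children.
    """
    parents = {n: [] for n in dag}
    for p, children in dag.items():
        for c in children:
            parents[c].append(p)

    def paths(cur, visited):
        # Enumerate all undirected simple paths from cur to y.
        if cur == y:
            yield list(visited)
            return
        for nxt in dag[cur] + parents[cur]:
            if nxt not in visited:
                yield from paths(nxt, visited + [nxt])

    for path in paths(x, [x]):
        blocked = False
        for i in range(1, len(path) - 1):
            a, w, b = path[i - 1], path[i], path[i + 1]
            collider = w in dag[a] and w in dag[b]   # a -> w <- b on the path
            if collider:
                # Collider blocks unless a descendant of w is conditioned on.
                blocked = not (descendants(dag, w) & set(z))
            else:
                # Chain or fork blocks when w itself is conditioned on.
                blocked = w in z
            if blocked:
                break
        if not blocked:
            return False   # an unblocked path exists
    return True
```

For example, in the chain X → W → Y, conditioning on W separates X from Y, while in the collider X → W ← Y it connects them.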


Causal Queries

We consider two kinds of causal queries:

Causal effects, e.g., "I have a headache. Should I take an aspirin?"

Counterfactuals, e.g., "Would I have a headache had I taken an aspirin?"

Crucial difference: causal effects are results of consistent evidence, while counterfactuals result from conflicting evidence


Interventions in Causal Models

An action do(x) sets X to the value x regardless of the natural influences on X

The causal effect of do(x) on P(v) is an interventional distribution P_x(v), also written P(v|do(x))

do(x) removes all arrows (causal influences) incoming to X in model M to create a submodel M_x

[Figure: mutilated graphs for do(Smoking = yes) and do(Tar = no); the arrows into the intervened variable are removed, while Genetic predisposition still affects Cancer]
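The submodel M_x can be sketched by replacing a variable's mechanism with a constant, which is exactly "remove the incoming arrows." The model below is a toy version of the smoking example; all mechanisms and probabilities are invented for illustration.

```python
from itertools import product

def distribution(do=None):
    """P(v) or P_x(v) in a toy model: U1, U2 hidden; Smoking <- U1,
    Tar <- Smoking, Cancer <- (Tar or U2).

    `do` maps variable names to forced values; forcing a variable cuts
    its natural mechanism, i.e., removes its incoming arrows.
    """
    do = do or {}
    p_v = {}
    for (u1, p1), (u2, p2) in product(
            {0: 0.7, 1: 0.3}.items(), {0: 0.9, 1: 0.1}.items()):
        smoking = do.get("Smoking", u1)
        tar = do.get("Tar", smoking)
        cancer = do.get("Cancer", int(tar or u2))
        key = (smoking, tar, cancer)
        p_v[key] = p_v.get(key, 0.0) + p1 * p2
    return p_v

def marginal(p_v, index, value):
    """Marginal probability that component `index` of (S, T, C) = value."""
    return sum(p for k, p in p_v.items() if k[index] == value)
```

Here `distribution()` gives the observational P(v), while `distribution(do={"Tar": 0})` gives P_{tar=0}(v); in this toy model intervening on Tar changes the cancer rate, illustrating that P(v|do(x)) generally differs from P(v|x).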


Counterfactual Events

The event Y_x = y means "variable Y attains value y under intervention do(x)" (abbreviated y_x)

We view counterfactuals as conjunctions γ of such events: each event takes place in its own submodel

Our example contains two events: "no aspirin" and "headache_aspirin"

It can be represented as a distribution over these events: P(headache_aspirin | no aspirin) (well-defined and derivable from M)


Evaluating Causal Queries

Two "direct" methods of evaluating queries:

If we know P(u) and F, we can just evaluate the query

We can act in our domain (or run a randomized experiment) and measure the outcome Y to get P(y|do(x))

Problems: P(u) and F are not generally known, experiments are expensive or illegal, and it is unclear what to do with counterfactuals

We want to compute queries from causal assumptions and available information


Identification

We write φ, T ⊢_id θ if θ can be uniquely computed (identified) from φ in the class of models described by T

φ, T ⊬_id θ if ∃ M_1, M_2 ∈ T s.t. M_1, M_2 agree on φ but disagree on θ

Intuition: φ are the "premises," T is the "domain theory," θ is the "query"

Example from mathematical logic: φ = ZF, T = set theory, θ = Axiom of Choice


Identification (cont.)

Our domain theory/description is the causal graph (T = G)

θ is our query (a causal effect or a counterfactual)

We consider two versions of the identification problem:

P(v), G ⊢_id P(y|z, do(x)) (causal effects from observations)

{P_x(v \ x) | x ⊆ v}, G ⊢_id P(γ|δ) (counterfactuals from experiments)


Our Contribution<br />

Both identification problems received some<br />

attention in <strong>the</strong> literature, but no complete<br />

solutions exist<br />

Our contribution is complete algorithms <strong>for</strong><br />

solving <strong>the</strong>se problems<br />

<strong>Complete</strong>: failure implies all o<strong>the</strong>r methods<br />

must fail<br />

<strong>Complete</strong>ness allows us to derive useful<br />

corollaries


Corollaries

A complete graphical characterization of identifiable causal effects

The completeness status of existing identification algorithms (Tian's algorithm, do-calculus)

Applications (surrogate experiments, identifiable models)

Longer term: G induces constraints on P(v). Which constraints are testable? Can we use testable constraints to infer parts of G from P(v)? (More on this in our AAAI-08 paper)


Effects: Known Graphical Criteria

Back-door criterion: P(y|do(x)) = Σ_z P(y|x, z)P(z) in G if

Z ∩ De(X)_G = ∅ (no member of Z is a descendant of X)

Z blocks all paths from X to Y containing an arrow into X

[Figure: graph G over U, Z, X, Y illustrating the back-door criterion]
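The back-door formula can be evaluated mechanically from an observed joint distribution. A sketch (the discrete joint and its structure are invented for illustration, with a single admissible covariate Z):

```python
from collections import defaultdict

def backdoor_adjust(joint, x, y):
    """Back-door adjustment: P(y|do(x)) = sum_z P(y|x, z) P(z).

    `joint` maps (z, x_val, y_val) tuples to probabilities for one
    back-door admissible covariate Z.
    """
    p_z = defaultdict(float)      # P(z)
    p_zx = defaultdict(float)     # P(z, x)
    p_zxy = defaultdict(float)    # P(z, x, y)
    for (z, xv, yv), p in joint.items():
        p_z[z] += p
        p_zx[(z, xv)] += p
        p_zxy[(z, xv, yv)] += p
    total = 0.0
    for z in p_z:
        if p_zx[(z, x)] > 0:      # P(y|x, z) needs positive P(x, z)
            total += (p_zxy[(z, x, y)] / p_zx[(z, x)]) * p_z[z]
    return total
```

On a joint generated from a model where Z confounds X and Y, the adjusted quantity recovers the true interventional probability rather than the biased conditional P(y|x).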


Known Graphical Criteria (cont.)

Front-door criterion: P(y|do(x)) = Σ_z P(z|x) Σ_{x′} P(y|x′, z)P(x′) in G if

Z blocks all directed paths from X to Y

There is no back-door path from X to Z

All back-door paths from Z to Y are blocked by X

[Figure: graph G over U, X, Z, Y illustrating the front-door criterion]
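The front-door formula is likewise a mechanical computation over the observed joint. A sketch with an invented discrete joint over (X, Z, Y), where X → Z → Y and a hidden variable confounds X and Y:

```python
from collections import defaultdict

def frontdoor_adjust(joint, x, y):
    """Front-door adjustment:
    P(y|do(x)) = sum_z P(z|x) * sum_{x'} P(y|x', z) P(x').

    `joint` maps (x_val, z_val, y_val) tuples to probabilities.
    """
    p_x = defaultdict(float)      # P(x)
    p_xz = defaultdict(float)     # P(x, z)
    p_xzy = defaultdict(float)    # P(x, z, y)
    for (xv, zv, yv), p in joint.items():
        p_x[xv] += p
        p_xz[(xv, zv)] += p
        p_xzy[(xv, zv, yv)] += p
    zs = {zv for (_, zv, _) in joint}
    total = 0.0
    for zv in zs:
        if p_x[x] == 0:
            continue
        p_z_given_x = p_xz[(x, zv)] / p_x[x]
        inner = sum((p_xzy[(xp, zv, y)] / p_xz[(xp, zv)]) * p_x[xp]
                    for xp in p_x if p_xz[(xp, zv)] > 0)
        total += p_z_given_x * inner
    return total
```

When the joint is generated from a model satisfying the three conditions above, the formula recovers the true P(y|do(x)) despite the unobserved confounder.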



Negative Graphical Criteria

Bow arc graph:

P(U) = {0.5, 0.5}, X ← U

M_1: Y ← (X + U) mod 2,  M_2: Y ← 0

Trick: bit parity is observationally the same as a constant-0 function in the bow arc graph

Conclusion: P(v), G ⊬_id P(y|do(x))

[Figure: the bow arc graph (X → Y together with the confounding arc X ↔ Y), drawn once for M_1 and once for M_2]
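The two models are small enough to check exhaustively. This sketch verifies the slide's trick: M_1 and M_2 induce the same observational P(X, Y) but disagree on P(Y | do(X = 1)).

```python
def dist(f_y, do_x=None):
    """Enumerate P(X, Y) in the bow-arc model: U ~ Bernoulli(0.5),
    X <- U, Y <- f_y(X, U); `do_x` forces X, cutting the arrow U -> X."""
    p = {}
    for u in (0, 1):
        x = u if do_x is None else do_x
        y = f_y(x, u)
        p[(x, y)] = p.get((x, y), 0.0) + 0.5
    return p

def m1(x, u):
    return (x + u) % 2   # bit parity: observationally always 0, since X = U

def m2(x, u):
    return 0             # the constant-0 function
```

`dist(m1) == dist(m2)` holds (both put mass 0.5 on (0, 0) and (1, 0)), yet `dist(m1, do_x=1)` is uniform over Y while `dist(m2, do_x=1)` is a point mass at Y = 0: the effect is not identifiable.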


Identifying Effects of Singletons

Identifying P_x(v \ x) (X a singleton):

Theorem (Tian 2002): P(v), G ⊬_id P_x(v \ x) iff there is a child Z of X with a bidirected path from X to Z

Question: What about P_x(y), where X and Y are arbitrary sets? (Open since the 1950s)

[Figure: X with children Z_1, ..., Z_k, and a bidirected path from X through Z_1, ..., Z_k]


do-calculus

do-calculus is a set of three rules for manipulating interventional distributions:

1. P_x(y|z, w) = P_x(y|w) if (Y ⊥ Z | X, W) in G with the arrows into X removed

2. P_{x,z}(y|w) = P_x(y|z, w) if (Y ⊥ Z | X, W) in G with the arrows into X and the arrows out of Z removed

3. P_{x,z}(y|w) = P_x(y|w) if (Y ⊥ Z | X, W) in G with the arrows into X and into Z* removed, where Z* = Z \ An(W) in G with the arrows into X removed

Question: Can every identifiable effect be derived using do-calculus? (Open since 1994)


Our Results

A complete graphical condition for identification in the general case of P_x(y)

A complete algorithm expressing P_x(y) in terms of P(v)

A proof of completeness of do-calculus for identifying P_x(y)

A characterization of the models in which all effects are identifiable


C-components

C-component: a maximal set of nodes pairwise connected by bidirected paths

The given graph has two C-components: {W, Z, M} and {X, Y}

[Figure: a graph over W, Z, M, X, Y whose bidirected arcs connect W, Z, M and connect X, Y]
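Finding C-components is a plain connected-components computation over the bidirected arcs alone. A sketch (the node set and arcs below mirror the slide's example; the exact arcs are assumed for illustration):

```python
def c_components(nodes, bidirected):
    """Maximal sets of nodes pairwise connected by bidirected paths.

    `bidirected` is a list of unordered pairs: the <-> arcs of the graph.
    Directed edges are irrelevant here, so they are not passed in.
    """
    adj = {n: set() for n in nodes}
    for a, b in bidirected:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        comp, stack = {n}, [n]   # BFS/DFS over the bidirected skeleton
        seen.add(n)
        while stack:
            for m in adj[stack.pop()]:
                if m not in seen:
                    seen.add(m)
                    comp.add(m)
                    stack.append(m)
        comps.append(comp)
    return comps
```

With arcs W ↔ Z, Z ↔ M, and X ↔ Y, this returns exactly the two C-components {W, Z, M} and {X, Y} from the slide.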


General Identification Strategy

Each C-component S_i in G \ X corresponds to a subproblem

Key step: to identify P_x(y) from P(v) and G, identify P_{v \ s_i}(s_i) for each s_i, and return Σ_{v \ (x,y)} Π_i P_{v \ s_i}(s_i)

For example: P_x(y) = Σ_{w,z} P_z(y) P_x(z, w)

[Figure: a graph over W, Z, X, Y used in the running example]


Example (cont.)

So far: P_x(y) = Σ_{w,z} P_z(y) P_x(z, w)

P_z(y) = Σ_{w,x} P(y|z, x, w)P(x, w) (back-door)

P_x(z, w) = P(z|x, w)P(w) (Tian)

Combining: P_x(y) = Σ_{w,z,x′,w′} P(y|z, x′, w′)P(x′, w′)P(z|x, w)P(w)

[Figure: the same graph over W, Z, X, Y]


<strong>Identification</strong> Failure<br />

Identifying P x (y) fails if Y is a C-component,<br />

Y ∪ X = V is a C-component, and some X is<br />

relevant to Y (e.g., is in An(Y ))<br />

Key <strong>the</strong>orem <strong>for</strong> completeness (aaai-06): if<br />

<strong>the</strong> above failure occurs in any subproblem,<br />

<strong>the</strong> overall effect is not identifiable<br />

Proof by explicit counterexample (as in <strong>the</strong><br />

bow arc case)<br />

Sanity check: <strong>the</strong> bow arc graph is a special<br />

case of this condition


Conditional Effects<br />

The likelihood of Y may be altered by<br />

observing Z = z after doing do(x)<br />

Such conditional effects are represented by<br />

distributions of <strong>the</strong> <strong>for</strong>m<br />

P x (y|z) = P x (y,z)/P x (z)<br />

In general, interventions and conditioning do<br />

not commute: P(y|z,do(x)) ≠ P(y|do(x),z)<br />

However, if Z ∉ De(X) G , <strong>the</strong>n equality holds


Identifying Conditional Effects<br />

Main result (uai-06): <strong>the</strong>re exists a unique<br />

maximum subset w ⊆ z, such that:<br />

We can discover w in polynomial time<br />

P x (y|z) = P x,w (y|z \ w)<br />

P(v),G ⊢ id P x (y|z) iff P(v),G ⊢ id P x,w (y,z \ w)<br />

This means identifiability of unconditional effects<br />

is all we need, and we get completeness <strong>for</strong> free!


Identifying Counterfactuals<br />

Counterfactuals involve multiple worlds<br />

Our “axioms” φ will be <strong>the</strong> set of all<br />

experiments we can per<strong>for</strong>m in a single world:<br />

P ∗ = {P x (v \ x)|x ⊆ v}<br />

We separate two sets of difficulties:<br />

Going from P(v) to P ∗ (already handled)<br />

Going from P ∗ to P(γ) or P(γ|δ)


Graphs and Counterfactuals

We need a way to display causal assumptions across multiple worlds

First attempt: the twin network graph (Balke and Pearl)

Problem: only two worlds!

[Figure: a twin network with a shared background variable for the worlds A = false and A* = true, and outcome nodes H and H*]


Graphs and Counterfactuals (cont.)

Adding more worlds yields the parallel worlds graph (IJCAI-05)

More problems: distinct nodes can be the same variable

[Figure: a parallel worlds graph with shared background variables Urx and Uh, world copies Rx, Rx′, Rx*, actions A = a, A′ = a, A* = a*, and outcomes H, H′, H*]


Graphs and Counterfactuals (cont.)

Our solution: rid the parallel worlds graph of duplicate nodes inductively, starting at the roots, to obtain the counterfactual graph

[Figure: the parallel worlds graph over Urx, Uh, Rx, Rx′, Rx*, A = a, A′ = a, A* = a*, H, H′, H* collapses into a counterfactual graph in which the duplicate copies (e.g., Rx, Rx′ and the worlds with A = a) are merged]


Identification of Counterfactuals

All identifiable P(γ) can be computed from P* in a way similar to effects (subproblems based on C-components of the counterfactual graph)

Testing P(γ|δ) is similar to testing P_x(y|z)

Testing P(γ) fails if γ forms a C-component, all subscripts are in Pa(γ), and there is a "conflict," e.g., y_x ∈ γ and z_{x′} ∈ γ, or x′_z ∈ γ

Key theorem (UAI-07): failure in any subproblem implies the overall counterfactual is not identifiable


Example<br />

Effect of treatment on <strong>the</strong> treated: P(y x |x ′ )<br />

Not identifiable in <strong>the</strong> bow arc graph, but<br />

identifiable in <strong>the</strong> front-door graph<br />

P(y x |x ′ )= P(y x,x ′ )<br />

P(x ′ )<br />

=<br />

∑<br />

z P(y|z,x′ )P(x ′ )P(z|x)<br />

P(x ′ )<br />

∑<br />

z P z(y,x ′ )P x (z)<br />

P(x ′ )<br />

=<br />

= ∑ z P(y|z,x′ )P(z|x)<br />

U<br />

X=x’<br />

U<br />

X<br />

Z Y<br />

X=x Z Y


Conclusions

Using the framework of graphical causal models, we posed two causal deduction problems: evaluating causal effects from observations, and evaluating counterfactuals from experiments

We gave complete algorithms for solving these problems, along with a characterization of the models in which a given problem can be solved
