Complete Identification Methods for the Causal Hierarchy - ClopiNet
Complete Identification Methods for the Causal Hierarchy
Ilya Shpitser
(joint work with Judea Pearl)
Computer Science Department
UCLA
ilyas@cs.ucla.edu
Problems We Address
How do we formalize the causal questions people ask?
What sorts of assumptions are necessary to answer the causal questions people care about?
What specific assumptions are needed to answer a given question?
Is it possible to develop a general method for answering causal questions? If so, can anything be proven about the generality of this method?
Formalized Notions
Causal models (our domains)
Interventions (changes in the domain)
Causal queries:
Causal effects (responses to changes)
Counterfactuals (responses to multiple hypothetical changes)
Answering queries from premises (causal deduction, or identification)
Graphical Causal Models
Definition: A graphical causal model is a 4-tuple ⟨V, U, F, P(u)⟩, where
V = {V_1, ..., V_n} are observable variables
U = {U_1, ..., U_m} are background variables
F = {f_1, ..., f_n} are functions determining each V_i in terms of a subset of U ∪ V
P(u) is a distribution over U
P(u) and F induce a distribution P(v) over the observable variables
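The 4-tuple maps directly to code. Below is a minimal Python sketch of a two-variable model; the parameterization and the function names (`f_smoking`, `f_cancer`) are invented for illustration. P(u) and F together induce P(v) by marginalizing out U:

```python
from itertools import product

# Background variables U with distribution P(u): two independent fair bits.
# (Illustrative parameterization, not from the talk.)
def p_u(u1, u2):
    return 0.25  # each of the 4 joint settings is equally likely

# Functions F determining the observables V = {Smoking, Cancer}.
def f_smoking(u1):
    return u1                    # smoking driven by background cause u1

def f_cancer(smoking, u2):
    return smoking | u2          # cancer from smoking or background cause u2

# P(u) and F induce a distribution P(v) over the observables.
p_v = {}
for u1, u2 in product([0, 1], repeat=2):
    s = f_smoking(u1)
    c = f_cancer(s, u2)
    p_v[(s, c)] = p_v.get((s, c), 0.0) + p_u(u1, u2)
```

Here `p_v` is the induced observational distribution; every causal query below is ultimately a question about quantities like this.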
Models Induce Causal Graphs
For a model M, draw a causal graph as follows:
a node for each V_i and U_i
X → V_i if X ∈ U ∪ V is an argument of f_i
Absence of ↔ arcs between disjoint W, Z ⊆ U implies P(W,Z) = P(W)P(Z)
Optional: drop the U nodes
[Figure: two smoking examples with background nodes u_1, u_2, u_3 (genetic predisposition): Smoking → Cancer, and Smoking → Tar → Cancer.]
Graphs and d-separation
It is possible to reflect probabilistic independence among variables using the graphical notion of d-separation:
A path p is d-separated given Z if:
p contains X → W → Y, X ← W → Y, or X ↔ W → Y, and W ∈ Z, or
p contains X → W ← Y, X ↔ W ← Y, or X ↔ W ↔ Y, and De(W) ∩ Z = ∅
X is d-separated from Y by Z (written X ⊥ Y | Z) if all paths from X to Y are d-separated by Z
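For DAGs, d-separation can be tested without enumerating paths via the classic "moralized ancestral graph" reduction (Lauritzen): X is d-separated from Y given Z iff X and Y are disconnected after restricting to ancestors of X ∪ Y ∪ Z, moralizing, and deleting Z. A sketch, assuming bidirected arcs are modeled by explicit U nodes:

```python
from itertools import combinations

def ancestors(dag, nodes):
    """All ancestors of `nodes` (inclusive) in a {parent: children} DAG."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        n = stack.pop()
        for p, cs in dag.items():
            if n in cs and p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(dag, xs, ys, zs):
    keep = ancestors(dag, set(xs) | set(ys) | set(zs))
    # Moralize: undirected edge per arc, plus edges between co-parents.
    adj = {n: set() for n in keep}
    for p, cs in dag.items():
        if p not in keep:
            continue
        for c in cs & keep:
            adj[p].add(c); adj[c].add(p)
    for c in keep:
        parents = [p for p in keep if c in dag.get(p, set())]
        for a, b in combinations(parents, 2):
            adj[a].add(b); adj[b].add(a)
    # Remove Z, then check reachability from X to Y.
    frontier, seen = set(xs) - set(zs), set(zs)
    while frontier:
        n = frontier.pop(); seen.add(n)
        frontier |= adj[n] - seen
    return not (set(ys) & seen)

# Chain X -> W -> Y, and collider X -> W <- Y.
chain = {'X': {'W'}, 'W': {'Y'}, 'Y': set()}
collider = {'X': {'W'}, 'Y': {'W'}, 'W': set()}
```

On the chain, conditioning on W blocks the path; on the collider, conditioning on W opens it, matching the two slide clauses.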
Causal Queries
We consider two kinds of causal queries:
Causal effects, e.g., "I have a headache. Should I take an aspirin?"
Counterfactuals, e.g., "Would I have a headache had I taken an aspirin?"
Crucial difference: causal effects are results of consistent evidence, while counterfactuals result from conflicting evidence
Interventions in Causal Models
An action do(x) sets X to the value x regardless of the natural influences on X
The causal effect of do(x) on P(v) is an interventional distribution P_x(v), also written P(v|do(x))
do(x) removes all arrows (causal influences) incoming to X in model M to create a submodel M_x
[Figure: the smoking graphs under the interventions do(Smoking = yes) and do(Tar = no), with arrows into the intervened variable removed.]
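The submodel M_x is just model mutilation: replace f_X by the constant x and leave P(u) untouched. A minimal Python sketch with an invented two-variable model:

```python
# Illustrative model: U -> X, (X, U) -> Y; all variables binary.
# do(x) overrides f_X with the constant x while P(u) is unchanged,
# yielding the submodel M_x. (Parameterization invented for illustration.)
def p_u(u):
    return 0.5

f = {
    'X': lambda u, v: u,            # X <- U
    'Y': lambda u, v: v['X'] ^ u,   # Y <- X xor U
}

def distribution(do=None):
    """P(x, y) under an optional intervention do = {'X': value}."""
    dist = {}
    for u in (0, 1):
        v = {}
        for name in ('X', 'Y'):     # evaluate in topological order
            if do and name in do:
                v[name] = do[name]  # arrow-cutting: ignore f[name]
            else:
                v[name] = f[name](u, v)
        key = (v['X'], v['Y'])
        dist[key] = dist.get(key, 0.0) + p_u(u)
    return dist

obs = distribution()                     # observational P(v)
intervened = distribution(do={'X': 1})   # interventional P_x(v) for x = 1
```

Observationally Y = 1 never occurs here, yet under do(X = 1) it occurs with probability 0.5: seeing is not doing.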
Counterfactual Events
The event Y_x = y means "variable Y attains value y under intervention do(x)" (abbreviated y_x)
We view counterfactuals as conjunctions γ of such events: each event takes place in its own submodel
Our example contains two events: "no aspirin" and "headache_aspirin"
It can be represented as a distribution over these events: P(headache_aspirin | no aspirin) (well-defined and derivable from M)
Evaluating Causal Queries
Two "direct" methods of evaluating queries:
If we know P(u) and F, we can just evaluate the query
We can act in our domain (or run a randomized experiment) and measure the outcome Y to get P(y|do(x))
Problems: P(u) and F are not generally known, experiments are expensive or illegal, and it is unclear what to do with counterfactuals
We want to compute queries from causal assumptions and the available information
Identification
We write φ, T ⊢_id θ if θ can be uniquely computed (identified) from φ in the class of models described by T
φ, T ⊬_id θ if there exist M_1, M_2 ∈ T s.t. M_1, M_2 agree on φ but disagree on θ
Intuition: φ are the "premises," T is the "domain theory," θ is the "query"
Example from mathematical logic: φ = ZF, T = set theory, θ = the Axiom of Choice
Identification (cont.)
Our domain theory/description is the causal graph (T = G)
θ is our query (causal effects, counterfactuals)
We consider two versions of the identification problem:
P(v), G ⊢_id P(y|z, do(x)) (causal effects from observations)
{P_x(v \ x) | x ⊆ v}, G ⊢_id P(γ|δ) (counterfactuals from experiments)
Our Contribution
Both identification problems have received some attention in the literature, but no complete solutions existed
Our contribution is complete algorithms for solving these problems
Complete: failure implies all other methods must fail
Completeness allows us to derive useful corollaries
Corollaries
Complete graphical characterization of identifiable causal effects
Completeness status of existing identification algorithms (Tian's algorithm, do-calculus)
Applications (surrogate experiments, identifiable models)
Longer term: G induces constraints on P(v). Which constraints are testable? Can we use testable constraints to infer parts of G from P(v)? (More on this in our AAAI-08 paper)
Effects: Known Graphical Criteria
Back-door criterion: P(y|do(x)) = ∑_z P(y|x,z) P(z) in G if
Z contains no descendants of X in G
Z blocks all paths from X to Y containing an arrow into X
[Figure: a graph G with hidden U in which Z satisfies the back-door criterion for (X, Y).]
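Given a joint distribution from a graph where Z satisfies the criterion, the adjustment formula is mechanical. A sketch over an invented binary parameterization (Z ~ Bern(0.5), then Y | X, Z):

```python
from itertools import product

# Back-door adjustment: P(y | do(x)) = sum_z P(y | x, z) P(z).
# Invented parameterization in which Z is a valid back-door set.
p_z = {0: 0.5, 1: 0.5}
p_y_given_xz = {(1, x, z): 0.1 + 0.5 * x + 0.3 * z
                for x, z in product((0, 1), repeat=2)}  # P(Y=1 | x, z)

def backdoor(y, x):
    """P(Y=y | do(X=x)) by adjusting for Z."""
    total = 0.0
    for z, pz in p_z.items():
        p_y1 = p_y_given_xz[(1, x, z)]
        total += (p_y1 if y == 1 else 1 - p_y1) * pz
    return total

# Average causal effect of X on Y.
effect = backdoor(1, 1) - backdoor(1, 0)
```

Because Y's dependence on X is linear with coefficient 0.5 in this toy parameterization, the adjusted effect recovers exactly that coefficient.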
Known Graphical Criteria (cont.)
Front-door criterion: P(y|do(x)) = ∑_z P(z|x) ∑_{x′} P(y|x′,z) P(x′) in G if
Z blocks all directed paths from X to Y
There is no back-door path from X to Z
All back-door paths from Z to Y are blocked by X
[Figure: the front-door graph G: X → Z → Y, with a hidden U affecting both X and Y.]
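In a fully specified model with hidden U, the front-door estimand (which uses only P(v)) can be checked numerically against P(y|do(x)) computed directly from the model. A sketch with an invented binary parameterization of U → X, U → Y, X → Z, Z → Y:

```python
from itertools import product

# Invented parameterization of the front-door graph; U is hidden.
p_u = {0: 0.5, 1: 0.5}
p_x_given_u = {(1, 0): 0.3, (1, 1): 0.9}     # P(X=1 | u)
p_z_given_x = {(1, 0): 0.2, (1, 1): 0.8}     # P(Z=1 | x)
p_y_given_zu = {(1, z, u): 0.1 + 0.4 * z + 0.4 * u
                for z, u in product((0, 1), repeat=2)}  # P(Y=1 | z, u)

def bern(table, val, *cond):
    p1 = table[(1,) + cond]
    return p1 if val == 1 else 1 - p1

# Observational joint P(x, z, y), marginalizing out the hidden U.
p_v = {}
for u, x, z, y in product((0, 1), repeat=4):
    p = (p_u[u] * bern(p_x_given_u, x, u) * bern(p_z_given_x, z, x)
         * bern(p_y_given_zu, y, z, u))
    p_v[(x, z, y)] = p_v.get((x, z, y), 0.0) + p

def marg(y=None, x=None, z=None):
    """Marginal probability over P(v) with optional fixed values."""
    return sum(p for (xv, zv, yv), p in p_v.items()
               if (x is None or xv == x) and (z is None or zv == z)
               and (y is None or yv == y))

def frontdoor(y, x):
    """Front-door estimand: sum_z P(z|x) sum_x' P(y|x',z) P(x')."""
    total = 0.0
    for z in (0, 1):
        pz_x = marg(x=x, z=z) / marg(x=x)
        inner = sum((marg(y=y, x=xp, z=z) / marg(x=xp, z=z)) * marg(x=xp)
                    for xp in (0, 1))
        total += pz_x * inner
    return total

def truth(y, x):
    """Ground-truth P(y | do(x)) computed with U visible."""
    return sum(p_u[u] * bern(p_z_given_x, z, x) * bern(p_y_given_zu, y, z, u)
               for u, z in product((0, 1), repeat=2))
```

The estimand touches only observational quantities, yet matches the interventional ground truth exactly, as the criterion guarantees.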
Negative Graphical Criteria
Bow arc graph:
P(U) = {0.5, 0.5}, X ← U
M_1: Y ← (X + U) (mod 2); M_2: Y ← 0
Trick: bit parity is observationally the same as a constant 0 function in the bow arc graph
Conclusion: P(v), G ⊬_id P(y|do(x))
[Figure: the bow arc graph (U → X, U → Y, X → Y) parameterized as M_1 and M_2.]
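The counterexample can be checked by exhaustive enumeration: M_1 and M_2 induce the same P(x, y) but different P(y | do(x)), which is precisely non-identifiability.

```python
# Bow arc counterexample: two models that agree on P(x, y) but
# disagree on P(y | do(x)).
def enumerate_model(f_y, do_x=None):
    dist = {}
    for u in (0, 1):                        # P(U) = {0.5, 0.5}
        x = u if do_x is None else do_x     # X <- U, unless intervened on
        y = f_y(x, u)
        dist[(x, y)] = dist.get((x, y), 0.0) + 0.5
    return dist

m1 = lambda x, u: (x + u) % 2   # M1: Y <- (X + U) mod 2
m2 = lambda x, u: 0             # M2: Y <- 0

obs1, obs2 = enumerate_model(m1), enumerate_model(m2)
do1, do2 = enumerate_model(m1, do_x=1), enumerate_model(m2, do_x=1)
```

Observationally X = U forces (X + U) mod 2 = 0, so M_1 looks exactly like the constant function; cutting the arc into X breaks that parity.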
Identifying Effects of Singletons
Identifying P_x(v \ x) (X a singleton):
Theorem (Tian 2002): P(v), G ⊬_id P_x(v \ x) iff there is a child Z of X with a bidirected path from X to Z
Question: What about P_x(y) where X, Y are arbitrary sets? (Open since the 1950s)
[Figure: X with children Z_1, ..., Z_k and a bidirected path from X to a child Z.]
do-calculus
do-calculus is a set of three rules for manipulating interventional distributions:
1. P_x(y|z,w) = P_x(y|w) if (Y ⊥ Z | X, W) in G_X̄
2. P_{x,z}(y|w) = P_x(y|z,w) if (Y ⊥ Z | X, W) in G_X̄Z̲
3. P_{x,z}(y|w) = P_x(y|w) if (Y ⊥ Z | X, W) in G_X̄Z̄*, where Z* = Z \ An(W) in G_X̄
(G_X̄ deletes arrows into X; G_Z̲ deletes arrows out of Z)
Question: Can every identifiable effect be derived using do-calculus? (Open since 1994)
Our Results
A complete graphical condition for identification in the general case of P_x(y)
A complete algorithm expressing P_x(y) in terms of P(v)
A proof of completeness of do-calculus for identifying P_x(y)
A characterization of models where all effects are identifiable
C-components
C-component: a maximal set of nodes pairwise connected by bidirected paths
The given graph has two C-components: {W, Z, M} and {X, Y}
[Figure: a graph over W, Z, M, X, Y whose bidirected arcs form the two C-components.]
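C-components are simply the connected components of the subgraph that keeps only bidirected edges. A sketch (the specific arcs W ↔ Z, Z ↔ M, X ↔ Y are assumed for illustration; any pairing within each component gives the same partition):

```python
# C-components: connected components over bidirected (<->) edges only.
def c_components(nodes, bidirected):
    adj = {n: set() for n in nodes}
    for a, b in bidirected:
        adj[a].add(b); adj[b].add(a)
    components, unseen = [], set(nodes)
    while unseen:
        frontier = {unseen.pop()}
        comp = set()
        while frontier:
            n = frontier.pop()
            comp.add(n)
            frontier |= adj[n] - comp
        unseen -= comp
        components.append(frozenset(comp))
    return components

# The slide's graph, with assumed bidirected arcs.
comps = c_components(['W', 'Z', 'M', 'X', 'Y'],
                     [('W', 'Z'), ('Z', 'M'), ('X', 'Y')])
```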
General Identification Strategy
Each C-component S_i of G after removing X corresponds to a subproblem
Key step: to identify P_x(y) from P(v), G, identify P_{v \ s_i}(s_i) for each s_i, and return ∑_{v \ (x ∪ y)} ∏_i P_{v \ s_i}(s_i)
For example: P_x(y) = ∑_{w,z} P_z(y) P_x(z,w)
[Figure: a graph over W, Z, X, Y used in the running example.]
Example (cont.)
So far: P_x(y) = ∑_{w,z} P_z(y) P_x(z,w)
P_z(y) = ∑_{w,x} P(y|z,x,w) P(x,w) (back-door)
P_x(z,w) = P(z|x,w) P(w) (Tian)
Combining: P_x(y) = ∑_{w,z,x′,w′} P(y|z,x′,w′) P(x′,w′) P(z|x,w) P(w)
[Figure: the same graph over W, Z, X, Y.]
Identification Failure
Identifying P_x(y) fails if Y is a C-component, Y ∪ X = V is a C-component, and some X is relevant to Y (e.g., is in An(Y))
Key theorem for completeness (AAAI-06): if the above failure occurs in any subproblem, the overall effect is not identifiable
Proof by explicit counterexample (as in the bow arc case)
Sanity check: the bow arc graph is a special case of this condition
Conditional Effects
The likelihood of Y may be altered by observing Z = z after doing do(x)
Such conditional effects are represented by distributions of the form P_x(y|z) = P_x(y,z)/P_x(z)
In general, interventions and conditioning do not commute: P(y|z, do(x)) ≠ P(y|do(x), z)
However, if Z contains no descendants of X in G, equality holds
Identifying Conditional Effects
Main result (UAI-06): there exists a unique maximum subset W ⊆ Z such that:
We can discover W in polynomial time
P_x(y|z) = P_{x,w}(y|z \ w)
P(v), G ⊢_id P_x(y|z) iff P(v), G ⊢_id P_{x,w}(y, z \ w)
This means identifiability of unconditional effects is all we need, and we get completeness for free!
Identifying Counterfactuals
Counterfactuals involve multiple worlds
Our "axioms" φ will be the set of all experiments we can perform in a single world: P* = {P_x(v \ x) | x ⊆ v}
We separate two sets of difficulties:
Going from P(v) to P* (already handled)
Going from P* to P(γ) or P(γ|δ)
Graphs and Counterfactuals
We need a way to display causal assumptions across multiple worlds
First attempt: the twin network graph (Balke and Pearl)
Problem: only two worlds!
[Figure: twin network with the actual world (A = false) and a counterfactual world (A* = true); H and H* share background variables.]
Graphs and Counterfactuals (cont.)
Adding more worlds yields the parallel worlds graph (IJCAI-05)
More problems: distinct nodes can be the same variable
[Figure: parallel worlds graph with shared background variables U_rx, U_h; worlds A = a, A′ = a, A* = a* with nodes Rx, Rx′, Rx* and H, H′, H*.]
Graphs and Counterfactuals (cont.)
Our solution: rid the parallel worlds graph of duplicate nodes inductively, starting at the roots, to obtain the counterfactual graph
[Figure: the parallel worlds graph collapsed into a counterfactual graph, merging the duplicate Rx and H nodes across the worlds A = a and A* = a*.]
Identification of Counterfactuals
All identifiable P(γ) can be computed from P* in a way similar to effects (subproblems based on C-components of the counterfactual graph)
Testing P(γ|δ) is similar to testing P_x(y|z)
Testing P(γ) fails if γ forms a C-component, all subscripts are in Pa(γ), and there is a "conflict," e.g., y_x ∈ γ and z_{x′} ∈ γ, or x′_z ∈ γ
Key theorem (UAI-07): failure in any subproblem implies the overall counterfactual is not identifiable
Example
Effect of treatment on the treated: P(y_x | x′)
Not identifiable in the bow arc graph, but identifiable in the front-door graph:
P(y_x | x′) = P(y_x, x′) / P(x′)
            = ∑_z P_z(y, x′) P_x(z) / P(x′)
            = ∑_z P(y|z, x′) P(x′) P(z|x) / P(x′)
            = ∑_z P(y|z, x′) P(z|x)
[Figure: the front-door graph in the world X = x′ and under the intervention do(X = x).]
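The derivation can be checked numerically in a fully specified front-door model (U → X, U → Y, X → Z, Z → Y): compute the ground-truth P(y_x, x′)/P(x′) with U visible, then compare against the estimand ∑_z P(y|z,x′) P(z|x), which uses only observational quantities. Parameterization invented:

```python
from itertools import product

# Invented binary parameterization of the front-door graph.
p_u = {0: 0.5, 1: 0.5}
p_x1_u = {0: 0.3, 1: 0.9}                   # P(X=1 | u)
p_z1_x = {0: 0.2, 1: 0.8}                   # P(Z=1 | x)
p_y1_zu = {(z, u): 0.1 + 0.4 * z + 0.4 * u
           for z, u in product((0, 1), repeat=2)}  # P(Y=1 | z, u)

def b(p1, v):                               # Bernoulli pmf helper
    return p1 if v == 1 else 1 - p1

def ett_truth(x, xp):
    """Ground-truth P(Y_x = 1 | X = x'): X takes its natural value in
    each u-world while Y_x flows through Z under do(x)."""
    num = sum(p_u[u] * b(p_x1_u[u], xp) * b(p_z1_x[x], z) * p_y1_zu[(z, u)]
              for u, z in product((0, 1), repeat=2))
    den = sum(p_u[u] * b(p_x1_u[u], xp) for u in (0, 1))
    return num / den

def ett_estimand(x, xp):
    """Observational estimand sum_z P(y=1 | z, x') P(z | x)."""
    total = 0.0
    for z in (0, 1):
        # P(y=1 | z, x') = sum_u P(y|z,u) P(u|x'), since Z is ind. of U
        # given X in this graph.
        p_u_given_xp = {u: p_u[u] * b(p_x1_u[u], xp) for u in (0, 1)}
        norm = sum(p_u_given_xp.values())
        p_y_zxp = sum(p_y1_zu[(z, u)] * p / norm
                      for u, p in p_u_given_xp.items())
        total += p_y_zxp * b(p_z1_x[x], z)
    return total
```

The two quantities agree exactly, mirroring the slide's chain of equalities.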
Conclusions
Using the framework of graphical causal models, we posed two causal deduction problems: evaluating causal effects from observations, and evaluating counterfactuals from experiments
We gave complete algorithms for solving these problems, along with a characterization of models where a given problem can be solved