On the Derivation of Judea Pearl's Front-Door Adjustment Formula in The Book of Why


I have a number of related questions about the derivation of the front-door adjustment formula as given on page 236. Here is the derivation. I would have typed it up, but the diagrams at the far right would have been a pain to include.

Derivation of the Front-Door Adjustment Formula

There is a typo in Line 4, caught in the errata. It should be

$$=\sum_t P(c|\operatorname{do}(t){\color{red})}\,P(t|s). $$

Some additional background: the Rules of the Do-Calculus, which are as follows:

Rule 1. Assume the variable set $Z$ blocks all paths from $W$ to $Y$ after we have deleted all arrows leading into $X.$ Then $$P(Y|\operatorname{do}(X),Z,W)=P(Y|\operatorname{do}(X),Z). $$

Rule 2. If $Z$ blocks all back-door paths from $X$ to $Y,$ then $$P(Y|\operatorname{do}(X),Z)=P(Y|X,Z). $$

Rule 3. If there are no causal paths from $X$ to $Y,$ then $$P(Y|\operatorname{do}(X))=P(Y). $$

My questions are as follows:

  1. Given the invocation of the Probability Axioms in Line 1, it is not difficult to follow the validity of the same invocation in Line 5. However, which axiom is being used here? Where can I find a discussion of it?

  2. In Lines 2, 3, 4, 6, and 7, Pearl invokes Rule 2 or 3. Next to the invocation is a diagram, which is supposed to be some subset of the original at the top, involving a stereotypical confounding unobserved variable situation. Why can Pearl just delete edges at will? That is, how come the expressions are equivalent while he's manipulating the diagram right and left?

  3. The final result has $s$ and $s'$ in it, but the version of the Front-Door Adjustment Formula on page 227 does not: $$P(Y|\operatorname{do}(X))=\sum_z P(Z=z|X) \sum_x P(Y|X=x,Z=z) P(X=x). \quad \text{(7.1)} $$ Here $z$ is like $t$ in the formula above, as well as $x\to s$ and $y\to c.$ How has he proved the formula on page 227? Wouldn't he have to collapse $s'\to s$ to finish?

Thank you for your time!


Adrian, here are the answers:

1 - Here we are using the Law of Total Probability, that is, $p(y)=\sum_x{p(y|x)p(x)}$, applied to a conditional (here, post-intervention) distribution rather than to the plain marginal.

2 - Every valid manipulation of the causal expression has to preserve the meaning of that expression. For instance, the equality $p(t|do(s)) = p(t|s)$ is, in general, not true. It is only true if there are no unblocked back-door paths from $s$ to $t$. Thus, to know whether you can replace the $do(s)$ operator with regular conditioning on $s$, you need to check whether this holds in your model. That's why you need the auxiliary graphs: they provide the test for whether the substitution is valid. For instance, in the case of $p(t|do(s)) = p(t|s)$, the test is to: (i) delete the outgoing arrows from $s$; and (ii) check whether $t$ and $s$ are separated in the resulting graph. If they are, then the manipulation is valid. This is what Rule 2 is saying. Therefore, Pearl is not deleting arrows arbitrarily. The modifications to the graph are made only to check the conditions that license each manipulation; the model itself is unchanged.
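The graphical test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edge list encodes the usual front-door graph with a hypothetical unobserved confounder `U`, the function names are made up for this sketch, and `rule2_licenses` handles only the simple case of a single intervened variable (no other `do` terms in the expression). Separation is tested with the standard moralized-ancestral-graph criterion.

```python
# Hypothetical encoding of the front-door graph: U is the unobserved
# confounder, S the treatment, T the mediator, C the outcome.
EDGES = {("U", "S"), ("U", "C"), ("S", "T"), ("T", "C")}


def ancestors(edges, nodes):
    """Return `nodes` together with all of their ancestors."""
    result = set(nodes)
    frontier = list(nodes)
    while frontier:
        child = frontier.pop()
        for parent, kid in edges:
            if kid == child and parent not in result:
                result.add(parent)
                frontier.append(parent)
    return result


def d_separated(edges, x, y, given=frozenset()):
    """Test whether x and y are d-separated by `given` in the DAG `edges`."""
    given = set(given)
    keep = ancestors(edges, {x, y} | given)
    sub = {(a, b) for a, b in edges if a in keep and b in keep}
    # Moralize: drop directions and marry parents that share a child.
    undirected = {frozenset(e) for e in sub}
    for a, child_a in sub:
        for b, child_b in sub:
            if child_a == child_b and a != b:
                undirected.add(frozenset((a, b)))
    # Remove the conditioning set, then search for any path from x to y.
    undirected = {e for e in undirected if not (e & given)}
    seen, frontier = {x}, [x]
    while frontier:
        node = frontier.pop()
        for e in undirected:
            if node in e:
                (other,) = e - {node}
                if other not in seen:
                    seen.add(other)
                    frontier.append(other)
    return y not in seen


def rule2_licenses(edges, x, y, given=frozenset()):
    """Rule 2 test for P(y|do(x),given) = P(y|x,given):
    delete the arrows out of x, then test separation of x and y."""
    cut = {(a, b) for a, b in edges if a != x}
    return d_separated(cut, x, y, given)


print(rule2_licenses(EDGES, "S", "T"))         # True:  P(t|do(s)) = P(t|s)
print(rule2_licenses(EDGES, "S", "C"))         # False: P(c|do(s)) != P(c|s) in general
print(rule2_licenses(EDGES, "T", "C", {"S"}))  # True:  P(c|do(t),s') = P(c|t,s')
```

The second call fails because the back-door path through `U` still connects `S` and `C` once the arrow out of `S` is removed, which is exactly why the front-door derivation cannot stop at plain conditioning.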

3 - The formula is not proved on page 227, you can find its proof here. Regarding $s$ versus $s'$, this is for notation purposes, and it is useful to keep them distinct, because we have two different operations being performed with $s$ here. Let's rewrite the estimand as,

$$p(c|do(s))= \sum_{t}p(t|s)\sum_{s'}p(c|t, s')p(s')$$

Thus, the first $s$ in $p(t|s)$ stands for the same $s$ as in the $do(s)$ expression. That's the value $s$ you are setting the variable $S$ to, say, $S= 1$. The term involving $s'$ stands for summing over all possible values of $S$. You could have written that as $\sum_{s}p(c|t, s)p(s)$, but this notation without primes could lead to ambiguity, since we would then have the same symbol for the specific value of $s$ in the expression $p(t|s)$. Thus, the primes are added for clarity, so the reader understands these are different values.
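To see the two roles of $s$ and $s'$ in action, here is a small numerical sketch. Everything in it is an assumption made for illustration: the variables are binary, the conditional probability tables are made-up numbers, and `U` plays the role of the unobserved confounder. The function `front_door` uses only the observational joint over $(S, T, C)$, while `truth` computes $p(c|do(s))$ directly from the full model, including `U`.

```python
from itertools import product

# Hypothetical binary model on the front-door graph U -> S -> T -> C, U -> C.
# All numbers below are made up for illustration.
P_U = {0: 0.6, 1: 0.4}
P_S1_GIVEN_U = {0: 0.2, 1: 0.7}   # P(S=1 | u)
P_T1_GIVEN_S = {0: 0.1, 1: 0.8}   # P(T=1 | s)
P_C1_GIVEN_TU = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}  # P(C=1 | t, u)


def bern(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


# Observational joint over (s, t, c), with U marginalized out.
joint = {}
for u, s, t, c in product((0, 1), repeat=4):
    p = (P_U[u] * bern(P_S1_GIVEN_U[u], s) * bern(P_T1_GIVEN_S[s], t)
         * bern(P_C1_GIVEN_TU[(t, u)], c))
    joint[(s, t, c)] = joint.get((s, t, c), 0.0) + p


def prob(s=None, t=None, c=None):
    """Marginal probability of the specified values under `joint`."""
    return sum(p for (s_, t_, c_), p in joint.items()
               if s in (None, s_) and t in (None, t_) and c in (None, c_))


def front_door(c, s):
    """sum_t p(t|s) sum_{s'} p(c|t,s') p(s'), from observed (S, T, C) only."""
    total = 0.0
    for t in (0, 1):
        p_t_given_s = prob(s=s, t=t) / prob(s=s)        # uses the fixed s
        inner = sum(prob(s=sp, t=t, c=c) / prob(s=sp, t=t) * prob(s=sp)
                    for sp in (0, 1))                   # sums over s'
        total += p_t_given_s * inner
    return total


def truth(c, s):
    """p(c|do(s)) computed from the full model, which uses the hidden U."""
    return sum(P_U[u] * bern(P_T1_GIVEN_S[s], t) * bern(P_C1_GIVEN_TU[(t, u)], c)
               for u, t in product((0, 1), repeat=2))


for s in (0, 1):
    naive = prob(s=s, c=1) / prob(s=s)  # plain conditioning p(c=1|s)
    print(s, front_door(1, s), truth(1, s), naive)
```

In this toy model the second and third printed columns agree up to floating-point error, while the last column (plain conditioning) does not, because of the confounding through `U`. Note that the outer `s` stays fixed inside `front_door` while the inner loop variable `sp` ranges over all values of $S$, which is precisely the distinction the primes are meant to mark.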

Hope these clarifications help!

Addendum:

Regarding the law of total probability, one way to think about it is to remember that interventions define a new probability distribution, and the law of total probability only holds within a single probability distribution. So let's define the post-intervention distribution after intervening on $S$ as $P^*(\cdot)$, that is, $P^*(\cdot) := P(\cdot|do(s))$. Thus the law of total probability on $P^*$ states that,

$$ P^*(c) = \sum_{t}P^*(c|t)P^*(t) $$

If we now recall that $P^*(\cdot) := P(\cdot|do(s))$, we have,

$$ P(c|do(s)) = \sum_{t}P(c|t, do(s))P(t|do(s)) $$

Regarding the formula on page 227, the notation is indeed not ideal. The best notation uses primes to avoid this kind of confusion. On that page, the big $X$ stands for the little $s,$ and the little $x$ for the $s'.$

So the two formulas are indeed equivalent (barring, of course, the different notation and different names for the variables). The main source of confusion in the second formula is that the symbol "capital $X$" is being used to denote both an instantiation value (inside the $do(X)$) and the random variable (when we write $X = x$).
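One way to make the correspondence explicit is to rewrite (7.1) with a lowercase $x$ for the intervened value and a primed index $x'$ for the summation. The primed index is an assumption of this rewrite, not the book's notation:

$$P(Y=y|\operatorname{do}(X=x))=\sum_z P(Z=z|X=x) \sum_{x'} P(Y=y|X=x',Z=z)\, P(X=x'). $$

Substituting $x\to s,$ $x'\to s',$ $z\to t,$ and $y\to c,$ and using the lowercase shorthand, this becomes

$$p(c|do(s))= \sum_{t}p(t|s)\sum_{s'}p(c|t, s')p(s'), $$

which is the final line of the derivation, so no collapsing of $s'$ into $s$ is needed.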

Regarding the first formula, with lower case letters only, that's a somewhat standard notation (see Causality Chapter 1), in which we usually use $P(y|x)$ as a shorthand notation for $P(Y=y|X=x)$.

PS: these questions are probably better suited to Cross Validated.