Clarification on the properties of double summation in causal inference


Below is the proof along with the causal graph I copied from a textbook about causal inference by Brady Neal:

Claim. Given the causal graph in Figure A.1, $P(m \mid d o(t))=P(m \mid t)$.

START OF THE PROOF

Proof. We first apply the Bayesian network factorization (Definition 3.1): $$ P(w, t, m, y)=P(w) P(t \mid w) P(m \mid t) P(y \mid w, m) $$

Next, we apply the truncated factorization (Proposition 4.1): $$ P(w, m, y \mid d o(t))=P(w) P(m \mid t) P(y \mid w, m) $$

Finally, we marginalize out $w$ and $y$ : $$ \begin{aligned} \sum_w \sum_y P(w, m, y \mid d o(t)) & =\sum_w \sum_y P(w) P(m \mid t) P(y \mid w, m) \\ P(m \mid d o(t)) & =\left(\sum_w P(w)\right) P(m \mid t)\left(\sum_y P(y \mid w, m)\right) \\ & =P(m \mid t) \end{aligned} $$

END OF THE PROOF

What I do not understand is how $$ \begin{aligned} \sum_w \sum_y P(w) P(m \mid t) P(y \mid w, m) \end{aligned} $$

becomes

$$ \begin{aligned} \left(\sum_w P(w)\right) P(m \mid t)\left(\sum_y P(y \mid w, m)\right) \end{aligned}. $$

My understanding is that $P(w)$ and $P(m\mid t)$ do not depend on $y$, so they can be pulled out of $\sum_y$, and we can rearrange $$ \begin{aligned} \sum_w \sum_y P(w) P(m \mid t) P(y \mid w, m) \end{aligned} $$

into

$$ \begin{aligned} \sum_w P(w)P(m \mid t)\left(\sum_y P(y \mid w, m)\right) \end{aligned}. $$

But why can $\sum_w P(w)$ be separated from the rest of the expression, when $\sum_y P(y \mid w, m)$ still contains $w$? Is this a general rule of double summation, or does it only apply to this specific case?
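To make the question concrete, here is a small numerical check, offered as a sketch rather than a definitive answer. All distributions below are made-up binary tables, and the names `P_w`, `P_m_given_t`, `P_y_given_wm` and `g` are hypothetical, not from the textbook. It suggests that the step is not a general rule of double summation: it seems to work here because $\sum_y P(y \mid w, m) = 1$ for every value of $w$, so the inner sum is a constant and no longer depends on $w$. For a generic function $g(w, y)$ whose inner sum does depend on $w$, the same separation fails.

```python
from fractions import Fraction as F

# Hypothetical binary variables and distributions, chosen only for illustration.
W = M = Y = (0, 1)

P_w = {0: F(3, 10), 1: F(7, 10)}          # P(w)
P_m_given_t = {0: F(2, 5), 1: F(3, 5)}    # P(m | t) for one fixed value of t
P_y_given_wm = {                          # P(y | w, m), keyed by (w, m), then y
    (0, 0): {0: F(1, 10), 1: F(9, 10)},
    (0, 1): {0: F(1, 2), 1: F(1, 2)},
    (1, 0): {0: F(4, 5), 1: F(1, 5)},
    (1, 1): {0: F(1, 4), 1: F(3, 4)},
}


def inner_sum(w, m):
    """sum_y P(y | w, m): the part that still contains w."""
    return sum(P_y_given_wm[(w, m)][y] for y in Y)


def double_sum(m):
    """sum_w sum_y P(w) P(m | t) P(y | w, m), computed without any rearranging."""
    return sum(
        P_w[w] * P_m_given_t[m] * P_y_given_wm[(w, m)][y]
        for w in W
        for y in Y
    )


# The inner sum equals 1 for every (w, m), so it is constant in w.
inner_is_one = all(inner_sum(w, m) == 1 for w in W for m in M)

# Hence the full double sum collapses to P(m | t).
matches = all(double_sum(m) == P_m_given_t[m] for m in M)


# Counterexample: g(w, y) = w + y, whose inner sum over y depends on w.
def g(w, y):
    return w + y


lhs = sum(P_w[w] * g(w, y) for w in W for y in Y)
# Naive "separation" that treats the inner sum as if w were fixed at 0.
rhs_naive = sum(P_w[w] for w in W) * sum(g(0, y) for y in Y)

print("inner sum is 1 for all (w, m):", inner_is_one)
print("double sum equals P(m | t):", matches)
print("generic g: exact =", lhs, ", naive separation =", rhs_naive)
```

Exact fractions are used instead of floats so that the equality checks are not affected by rounding.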

Sorry for my bad English.