Independence among random variables


Say we have the random variables $X_1$, $X_2$, $X_3$, $X_4$, $X_5$. We know that:

  • $X_5$ is influenced by $X_3$.
  • $X_4$ is influenced by both $X_2$ and $X_3$.
  • $X_2$ and $X_3$ are both influenced by $X_1$.

Which of the following assertions isn't true?

  1. $X_4$ is independent of $X_1$ given $X_2$ and $X_3$.
  2. $X_2$ is independent of $X_3$ given $X_4$.
  3. $X_5$ is independent of $X_1$ given $X_3$.

My guess is that it's the second assertion since it differs from the first and third, which talk about "bottom-level" random variables being independent of the "father" random variable $X_1$. But I am not sure, hence asking here.

Best answer

As the existing comment points out, the use of "influenced by" is ambiguous here. However, it does sound a lot like "depends on", or "is conditioned on", especially to someone familiar with probabilistic graphical models, and particularly Bayesian networks.

The chain rule of probability tells us that the joint of the five random variables can be factored as:

$$ P(X_5, X_4, X_3, X_2, X_1) = P(X_5 | X_4, X_3, X_2, X_1)P(X_4|X_3, X_2, X_1)P(X_3|X_2, X_1)P(X_2|X_1)P(X_1)$$

A Bayesian network is a directed graphical model in which each factor on the right-hand side of the above equation depends on only a subset of the conditioning variables, namely the parents of the variable in question. Reading "influenced by" as "has as a parent", that is the case here, and the joint factorizes as:

$$ P(X_5, X_4, X_3, X_2, X_1) = P(X_5|X_3)P(X_4|X_3,X_2)P(X_3|X_1)P(X_2|X_1)P(X_1)$$
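As a rough numerical sanity check of this factorization, the sketch below builds a joint table from randomly generated conditional probability tables and tests the three assertions from the question directly. Everything specific in it is an assumption made for illustration: the variables are taken to be binary, the tables are random, and the helper names `random_cpt` and `cond_indep` are hypothetical. It relies on the identity that $A$ is independent of $B$ given $C$ exactly when $P(A,B,C)\,P(C) = P(A,C)\,P(B,C)$ for all values.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(*parent_cards, card=2):
    """Random table P(child | parents); the last axis indexes the child and sums to 1."""
    table = rng.random(parent_cards + (card,))
    return table / table.sum(axis=-1, keepdims=True)

# Assumption for illustration: binary variables with arbitrary random tables.
p1 = random_cpt()        # P(X1),          indexed [x1]
p2 = random_cpt(2)       # P(X2 | X1),     indexed [x1, x2]
p3 = random_cpt(2)       # P(X3 | X1),     indexed [x1, x3]
p4 = random_cpt(2, 2)    # P(X4 | X2, X3), indexed [x2, x3, x4]
p5 = random_cpt(2)       # P(X5 | X3),     indexed [x3, x5]

# Joint table with axes (x1, x2, x3, x4, x5), built from the factorization.
joint = np.einsum("a,ab,ac,bcd,ce->abcde", p1, p2, p3, p4, p5)

# Axis numbers of the variables in `joint`.
X1, X2, X3, X4, X5 = range(5)

def cond_indep(joint, a, b, given, tol=1e-10):
    """Check numerically whether axis a is independent of axis b given the axes in `given`."""
    keep = {a, b, *given}
    drop = tuple(i for i in range(joint.ndim) if i not in keep)
    p_abc = joint.sum(axis=drop, keepdims=True)   # P(A, B, C)
    p_ac = p_abc.sum(axis=b, keepdims=True)       # P(A, C)
    p_bc = p_abc.sum(axis=a, keepdims=True)       # P(B, C)
    p_c = p_ac.sum(axis=a, keepdims=True)         # P(C)
    # A is independent of B given C iff P(A,B,C) P(C) == P(A,C) P(B,C) everywhere.
    return bool(np.allclose(p_abc * p_c, p_ac * p_bc, atol=tol))

print(cond_indep(joint, X4, X1, (X2, X3)))  # assertion 1
print(cond_indep(joint, X2, X3, (X4,)))     # assertion 2
print(cond_indep(joint, X5, X1, (X3,)))     # assertion 3
```

The first and third checks hold for any tables of this shape, because they follow from the factorization itself. The second check is expected to fail for generic tables, although carefully chosen tables can make $X_2$ and $X_3$ conditionally independent by coincidence.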

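If you were to draw the graphical model, the random variables would be vertices and there would be a directed edge from $X_i$ to $X_j$ if $X_j$ was conditioned on $X_i$ in the above factorization, i.e. if $X_i$ was a parent of $X_j$. Here the edges are $X_1 \rightarrow X_2$, $X_1 \rightarrow X_3$, $X_2 \rightarrow X_4$, $X_3 \rightarrow X_4$ and $X_3 \rightarrow X_5$.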

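Bayes nets offer a convenient way of reasoning about independencies through a concept called d-separation. Two sets of random variables $A$ and $B$ are d-separated given a third set of random variables $C$ if there does not exist a path between $A$ and $B$ that is active given $C$. If $A$ and $B$ are d-separated given $C$, then they are conditionally independent given $C$ in every distribution that factorizes according to the graph.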

An (undirected) path is active given $C$ if for every sequence of 3 consecutive variables $X, Y, Z$ along the path one of the following holds:

$$ (1) \ X \leftarrow Y \leftarrow Z, \quad Y \notin C $$
$$ (2) \ X \rightarrow Y \rightarrow Z, \quad Y \notin C $$
$$ (3) \ X \leftarrow Y \rightarrow Z, \quad Y \notin C $$
$$ (4) \ X \rightarrow Y \leftarrow Z, \quad Y \in C \text{ or some descendant of } Y \text{ is in } C $$

The fourth case (a collider, or v-structure) works in the opposite direction from the other three: the path is blocked at $Y$ unless $Y$ or one of its descendants is in $C$.

The second assertion claims that $X_2$ is independent of $X_3$ given $X_4$. The graph guarantees this only if $X_2$ and $X_3$ are d-separated given $X_4$, that is, only if there exists no active path between them.

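Note however that the following active path exists: $X_3 \leftarrow X_1 \rightarrow X_2, X_1 \notin \{ X_4\}$. Conditioning on the common child $X_4$ also activates the path $X_2 \rightarrow X_4 \leftarrow X_3$, the so-called "explaining away" effect.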

Therefore $X_2$ and $X_3$ are not d-separated given $X_4$, so the graph does not make them independent given $X_4$, and the second assertion is the one that isn't true.

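By the same principle, the first and third assertions are true. Every path from $X_1$ to $X_4$ passes through $X_2$ or $X_3$ as a non-collider, and every path from $X_1$ to $X_5$ passes through $X_3$ as a non-collider, so conditioning on those variables blocks all of them.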
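The three d-separation checks can also be done mechanically. The sketch below does not enumerate paths; it uses a standard equivalent test instead: restrict the graph to the ancestors of the variables involved, "moralize" it (connect co-parents and drop edge directions), delete the conditioned variables, and see whether the two nodes are still connected. The `parents` dictionary encodes the graph assumed in this answer, and the function names are hypothetical helpers, not part of any library.

```python
from collections import deque

# Parent sets read off the factorization (each edge points parent -> child).
parents = {
    "X1": set(),
    "X2": {"X1"},
    "X3": {"X1"},
    "X4": {"X2", "X3"},
    "X5": {"X3"},
}

def ancestors_of(nodes, parents):
    """Return `nodes` together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return seen

def d_separated(a, b, given, parents):
    """True if nodes a and b are d-separated given the set `given`."""
    keep = ancestors_of({a, b} | set(given), parents)
    # Moralize the ancestral subgraph: link each node to its parents
    # and link every pair of parents that share a child.
    nbrs = {v: set() for v in keep}
    for v in keep:
        ps = list(parents[v])
        for p in ps:
            nbrs[v].add(p)
            nbrs[p].add(v)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    # Breadth-first search from a to b that never enters a conditioned node.
    blocked = set(given)
    queue, seen = deque([a]), {a}
    while queue:
        v = queue.popleft()
        if v == b:
            return False
        for w in nbrs[v] - seen - blocked:
            seen.add(w)
            queue.append(w)
    return True

print(d_separated("X4", "X1", {"X2", "X3"}, parents))  # assertion 1: True
print(d_separated("X2", "X3", {"X4"}, parents))        # assertion 2: False
print(d_separated("X5", "X1", {"X3"}, parents))        # assertion 3: True
```

As a further illustration, `d_separated("X2", "X3", {"X1"}, parents)` returns `True`, since conditioning on the common parent alone blocks the only path in the relevant subgraph, while adding `"X4"` to the conditioning set makes it `False` again.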