Intuitively use marginalization


Is it always true that if you sum over some variable, you can "remove" that variable from each factor of the product inside the sum? For example:

$ \sum_x P(x, y)P(y | x)P(y | x, z) = P(y)P(y)P(y|z)? $

It seems that this is often the case, but what I'm asking is whether this is a general "rule" you can use.


Best answer:

The notation you use is a little ambiguous, but as you describe it, the rule is incorrect in general. For example, take $X, Y, Z$ to be discrete random variables that are independent, and let $X$ take values $x_1, \dots , x_n$. According to your conjecture we should find that

$$\sum_{i=1}^n \mathbb{P}(X = x_i, Y = y) \mathbb{P}(X = x_i, Z = z) = \mathbb{P}(Y=y) \mathbb{P}(Z = z) $$

But this is false since if $X, Y, Z$ are independent we obtain,

$$\sum_{i=1}^n \mathbb{P}(X = x_i) \mathbb{P}(Y = y) \mathbb{P}(X = x_i) \mathbb{P}(Z = z) = \mathbb{P}(Y=y) \mathbb{P}(Z = z) \sum_{i=1}^n \mathbb{P}(X = x_i)^2 $$

The sum of the squares of the probabilities is not in general $1$: take $X \sim \text{Bernoulli}(p)$ for example, so that $X$ takes value $1$ with probability $p$ and $0$ with probability $(1-p)$. We do have $(1-p) + p = 1$, but in general $(1-p)^2 + p^2 \neq 1$.
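This counterexample is easy to check numerically. Below is a quick sketch; the particular parameter values ($X \sim \text{Bernoulli}(0.3)$, $\mathbb{P}(Y=1) = 0.6$, $\mathbb{P}(Z=1) = 0.2$) are arbitrary choices, not anything from the question:

```python
# Made-up independent distributions for X, Y, Z (any values would do).
p_x = {0: 0.7, 1: 0.3}  # X ~ Bernoulli(0.3)
p_y = 0.6               # P(Y = 1)
p_z = 0.2               # P(Z = 1)

# Left side of the conjectured identity:
#   sum_i P(X = x_i, Y = 1) * P(X = x_i, Z = 1)
# which, by independence, factors as P(X=x_i) P(Y=1) * P(X=x_i) P(Z=1).
lhs = sum(p_x[x] * p_y * p_x[x] * p_z for x in p_x)

# Right side: P(Y = 1) * P(Z = 1)
rhs = p_y * p_z

# lhs = rhs * sum_i P(X = x_i)^2, and the sum of squares is 0.58, not 1.
print(lhs, rhs)  # 0.0696 vs 0.12 -- the conjecture fails
```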

You should try to understand why marginalization works in the first place; it all comes back to the following law of probability.

If $A, B$ are disjoint events then,

$$\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B), \ \ (*)$$

From this we have that if $B_1, \dots, B_n$ is a partition of the sample space, meaning

$$\bigcup_{i=1}^n B_i = \Omega, \ B_i \cap B_j = \emptyset, \ i \neq j$$

then, since

$$A = A \cap \Omega = A \cap \bigcup_{i=1}^n B_i = \bigcup_{i=1}^n (A \cap B_i), $$

the sets $A \cap B_i$ are pairwise disjoint, and applying the additivity of probability $(*)$ gives

$$\mathbb{P} (A) = \sum_{i=1}^n \mathbb{P}(A \cap B_i). $$

Now, coming back to the language of events as values of random variables, how is this marginalization? Consider $X$ a discrete random variable taking values $x_1, \dots , x_n$. Then the events $\{X = x_1 \}, \{X = x_2 \}, \dots , \{X = x_n \}$ constitute a partition of the sample space (since the cases are exhaustive and disjoint).

So if we let $A = \{Y = y\}$ and $B_i = \{X = x_i \}$, we recover the familiar procedure of "marginalization":

$$\mathbb{P} (Y = y) = \sum_{i=1}^n \mathbb{P}(Y = y, X = x_i), \ \ (**)$$
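The identity $(**)$ can be checked on any concrete joint distribution. Here is a minimal sketch; the joint table for $(X, Y)$ is a made-up example, not taken from the question:

```python
# joint[(x, y)] = P(X = x, Y = y); an arbitrary valid joint distribution.
joint = {
    (0, 0): 0.10, (0, 1): 0.30,
    (1, 0): 0.25, (1, 1): 0.35,
}

def marginal_y(y):
    """P(Y = y), obtained by summing the joint over every value of X."""
    return sum(p for (x, yy), p in joint.items() if yy == y)

print(marginal_y(0))  # ~0.35 = 0.10 + 0.25
print(marginal_y(1))  # ~0.65 = 0.30 + 0.35
```

The marginals necessarily sum to $1$, because together they account for every cell of the joint table.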

Note that this also works when there is a conditional probability, as long as the variable you are marginalizing over does not appear in the conditioning. Again, try to link it back to the basics: a conditional probability has the form

$$\mathbb{P}(A | C) = \frac{\mathbb{P}(A \cap C)}{\mathbb{P}(C)} $$

To condition on $Z = z$, for example, we first apply the same marginalization as in $(**)$ to the joint event:

$$\mathbb{P} (Y = y, Z = z) = \sum_{i=1}^n \mathbb{P}(Y = y, X = x_i, Z = z)$$

Then we have,

$$\frac{\mathbb{P} (Y = y, Z = z)}{\mathbb{P}(Z = z)} = \sum_{i=1}^n \frac{\mathbb{P}(Y = y, X = x_i, Z = z)}{\mathbb{P}(Z = z)}$$

So in fact we recover our "marginalization", now conditioned on $Z = z$, with just simple arithmetic:

$$\mathbb{P}(Y = y | Z = z) = \sum_{i=1}^n \mathbb{P}(Y = y, X = x_i| Z = z)$$
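As a sanity check, this conditional version of marginalization can also be verified numerically. The three-way joint table below is an arbitrary assumption chosen only so the probabilities sum to $1$:

```python
from itertools import product

# joint[(x, y, z)] = P(X = x, Y = y, Z = z); a made-up valid distribution.
probs = [0.05, 0.10, 0.15, 0.20, 0.05, 0.10, 0.15, 0.20]
joint = dict(zip(product([0, 1], repeat=3), probs))

def p_z(z):
    """P(Z = z), marginalizing out both X and Y."""
    return sum(p for (x, y, zz), p in joint.items() if zz == z)

def p_yz(y, z):
    """P(Y = y, Z = z), marginalizing out X."""
    return sum(p for (x, yy, zz), p in joint.items() if yy == y and zz == z)

# Left side: P(Y = 1 | Z = 0) computed directly from the definition.
lhs = p_yz(1, 0) / p_z(0)

# Right side: sum over x of P(Y = 1, X = x | Z = 0).
rhs = sum(joint[(x, 1, 0)] / p_z(0) for x in [0, 1])

print(lhs, rhs)  # both equal 0.75 for this table
```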

In summary, when reasoning about manipulations like marginalization, try to link them back to the fundamental laws of probability.