Intuitively use marginalization


Is it always true that if you sum over some variable, you can "remove" that variable from each factor of the product inside the sum? For example:

$ \sum_x P(x, y)P(y | x)P(y | x, z) = P(y)P(y)P(y|z)? $

It seems that this is often the case, but what I'm asking is whether this is a general "rule" you can use.


Best answer:

The notation you use is a little ambiguous, but as you describe it, the rule is incorrect in general. For example, take $X, Y, Z$ to be discrete random variables that are independent, and let $X$ take values $x_1, \dots , x_n$. According to your conjecture we should find that

$$\sum_{i=1}^n \mathbb{P}(X = x_i, Y = y) \mathbb{P}(X = x_i, Z = z) = \mathbb{P}(Y=y) \mathbb{P}(Z = z) $$

But this is false since if $X, Y, Z$ are independent we obtain,

$$\sum_{i=1}^n \mathbb{P}(X = x_i) \mathbb{P}(Y = y) \mathbb{P}(X = x_i) \mathbb{P}(Z = z) = \mathbb{P}(Y=y) \mathbb{P}(Z = z) \sum_{i=1}^n \mathbb{P}(X = x_i)^2 $$

The sum of the squares of the probabilities is not in general $1$: take $X \sim \text{Bernoulli}(p)$ for example, so that $X$ takes value $1$ with probability $p$ and $0$ with probability $(1-p)$. We do have $(1-p) + p = 1$, but in general $(1-p)^2 + p^2 \neq 1$.
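This counterexample is easy to check numerically. Below is a quick sketch; the particular parameter values ($X \sim \text{Bernoulli}(0.3)$, $\mathbb{P}(Y=1) = 0.6$, $\mathbb{P}(Z=1) = 0.2$) are arbitrary choices, not anything from the question:

```python
# Made-up independent distributions for X, Y, Z (any values would do).
p_x = {0: 0.7, 1: 0.3}  # X ~ Bernoulli(0.3)
p_y = 0.6               # P(Y = 1)
p_z = 0.2               # P(Z = 1)

# Left side of the conjectured identity:
#   sum_i P(X = x_i, Y = 1) * P(X = x_i, Z = 1)
# which, by independence, factors as P(X=x_i) P(Y=1) * P(X=x_i) P(Z=1).
lhs = sum(p_x[x] * p_y * p_x[x] * p_z for x in p_x)

# Right side: P(Y = 1) * P(Z = 1)
rhs = p_y * p_z

# lhs = rhs * sum_i P(X = x_i)^2, and the sum of squares is 0.58, not 1.
print(lhs, rhs)  # 0.0696 vs 0.12 -- the conjecture fails
```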

You should try to understand why marginalization works in the first place; it all comes back to the following law of probability.

If $A, B$ are disjoint events then,

$$\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B), \ \ (*)$$

From this we have that if $B_1, \dots, B_n$ is a partition of the sample space, meaning

$$\bigcup_{i=1}^n B_i = \Omega, \ B_i \cap B_j = \emptyset, \ i \neq j$$

then, since

$$A = A \cap \Omega = A \cap \bigcup_{i=1}^n B_i = \bigcup_{i=1}^n (A \cap B_i), $$

the sets $A \cap B_i$ are pairwise disjoint, and applying the additivity of probability $(*)$ gives

$$\mathbb{P} (A) = \sum_{i=1}^n \mathbb{P}(A \cap B_i). $$

Now, coming back to the language of events as values of random variables, how is this marginalization? Consider $X$ a discrete random variable taking values $x_1, \dots , x_n$. Then the events $\{X = x_1 \}, \{X = x_2 \}, \dots , \{X = x_n \}$ constitute a partition of the sample space (since the cases are exhaustive and disjoint).

So if we let $A = \{Y = y\}$ and $B_i = \{X = x_i \}$, we recover the familiar procedure of "marginalization":

$$\mathbb{P} (Y = y) = \sum_{i=1}^n \mathbb{P}(Y = y, X = x_i), \ \ (**)$$
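The identity $(**)$ can be checked on any concrete joint distribution. Here is a minimal sketch; the joint table for $(X, Y)$ is a made-up example, not taken from the question:

```python
# joint[(x, y)] = P(X = x, Y = y); an arbitrary valid joint distribution.
joint = {
    (0, 0): 0.10, (0, 1): 0.30,
    (1, 0): 0.25, (1, 1): 0.35,
}

def marginal_y(y):
    """P(Y = y), obtained by summing the joint over every value of X."""
    return sum(p for (x, yy), p in joint.items() if yy == y)

print(marginal_y(0))  # ~0.35 = 0.10 + 0.25
print(marginal_y(1))  # ~0.65 = 0.30 + 0.35
```

The marginals necessarily sum to $1$, because together they account for every cell of the joint table.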

Note that this also works when there is a conditional probability, as long as the variable you are marginalizing over does not appear in the conditioning. Again, try to link it back to the basics: a conditional probability has the form

$$\mathbb{P}(A | C) = \frac{\mathbb{P}(A \cap C)}{\mathbb{P}(C)} $$

To condition on $Z = z$, for example, we first apply the same marginalization as in $(**)$ to the joint event:

$$\mathbb{P} (Y = y, Z = z) = \sum_{i=1}^n \mathbb{P}(Y = y, X = x_i, Z = z)$$

Then we have,

$$\frac{\mathbb{P} (Y = y, Z = z)}{\mathbb{P}(Z = z)} = \sum_{i=1}^n \frac{\mathbb{P}(Y = y, X = x_i, Z = z)}{\mathbb{P}(Z = z)}$$

So in fact we recover our "marginalization", now conditioned on $Z = z$, with just simple arithmetic:

$$\mathbb{P}(Y = y | Z = z) = \sum_{i=1}^n \mathbb{P}(Y = y, X = x_i| Z = z)$$
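As a sanity check, this conditional version of marginalization can also be verified numerically. The three-way joint table below is an arbitrary assumption chosen only so the probabilities sum to $1$:

```python
from itertools import product

# joint[(x, y, z)] = P(X = x, Y = y, Z = z); a made-up valid distribution.
probs = [0.05, 0.10, 0.15, 0.20, 0.05, 0.10, 0.15, 0.20]
joint = dict(zip(product([0, 1], repeat=3), probs))

def p_z(z):
    """P(Z = z), marginalizing out both X and Y."""
    return sum(p for (x, y, zz), p in joint.items() if zz == z)

def p_yz(y, z):
    """P(Y = y, Z = z), marginalizing out X."""
    return sum(p for (x, yy, zz), p in joint.items() if yy == y and zz == z)

# Left side: P(Y = 1 | Z = 0) computed directly from the definition.
lhs = p_yz(1, 0) / p_z(0)

# Right side: sum over x of P(Y = 1, X = x | Z = 0).
rhs = sum(joint[(x, 1, 0)] / p_z(0) for x in [0, 1])

print(lhs, rhs)  # both equal 0.75 for this table
```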

In summary, when reasoning about manipulations like marginalization, try to link them back to the fundamental laws of probability.