Does order of random variables matter in chain rule (probability)

Suppose that we have the joint distribution of two random variables $X, Y$. Does it matter if we write it as $P(X, Y)$ or $P(Y, X)$? I would say yes if, when we substitute values, we do not specify which variable each value belongs to. I want to clear this up because I am not sure whether some manipulations are allowed when applying the chain rule (see below).

For example, suppose that $P(X=1, Y=2) = 0.2$; then $P(Y=2, X=1)$ should also equal $0.2$. But if we had written $P(1, 2)$ (assuming the distribution was given as $P(X, Y)$), then $P(1, 2)$ need not equal $P(2, 1)$. Does this mean that we have to be consistent in the order when we write the chain rule?

Chain rule

Is it valid to write the chain rule (assuming 4 variables):

$$P(X_4, X_3, X_2, X_1) = P(X_4 | X_3, X_2, X_1) \cdot P(X_3, X_2, X_1)$$

as:

$$P(X_1, X_2, X_4, X_3) = P(X_4 | X_1, X_3, X_2) \cdot P(X_1, X_2, X_3)$$

where the ordering is no longer consistent?

There is 1 best solution below

Suppose that we have the joint distribution of two random variables $X,Y$. Does it matter if we write it as $P(X,Y)$ or $P(Y,X)$?

It does not matter too much, because both notations are misleading. You are confusing random variables with their values, and you are confusing probabilities and probability mass functions. This is where all the problems you are having start.

A better way to give the joint probability mass function of $X$ and $Y$ is to define it as $p_{XY}(x,y) = P(X=x \text{ and }Y=y)$. It doesn't matter what the input variables $x,y$ are named; you can also define $p_{XY}(a,b)$ as $P(X=a \text{ and }Y=b)$. You can also define $p_{YX}(a,b)$ as $P(Y=a \text{ and }X=b)$.

Now it is true that $p_{XY}(1,2) = p_{YX}(2,1)$, because these end up as probabilities of the same event. However, there is no reason why $p_{XY}(1,2)$ should be equal to $p_{XY}(2,1)$.
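To make the distinction concrete, here is a minimal Python sketch. The joint table and its numbers are made up for illustration (only $P(X=1, Y=2) = 0.2$ comes from the question), and the function names `p_XY` and `p_YX` simply mirror the subscript notation above.

```python
# Hypothetical joint pmf of X and Y, always keyed as (value of X, value of Y).
# The probabilities are illustrative assumptions; they only need to sum to 1.
joint = {(1, 2): 0.2, (2, 1): 0.5, (1, 1): 0.3}

def p_XY(x, y):
    """P(X = x and Y = y): the first argument is the value of X."""
    return joint.get((x, y), 0.0)

def p_YX(y, x):
    """P(Y = y and X = x): the first argument is the value of Y."""
    return joint.get((x, y), 0.0)

# Same event written in two orders, so the probabilities agree.
print(p_XY(1, 2) == p_YX(2, 1))

# Two different events, so nothing forces these to agree.
print(p_XY(1, 2) == p_XY(2, 1))
```

The function name plays the role of the subscript: it is the only thing that says which argument belongs to which random variable.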

Even this notation is not perfect: we may prefer $p_{X,Y}(a,b)$, so that we do not confuse this with the probability mass function of the random variable $XY$ (the product of $X$ and $Y$). But that's an unlikely confusion.

You can write the chain rule $$ p_{X_1X_2X_4X_3}(a,b,c,d) = p_{X_4 \mid X_1 X_3 X_2}(c,a,d,b) p_{X_1 X_2 X_3} (a,b,d) $$ with an appropriate definition of $p_{X_4 \mid X_1 X_3 X_2}$. The three functions put their variables in inconsistent orders, but the result is still valid, because we make sure that the variables $a,b,c,d$ consistently correspond to the same random variables throughout.
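One way to spell out an "appropriate definition" is the usual definition of conditional probability, which I am assuming is what is intended here; it only makes sense when the denominator is positive: $$ p_{X_4 \mid X_1 X_3 X_2}(c,a,d,b) = P(X_4 = c \mid X_1 = a,\, X_3 = d,\, X_2 = b) = \frac{P(X_1 = a,\, X_2 = b,\, X_3 = d,\, X_4 = c)}{P(X_1 = a,\, X_2 = b,\, X_3 = d)}. $$ Multiplying both sides by $p_{X_1 X_2 X_3}(a,b,d) = P(X_1 = a,\, X_2 = b,\, X_3 = d)$ gives $P(X_1 = a,\, X_2 = b,\, X_3 = d,\, X_4 = c)$, which is the same event as the one measured by $p_{X_1X_2X_4X_3}(a,b,c,d)$.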

Often, to make this correspondence clear, we use a lowercase $x_i$ for the input variable corresponding to the random variable $X_i$. Then we would write $$ p_{X_1X_2X_4X_3}(x_1, x_2, x_4, x_3) = p_{X_4 \mid X_1 X_3 X_2}(x_4, x_1, x_3, x_2) p_{X_1 X_2 X_3} (x_1, x_2, x_3) $$ and it is tempting to drop the subscripts, because it feels like we are giving the ordering information twice. But we should not drop the subscripts, because we may plug in values for $x_1, x_2, x_3, x_4$ and then only $X_1, X_2, X_3, X_4$ will be telling us the order. For instance, we might set $x_1=x_2=1, x_3=2, x_4=5$ and write $$ p_{X_1X_2X_4X_3}(1, 1, 5, 2) = p_{X_4 \mid X_1 X_3 X_2}(5, 1, 2, 1) p_{X_1 X_2 X_3} (1,1,2). $$ Now only the subscripts are telling us what these numbers mean.
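As a sanity check, the identity can be verified numerically. The sketch below assumes a made-up joint distribution on four binary variables (the weights are arbitrary) and uses exact fractions to avoid rounding issues. Each function name encodes the argument order, just as the subscripts do.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution of (X1, X2, X3, X4), each with values in {0, 1}.
# Keys are always stored in the canonical order (x1, x2, x3, x4).
outcomes = list(product([0, 1], repeat=4))
weights = {o: i + 1 for i, o in enumerate(outcomes)}  # arbitrary positive weights
total = sum(weights.values())
joint = {o: Fraction(w, total) for o, w in weights.items()}

def p_X1X2X4X3(x1, x2, x4, x3):
    """Joint pmf with arguments in the order X1, X2, X4, X3."""
    return joint[(x1, x2, x3, x4)]

def p_X1X2X3(x1, x2, x3):
    """Marginal pmf of (X1, X2, X3), obtained by summing out X4."""
    return sum(joint[(x1, x2, x3, x4)] for x4 in (0, 1))

def p_X4_given_X1X3X2(x4, x1, x3, x2):
    """Conditional pmf of X4 given X1, X3, X2, with arguments in that order."""
    return joint[(x1, x2, x3, x4)] / p_X1X2X3(x1, x2, x3)

# Chain rule with inconsistent argument orders but consistent meaning of a, b, c, d.
chain_rule_holds = all(
    p_X1X2X4X3(a, b, c, d) == p_X4_given_X1X3X2(c, a, d, b) * p_X1X2X3(a, b, d)
    for a, b, c, d in product([0, 1], repeat=4)
)

# Careless version: b and d are passed to the conditional pmf in the wrong slots.
careless_version_holds = all(
    p_X1X2X4X3(a, b, c, d) == p_X4_given_X1X3X2(c, a, b, d) * p_X1X2X3(a, b, d)
    for a, b, c, d in product([0, 1], repeat=4)
)

print(chain_rule_holds)
print(careless_version_holds)
```

With these weights the first check passes and the second fails: the order of the arguments is free, as long as each value is routed to the random variable named in the subscript.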