chain rule ordering


Suppose I want to compute the joint probability $P(X,Y,Z)$ with the chain rule. Is it true that there are $3!$ possible factorizations, or are there more? I got 6 for this example:

  1. $P(X|YZ)P(Y|Z)P(Z)$
  2. $P(X|YZ)P(Z|Y)P(Y)$
  3. $P(Y|XZ)P(X|Z)P(Z)$
  4. $P(Y|XZ)P(Z|X)P(X)$
  5. $P(Z|YX)P(Y|X)P(X)$
  6. $P(Z|YX)P(X|Y)P(Y)$

My second question: is the value of each line equal to that of the others? That is, does the chain rule give the same value regardless of the ordering? I am assuming a DAG, and my thinking is that they are not equal; I would like someone to confirm. Thanks.


There are 2 best solutions below

On BEST ANSWER

> Is it true that there will be $3!$ possible factorisations?

Obviously.

There are $3$ ways to factorise out one variable from three:$$\begin{align}\mathsf P(X,Y,Z)&=\mathsf P(X,Y\mid Z)~\mathsf P(Z)\\&=\mathsf P(X,Z\mid Y)~\mathsf P(Y)\\&=\mathsf P(Y,Z\mid X)~\mathsf P(X)\end{align}$$

Likewise, for each of those ways there are two ways to factorise out one variable from two, for example:$$\begin{align}\mathsf P(X,Y\mid Z)&=\mathsf P(X\mid Y,Z)~\mathsf P(Y\mid Z)\\&=\mathsf P(Y\mid X,Z)~\mathsf P(X\mid Z)\end{align}$$

The same holds for the other two cases, giving $3\times 2 = 3! = 6$ factorisations in total.
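The count, and the fact that every ordering yields the same value, can be checked numerically. The following is a minimal sketch, assuming three binary variables and a randomly generated, strictly positive joint table; the helper names `random_joint`, `marginal` and `chain_product` are hypothetical and chosen only for this illustration.

```python
from itertools import permutations, product
import random

def random_joint(seed=0):
    """Build an arbitrary strictly positive joint P(X, Y, Z) over binary variables (assumed example data)."""
    rng = random.Random(seed)
    outcomes = list(product((0, 1), repeat=3))
    weights = [rng.random() + 0.01 for _ in outcomes]
    total = sum(weights)
    return {o: w / total for o, w in zip(outcomes, weights)}

def marginal(joint, fixed):
    """Probability that the variables in `fixed` (index -> value) take the given values."""
    return sum(p for o, p in joint.items()
               if all(o[i] == v for i, v in fixed.items()))

def chain_product(joint, order, outcome):
    """Multiply P(v1) * P(v2 | v1) * P(v3 | v1, v2) for the variable ordering `order`."""
    prob = 1.0
    seen = {}
    for i in order:
        before = marginal(joint, seen)           # equals 1 when nothing is conditioned on yet
        seen[i] = outcome[i]
        prob *= marginal(joint, seen) / before   # conditional probability of variable i
    return prob

joint = random_joint()
orders = list(permutations(range(3)))            # one chain-rule factorisation per ordering

# Largest deviation between any factorisation and the joint table itself.
max_gap = max(abs(chain_product(joint, order, o) - p)
              for order in orders
              for o, p in joint.items())

print(len(orders), "orderings; largest deviation from the joint:", max_gap)
```

Each ordering telescopes back to the same joint probability, so any deviation should be no larger than floating-point rounding error.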


> I am assuming DAG and in my thinking that they are not equal

A Directed Acyclic Graph encodes a particular factorisation. The factorisations are distinct but equal, and so are their corresponding DAGs.

You can tell them apart, but they are all factorisations of the same joint probability.


Also, you do not need to assume a particular DAG. A Directed Acyclic Graph is a visual representation of a particular factorisation: every DAG encodes a factorisation of the joint probability of its nodes, and every factorisation of that joint probability has its own graphical representation.

Consider these two different-looking graphs. $\def\P{\operatorname{\sf P}}\require{enclose}\def\circle#1{\enclose{circle}#1}$

$$\begin{align}\begin{matrix}\circle{A}&\to& \circle B\\&\searrow&\downarrow\\&&\circle C\end{matrix}\text{ encodes } \P(A,B,C)=\P(A)\P(B\mid A)\,\P(C\mid A,B)\\[0ex]\\\hline\\[1ex]\begin{matrix}\circle{A}&\to& \circle B\\&\nwarrow&\uparrow\\&&\circle C\end{matrix}\text{ encodes } \P(A,B,C)=\P(C)\P(A\mid C)\,\P(B\mid A,C)\end{align}$$

Since both are factorisations of the same joint probability measure (and hence the factorisations are equal), we say the DAGs are equivalent.
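To see why the two factorisations agree, each conditional can be written as a ratio of joint probabilities. This is a short worked sketch, assuming every conditioning event has positive probability so that the ratios are defined:

$$\begin{align}\mathsf P(A)\,\mathsf P(B\mid A)\,\mathsf P(C\mid A,B)&=\mathsf P(A)\cdot\frac{\mathsf P(A,B)}{\mathsf P(A)}\cdot\frac{\mathsf P(A,B,C)}{\mathsf P(A,B)}=\mathsf P(A,B,C)\\[1ex]\mathsf P(C)\,\mathsf P(A\mid C)\,\mathsf P(B\mid A,C)&=\mathsf P(C)\cdot\frac{\mathsf P(A,C)}{\mathsf P(C)}\cdot\frac{\mathsf P(A,B,C)}{\mathsf P(A,C)}=\mathsf P(A,B,C)\end{align}$$

In both lines the intermediate terms cancel, leaving only the joint probability; the same cancellation works for any ordering of the variables.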

2
On

I always thought of $P(A|B)P(B) = P(B|A)P(A)$ by picturing a tree like this:

[Image: probability tree with $A$ at the first level and $B$ at the second]

First, $A$ happens, with $P(A)$, and then $B$ happens with $P(B|A)$, yielding $P(A,B) = P(B|A)P(A)$.

But you can totally flip the tree to have first $B$ and then $A$, thus yielding $P(A,B) = P(A|B)P(B)$.

(Putting the two together gives the famous Bayes' rule.)
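Spelled out, equating the two expressions for $P(A,B)$ and dividing by $P(B)$ gives the rule below; the condition $P(B) > 0$ is an assumption added here so that the division is defined:

$$P(A|B)\,P(B) = P(B|A)\,P(A)\quad\Longrightarrow\quad P(A|B)=\frac{P(B|A)\,P(A)}{P(B)}$$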

If there are three variables, I think it is correct to think of 6 possible trees that correspond directly to your 6 derivations; they all have equal values, since each gives the probability of the three events happening together.

I'm not sure if this answers your questions :-)