I am reading through a text which says that:
Suppose that A influences the value of B and B influences the of C, but
A and C are independent given B. We can represent the probability distribution
over all three variables as a product of probability distribution over
two variables.
then proceeds to describe the joint probability distribution as: $$p(a,b,c) = p(a)p(b|a)p(c|b)$$
What I don't understand is that the forumla above is still using three variables, how can it be said to be a product of probability over two variables?
Why is this a more efficient representation of the joint distribution?
suppose $a,b,c$ comes from alphabet $A,B,C$ of size $K$.
To store $p(a,b,c)$, we need $K^3$ entries.
To store $p(a)$, we need $K$ entry.
To store $p(b|a)$, we need $K^2$ entries.
To store $P(c|b)$, we need $K^2$ entries.
Hence for the right hand side, the storage space required is $O(K^2)$ but for the left hand side, the storage space required is $O(K^3)$.
$p(b|a)$ and $p(c|b)$ are both probability over two variables, each term only involves two variables, $a,b,c$ do not occur simultaneously.