I'm going over Markov models and I'm trying to understand the following notation for the chain rule of probability:
$$p(a_1, a_2, a_3, a_4) = p(a_1)p(a_2\mid a_1)p(a_3\mid a_1,a_2)p(a_4\mid a_1,a_2,a_3)$$
"The Chain Rule is one simple consequence of the definition of conditional probability: the joint probability of some set of events $a_1, a_2, a_3, a_4$ can also be expressed as a ‘chain’ of conditional probabilities"
Can someone explain?
By definition of conditional probability, we have $P(A,B) = P(A\mid B)P(B)$, where $P(A,B)$ means $P(A\cap B)$.
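To make the definition concrete, here is a small sketch with an illustrative example of my own choosing (it is not part of the original question): roll two fair dice, let $A$ be "the dice sum to 8" and $B$ be "the first die is even". It computes $P(A\mid B)$ by restricting attention to the outcomes where $B$ holds, then checks that $P(A\mid B)P(B)$ equals $P(A\cap B)$. Exact `Fraction` arithmetic avoids floating-point noise.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under the uniform distribution on omega."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def A(w):
    return w[0] + w[1] == 8   # the two dice sum to 8

def B(w):
    return w[0] % 2 == 0      # the first die is even

# P(A | B): keep only outcomes where B holds, then count how often A holds there.
in_b = [w for w in omega if B(w)]
p_a_given_b = Fraction(sum(1 for w in in_b if A(w)), len(in_b))

p_a_and_b = prob(lambda w: A(w) and B(w))
print(p_a_given_b, prob(B), p_a_and_b)   # 1/6 1/2 1/12
assert p_a_given_b * prob(B) == p_a_and_b
```

Here 3 of the 18 outcomes with an even first die sum to 8, so $P(A\mid B)=1/6$, and $\tfrac16\cdot\tfrac12=\tfrac1{12}=3/36=P(A\cap B)$.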
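Now take $A=a_4$ and $B=(a_1,a_2,a_3)$. Since the order of the events inside a joint probability does not matter (the intersection $a_1\cap a_2\cap a_3\cap a_4$ is the same event in any order), we get: $$P(a_1,a_2,a_3,a_4) = P(a_4\mid a_1,a_2,a_3)\,P(a_1,a_2,a_3)$$
Next, with $A=a_3$ and $B=(a_1,a_2)$, the last factor can be written as $$P(a_1,a_2,a_3) = P(a_3\mid a_1,a_2)\,P(a_1,a_2)$$
Once again, $P(a_1,a_2) = P(a_2\mid a_1)\,P(a_1)$. Substituting each expression back into the previous one gives your formula: $$P(a_1,a_2,a_3,a_4) = P(a_1)\,P(a_2\mid a_1)\,P(a_3\mid a_1,a_2)\,P(a_4\mid a_1,a_2,a_3)$$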
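The same argument can be checked numerically. The sketch below assumes a made-up joint distribution over four binary events $a_1,\dots,a_4$ (random positive weights, normalised). Each conditional is obtained from the definition $P(a_{i}\mid a_1,\dots,a_{i-1}) = P(a_1,\dots,a_i)/P(a_1,\dots,a_{i-1})$, so when the four factors are multiplied, the intermediate marginals cancel exactly as in the derivation above. The names `marginal` and `conditional` are just illustrative helpers.

```python
import random
from fractions import Fraction
from itertools import product

# A hypothetical joint distribution over four binary variables a1..a4:
# random positive integer weights, normalised to sum to 1 (exact fractions).
random.seed(0)
outcomes = list(product((0, 1), repeat=4))
weights = {x: random.randint(1, 10) for x in outcomes}
total = sum(weights.values())
joint = {x: Fraction(w, total) for x, w in weights.items()}

def marginal(prefix):
    """P(a_1..a_k = prefix): sum the joint over all completions of the prefix."""
    k = len(prefix)
    return sum(p for x, p in joint.items() if x[:k] == prefix)

def conditional(x, i):
    """P(a_{i+1} = x[i] | a_1..a_i = x[:i]), via the definition of conditional probability."""
    return marginal(x[:i + 1]) / marginal(x[:i])

# Chain rule: p(a1) p(a2|a1) p(a3|a1,a2) p(a4|a1,a2,a3) == p(a1,a2,a3,a4).
for x in outcomes:
    chain = Fraction(1)
    for i in range(4):
        chain *= conditional(x, i)
    assert chain == joint[x]
```

For $i=0$ the prefix is empty, so `marginal(())` is the total probability $1$ and the first factor is just $P(a_1)$.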