For the chain rule of probability, the order of the variables does not matter, and sets are unordered anyway. Therefore $P(A | B) P(B) = P(A \cap B) = P(B|A) P(A)$.
However, I cannot reconcile this with a scenario like the one in the following tree of (conditional) probabilities.
    -|
     |-- (.1) -- A -|
     |              |-- (.7) -- A
     |              |
     |              |-- (.3) -- B
     |
     |-- (.9) -- B -|
                    |-- (.4) -- A
                    |
                    |-- (.6) -- B
This actually does not satisfy the symmetry from above, as the order does matter:
$P((B,A)) = P(B) P(A|B) = .9 \times .4$
and
$P((A,B)) = P(A) P(B|A) = .1 \times .3$
When I search for "ordered joint distribution" or "ordered chain rule of probability" I don't find anything useful.
So, my suspicion is that I am fundamentally confused about something here. If that is the case, can you point out my misconception?
Alternatively I am just using the wrong search terms...
The conditional probability $\mathsf P(A\mid B)$ is defined as $\mathsf P(A\cap B)\div\mathsf P(B)$, when $\mathsf P(B)\neq 0$.
Likewise, $\mathsf P(B\mid A):=\mathsf P(A\cap B)\div\mathsf P(A)$, when $\mathsf P(A)\neq 0$ (since $A\cap B = B\cap A$).
So it is definitely the case that $\mathsf P(A)~\mathsf P(B\mid A) = \mathsf P(A\cap B) = \mathsf P(B)~\mathsf P(A\mid B)$.
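The tree as drawn does not make sense: its second-level branches reuse the labels $A$ and $B$, but if they denoted the same events as the first level, $\mathsf P(A\mid A)$ would have to be $1$, not $.7$. Check the labels.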
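As a rough numerical check, here is a minimal sketch that assumes one possible reading of the tree: the first level is the outcome of a first trial and the second level the outcome of a second trial. That reading, and the names `first`, `second_given_first`, `first_is_A` and `second_is_B`, are assumptions made for illustration and are not stated in the question. Under it, "first is $A$ and second is $B$" and "first is $B$ and second is $A$" are two different events, and the product rule is symmetric for each of them separately.

```python
from fractions import Fraction as F

# Assumed reading of the tree: level 1 = first trial, level 2 = second trial.
first = {"A": F(1, 10), "B": F(9, 10)}
second_given_first = {
    "A": {"A": F(7, 10), "B": F(3, 10)},
    "B": {"A": F(4, 10), "B": F(6, 10)},
}

# Joint distribution over ordered pairs (first outcome, second outcome).
joint = {
    (x, y): first[x] * second_given_first[x][y]
    for x in first
    for y in second_given_first[x]
}

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return sum(p for outcome, p in joint.items() if event(outcome))

def cond(event, given):
    """Conditional probability P(event | given) = P(event and given) / P(given)."""
    return prob(lambda o: event(o) and given(o)) / prob(given)

first_is_A = lambda o: o[0] == "A"
second_is_B = lambda o: o[1] == "B"

# Both factorizations of the same intersection agree.
lhs = prob(first_is_A) * cond(second_is_B, first_is_A)
rhs = prob(second_is_B) * cond(first_is_A, second_is_B)
both = prob(lambda o: first_is_A(o) and second_is_B(o))
print(lhs, rhs, both)  # 3/100 3/100 3/100

# The pair (B, A) is a different event, so its probability may differ.
print(joint[("A", "B")], joint[("B", "A")])  # 3/100 9/25
```

The two numbers in the question, $.1 \times .3$ and $.9 \times .4$, are then the probabilities of two distinct outcomes of the two-stage experiment rather than two factorizations of the same intersection.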