Assume three random variables $X_1, X_2, X_3$ that are conditionally independent in the following way:
$$X_1 \perp X_3 \mid X_2$$
We have that:
$$P(X_3 | X_2) = P(X_3| X_2, X_1)$$
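To spell out the equivalence (assuming all conditioning events have positive probability): $X_1 \perp X_3 \mid X_2$ means $P(X_1, X_3 \mid X_2) = P(X_1 \mid X_2)\,P(X_3 \mid X_2)$, and dividing by $P(X_1 \mid X_2)$ gives
$$P(X_3 \mid X_2, X_1) = \frac{P(X_1, X_3 \mid X_2)}{P(X_1 \mid X_2)} = P(X_3 \mid X_2).$$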
In the notes I am reading, it says: "This equation expresses that $X_3$ only depends on $X_1$ through $X_2$."
This statement seems to me to be an intuitive one. However, I am not sure what it means. To me it seems quite the opposite: given $X_2$, information about $X_1$ doesn't change the distribution of $X_3$.
Then the notes proceed to say:
"We say such a set of variables forms a Markov chain, for which we use the notation $X_1 \rightarrow X_2 \rightarrow X_3$."
I take it that the arrow notation says that $X_2$ depends on $X_1$ and that $X_3$ depends on $X_2$, but that $X_3$ depends on $X_1$ only "through" $X_2$ (whatever "through" means). I would like to understand these ideas as precisely as I can in the context of conditional independence, and especially the diagram and what "through $X_2$" means.
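For reference, my understanding is that the arrow notation encodes the factorization
$$P(X_1, X_2, X_3) = P(X_1)\,P(X_2 \mid X_1)\,P(X_3 \mid X_2),$$
whereas a general joint distribution would need $P(X_3 \mid X_1, X_2)$ in the last factor; the conditional independence property is exactly what licenses dropping $X_1$ there.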
Furthermore, another thing that confuses me is the claim that "$X_3$ only depends on $X_1$ through $X_2$". The word "only" seems extreme to me, because knowing $X_1 \perp X_3 \mid X_2$ does not mean that $P(X_3) = P(X_3 \mid X_1)$. It isn't the only way: $X_3$ might depend on $X_1$ directly, given no knowledge of $X_2$... right?
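To make the point concrete, here is a minimal simulation sketch of a binary chain $X_1 \rightarrow X_2 \rightarrow X_3$ (all transition probabilities are made up for illustration): conditionally on $X_2$, knowing $X_1$ changes nothing, yet marginally $X_3$ clearly depends on $X_1$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Hypothetical binary chain X1 -> X2 -> X3 (parameters invented for
# illustration): X2 tends to copy X1, and X3 tends to copy X2.
x1 = rng.random(n) < 0.5                   # P(X1 = 1) = 0.5
x2 = np.where(x1, rng.random(n) < 0.9,     # P(X2 = 1 | X1 = 1) = 0.9
                  rng.random(n) < 0.1)     # P(X2 = 1 | X1 = 0) = 0.1
x3 = np.where(x2, rng.random(n) < 0.8,     # P(X3 = 1 | X2 = 1) = 0.8
                  rng.random(n) < 0.2)     # P(X3 = 1 | X2 = 0) = 0.2

# Conditional independence: fixing X2 = 1, X1 adds no information.
print(x3[x2 & x1].mean(), x3[x2 & ~x1].mean())  # both approx. 0.8

# But marginally X3 *does* depend on X1 (through X2):
print(x3[x1].mean(), x3[~x1].mean())            # approx. 0.74 vs 0.26
```

With these numbers $P(X_3 = 1 \mid X_1 = 1) = 0.9 \cdot 0.8 + 0.1 \cdot 0.2 = 0.74$ while $P(X_3 = 1 \mid X_1 = 0) = 0.26$, so the marginal dependence is real; it is just carried entirely by $X_2$.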
NOTE: I am aware that the conditional independence relation is symmetric, so the same ideas hold the other way round. It is the mathematical ideas/concepts that are confusing, not the algebra; I want the intuition/concepts clarified.
The idea is that a Markov chain doesn't have a memory of past events. If I'm playing a game of chess with someone, I might use what I have learned about my opponent's strategy from all his/her previous moves, which would mean my moves would not follow a Markov chain.
However, if before I make each move you wipe my memory of the game so far, my play will follow a Markov chain; it only depends on the position immediately before I move, because that is all the information I have. My move depends on the previous moves in the game only because they got the board to a certain position.
It is true that "given $X_2$, information about $X_1$ doesn't change the distribution of $X_3$," but that means exactly the same thing the book is saying. $X_3$ does depend on $X_1$, but only because $X_2$ depends on $X_1$; thus the dependence of $X_3$ on $X_1$ is through $X_2$, just as White's second move depends on White's first move only insofar as it determined the position of the board before Black moved.
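In symbols: marginalizing out the intermediate variable, and using the Markov property to drop $X_1$ from the first factor, gives
$$P(X_3 \mid X_1) = \sum_{x_2} P(X_3 \mid X_2 = x_2)\,P(X_2 = x_2 \mid X_1),$$
so whatever influence $X_1$ exerts on $X_3$ is routed entirely via the distribution it induces on $X_2$. That is all "through $X_2$" means.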
(Note: in the chess analogy, $X_n$ is a position, not a move; otherwise it would clearly not be a Markov chain.)