I am reading Elements of Information Theory by Cover and Thomas, and I have come across the corollary that if $Z = g(Y)$, then $X \to Y \to Z$ forms a Markov chain.
In their notation, this would amount to showing that $Z=g(Y) \implies p(x,z|y)=p(x|y)p(z|y)$.
My idea is that since $Z=g(Y)$, somehow $p(x,z|y)=p(x|y)$ and somehow $p(z|y)=1$, but exactly how this works confuses me, perhaps because of the notation they use.
So far, I understand that $Z=g(Y) \implies p(z|y)=h(y) \text{ and } p(x,z)=p(x,g(y))$, but I am having trouble completing the proof formally.
Any help on the intuition, or the logical steps I am not understanding correctly, or perhaps misunderstanding of the probability notation would be much appreciated! Thanks.
By definition, $X \to Y \to Z$ being a Markov chain means $p(x,y,z)=p(x)p(y|x)p(z|y)$, so using Bayes' rule (as in (2.118) of Cover & Thomas),
\begin{align} p(x,z|y) = \frac{p(x,y,z)}{p(y)} = \frac{p(x)p(y|x)p(z|y)}{p(y)}=p(x|y)p(z|y). \end{align}
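As a sanity check (a hypothetical numerical example of my own, not from the book), one can build a small joint distribution from the factorization $p(x)p(y|x)p(z|y)$ and verify numerically that $p(x,z|y)=p(x|y)p(z|y)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random Markov chain X -> Y -> Z on small finite alphabets.
px = rng.random(3); px /= px.sum()                                  # p(x)
py_x = rng.random((3, 4)); py_x /= py_x.sum(axis=1, keepdims=True)  # p(y|x)
pz_y = rng.random((4, 2)); pz_y /= pz_y.sum(axis=1, keepdims=True)  # p(z|y)

# Joint p(x, y, z) = p(x) p(y|x) p(z|y), shape (|X|, |Y|, |Z|).
pxyz = px[:, None, None] * py_x[:, :, None] * pz_y[None, :, :]

py = pxyz.sum(axis=(0, 2))                  # p(y)
pxz_y = pxyz / py[None, :, None]            # p(x, z | y)
px_y = pxyz.sum(axis=2) / py[None, :]       # p(x | y)
pz_y_check = pxyz.sum(axis=0) / py[:, None] # p(z | y), recovered from the joint

# Conditional independence: p(x, z | y) = p(x | y) p(z | y).
assert np.allclose(pxz_y, px_y[:, :, None] * pz_y_check[None, :, :])
```

The assertion holds for any choice of the three factors, since the Bayes-rule computation above is exactly what the array arithmetic performs.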
I include this part to show why Markovianity implies conditional independence, and for readers without access to the book. However, I'll instead show directly that $X$ and $Z$ are conditionally independent given $Y$, which I think is more insightful than mechanically matching probability equations.
Intuitively: by definition, $Z$ is fully determined once $Y$ is given. Conditioning $X$ (or anything else) on $Y$ is therefore identical to conditioning it on both $Y$ and $Z$, since given $Y$, $Z$ comes for free: we can simply plug $Y$ into $g(\cdot)$. From the other perspective, conditioning $Z$ on $Y$ is identical to conditioning it on $Y$ together with $X$ (or anything else): once we condition on $Y$, there is no uncertainty left in $Z$, so nothing else we condition on can reduce the uncertainty any further. With this intuition, we see that $X$ and $Z$ are conditionally independent given $Y$, so $X \to Y \to Z$ is a Markov chain.
Formally: $p(x|z, y) = p(x|y, g(y)) = p(x|y)$, because the events $\{Y = y\}$ and $\{Y=y, Z=g(y)\}$ are identical, so we condition on the same outcomes in either case. Similarly, $p(z|x,y) = 1_{\{g(y)\}}(z) = p(z|y)$ by the definition of $Z$, where $1_{A}(z)$ is the indicator function that equals 1 if $z \in A$ and 0 otherwise. Therefore, $X$ and $Z$ are conditionally independent given $Y$, and $X \to Y \to Z$ forms a Markov chain.
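To see the $Z = g(Y)$ case concretely (again a hypothetical numerical sketch: the joint $p(x,y)$ and the map $g$ below are arbitrary choices), take any joint distribution over $(X, Y)$, set $Z = g(Y)$, and check that $p(z|y)$ is the indicator $1_{\{g(y)\}}(z)$ and that $p(x|y,z) = p(x|y)$ on the support:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary joint distribution p(x, y) on small finite alphabets.
pxy = rng.random((3, 4)); pxy /= pxy.sum()

g = np.array([0, 1, 1, 0])  # a deterministic map g: Y -> Z, |Z| = 2

# Joint p(x, y, z): all probability mass sits on z = g(y).
pxyz = np.zeros((3, 4, 2))
for y in range(4):
    pxyz[:, y, g[y]] = pxy[:, y]

py = pxyz.sum(axis=(0, 2))              # p(y)
pz_y = pxyz.sum(axis=0) / py[:, None]   # p(z | y)

# p(z | y) is the indicator 1{z = g(y)} (one-hot rows).
assert np.allclose(pz_y, np.eye(2)[g])

# p(x | y, z) = p(x | y) wherever p(y, z) > 0, i.e. on z = g(y).
px_y = pxyz.sum(axis=2) / py[None, :]
for y in range(4):
    px_yz = pxyz[:, y, g[y]] / pxyz[:, y, g[y]].sum()
    assert np.allclose(px_yz, px_y[:, y])

# Hence p(x, z | y) = p(x | y) p(z | y): X -> Y -> Z is a Markov chain.
pxz_y = pxyz / py[None, :, None]
assert np.allclose(pxz_y, px_y[:, :, None] * pz_y[None, :, :])
```

The two loops are the numerical counterparts of the two formal identities above: the one-hot rows of `pz_y` are $1_{\{g(y)\}}(z)$, and conditioning on $(y, g(y))$ gives back $p(x|y)$.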