A Markov chain $X \rightarrow Y \rightarrow Z$ is given, where $X,Y,Z$ are random variables characterized by the probability distribution $p(x,y,z) = p(x)p(y|x)p(z|y)$.
It follows that $I(X;Y) \geq I(X;Z)$.
Given this, why is it true that $I(X;Y|Z) \leq I(X;Y)$ ?
I cannot understand this fact even at an intuitive level: why does knowing $Z$ reduce the mutual information between $X$ and $Y$?
The property follows from the chain rule for mutual information: \begin{align} I(X ; Y, Z) &= I(X ;Y) + I(X; Z | Y) \stackrel{(a)}{=} I(X ; Y)\\ & =I(X;Z) + I(X;Y|Z) \geq I(X;Y|Z), \end{align} where (a) holds because the Markov chain gives $I(X;Z|Y) = 0$, and the final inequality holds because $I(X;Z) \geq 0$. While the data-processing inequality $I(X;Y) \geq I(X;Z)$ says that $Z$ cannot contain more information about $X$ than $Y$ does, the quantity $I(X;Y|Z) = I(X;Y) - I(X;Z)$ is exactly the information about $X$ lost by the processing $p(z|y)$, and a loss can be at most $I(X;Y)$ itself. For example, in the special case $Z=Y$ nothing is lost, and indeed $I(X;Y|Z) = I(X;Y|Y) = 0$.
It can also be shown directly using entropies: \begin{align} I(X;Y | Z) = H(X | Z) - H(X|Y, Z) \leq H(X) - H(X | Y) = I(X;Y), \end{align} where the inequality follows since $H(X|Z)\leq H(X)$ (conditioning reduces entropy) and $H(X| Y,Z) = H(X|Y)$ (by the Markov chain).
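If it helps to see the inequalities hold on a concrete example, here is a small numerical check (a sketch I put together; the alphabet sizes and the random distributions are arbitrary). It builds a random chain $X \to Y \to Z$ via $p(x,y,z) = p(x)p(y|x)p(z|y)$, computes $I(X;Y)$, $I(X;Z)$, and $I(X;Y|Z)$ from entropies, and verifies both the data-processing inequality and the decomposition $I(X;Y) = I(X;Z) + I(X;Y|Z)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random Markov chain X -> Y -> Z with small (arbitrary) alphabets.
nx, ny, nz = 3, 4, 3
px = rng.dirichlet(np.ones(nx))              # p(x)
py_x = rng.dirichlet(np.ones(ny), size=nx)   # p(y|x): rows sum to 1
pz_y = rng.dirichlet(np.ones(nz), size=ny)   # p(z|y): rows sum to 1

# Joint p(x,y,z) = p(x) p(y|x) p(z|y), shape (nx, ny, nz).
pxyz = px[:, None, None] * py_x[:, :, None] * pz_y[None, :, :]

def H(p):
    """Shannon entropy (bits) of a distribution given as an array."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def I(pab):
    """Mutual information I(A;B) from a joint distribution p(a,b)."""
    return H(pab.sum(axis=1)) + H(pab.sum(axis=0)) - H(pab)

pxy = pxyz.sum(axis=2)   # p(x,y)
pxz = pxyz.sum(axis=1)   # p(x,z)
pyz = pxyz.sum(axis=0)   # p(y,z)
pz = pyz.sum(axis=0)     # p(z)

Ixy = I(pxy)
Ixz = I(pxz)
# I(X;Y|Z) = H(X|Z) - H(X|Y,Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)
Ixy_given_z = H(pxz) + H(pyz) - H(pz) - H(pxyz)

assert Ixz <= Ixy + 1e-12                       # data-processing inequality
assert Ixy_given_z <= Ixy + 1e-12               # I(X;Y|Z) <= I(X;Y)
assert abs(Ixy - (Ixz + Ixy_given_z)) < 1e-12   # chain-rule decomposition
```

The last assertion is the identity from the answer: under the Markov chain, $I(X;Y|Z)$ is precisely the gap $I(X;Y) - I(X;Z)$, so both inequalities follow from non-negativity of mutual information.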