The question is:
Let $P_{Y|X}$ be a transition kernel. We define: $$\eta_{KL}(P_{Y|X})=\sup_{P_{X},Q_{X}} \frac{ D_{KL}( P_{Y}|| Q_{Y} ) }{ D_{KL}(P_{X}||Q_{X}) },\hspace{0.3cm} \widehat{\eta}_{KL}(P_{Y|X})=\sup_{U,X: U-X-Y} \frac{ I(U;Y) }{ I(U;X) } $$ where $U-X-Y$ means that $U$ and $Y$ are conditionally independent given $X$ (i.e., $U-X-Y$ is a Markov chain). Show that $\widehat{\eta}_{KL}(P_{Y|X})\le \eta_{KL}(P_{Y|X})$.
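To make the two ratios concrete, here is a minimal NumPy sketch for finite alphabets. It assumes the channel is stored as a row-stochastic matrix `W[x, y]` $= P_{Y|X}(y|x)$; the helper names `kl`, `push`, `kl_ratio` and `mi_ratio` are hypothetical. Evaluating a ratio at particular distributions only gives a lower bound on the corresponding supremum, not the supremum itself.

```python
import numpy as np

def kl(p, q):
    """D(p || q) in nats for discrete laws; assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def push(p_x, W):
    """Output law P_{Y|X} o P_X, where W[x, y] = P_{Y|X}(y | x)."""
    return p_x @ W

def kl_ratio(p_x, q_x, W):
    """One term of the sup defining eta_KL: D(P_Y || Q_Y) / D(P_X || Q_X)."""
    return kl(push(p_x, W), push(q_x, W)) / kl(p_x, q_x)

def mi_ratio(p_u, p_x_given_u, W):
    """One term of the sup defining hat-eta_KL: I(U;Y) / I(U;X) for U - X - Y.

    p_u[u] = P_U(u) and p_x_given_u[u, x] = P_{X|U}(x | u).
    """
    p_x = p_u @ p_x_given_u            # marginal law of X
    p_y = push(p_x, W)                 # marginal law of Y
    p_y_given_u = p_x_given_u @ W      # Markov chain: P_{Y|U=u} = P_{Y|X} o P_{X|U=u}
    i_ux = sum(pu * kl(row, p_x) for pu, row in zip(p_u, p_x_given_u))
    i_uy = sum(pu * kl(row, p_y) for pu, row in zip(p_u, p_y_given_u))
    return i_uy / i_ux

rng = np.random.default_rng(0)
W = rng.dirichlet(np.ones(3), size=4)            # random channel: 4 inputs, 3 outputs
p_x, q_x = rng.dirichlet(np.ones(4), size=2)     # two input laws
p_u = rng.dirichlet(np.ones(2))                  # binary U
p_x_given_u = rng.dirichlet(np.ones(4), size=2)  # one row P_{X|U=u} per value of u
print(kl_ratio(p_x, q_x, W), mi_ratio(p_u, p_x_given_u, W))
```

By the data-processing inequality, both printed values lie in $[0,1]$.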
Context: Here, $D_{KL}(\cdot ||\cdot )$ is the Kullback-Leibler divergence, and the supremum is taken over all $P_{X},Q_{X}$ such that $0<D_{KL}(P_{X}||Q_{X})<\infty$.
I was following a solution, and at some point it states that (with $P_{X}$ fixed and $U-X-Y$ a Markov chain):
$$\frac{D_{KL}(P_{Y|U} || P_{Y}|P_{U})}{ D_{KL}(P_{X|U} || P_{X}|P_{U}) }\le \sup_{Q_{X}} \frac{ D_{KL}( Q_{Y}|| P_{Y} ) }{ D_{KL}( Q_{X}||P_{X} ) } $$
They treat this inequality as obvious, but no matter how much I think about it, I do not see why it is true. Any help is welcome, since this inequality immediately implies what we want to prove.
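For completeness, here is the step the solution leaves implicit, using the standard identities $I(U;X) = D(P_{X|U}\|P_X|P_U)$ and $I(U;Y) = D(P_{Y|U}\|P_Y|P_U)$. For any $(U,X)$ with $U-X-Y$ and $0<I(U;X)<\infty$,
$$\frac{I(U;Y)}{I(U;X)} = \frac{D(P_{Y|U}\|P_Y|P_U)}{D(P_{X|U}\|P_X|P_U)} \le \sup_{Q_X}\frac{D(Q_Y\|P_Y)}{D(Q_X\|P_X)} \le \sup_{P'_X,Q'_X}\frac{D(P'_Y\|Q'_Y)}{D(P'_X\|Q'_X)} = \eta_{KL}(P_{Y|X}),$$
where the last inequality holds because the middle supremum (with the second argument pinned to the actual law $P_X$ of $X$) is a restriction of the supremum defining $\eta_{KL}$. Taking the supremum over all such $(U,X)$ gives $\widehat{\eta}_{KL}(P_{Y|X})\le \eta_{KL}(P_{Y|X})$.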
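Recall that $D(P_{Y|U}\|P_Y|P_U) = \sum_u P_U(u) D(P_{Y|U = u}\|P_Y)$. Now notice that each value $U = u$ induces its own law on $X$; denote it $Q_X^u = P_{X|U = u}$ to make the argument clearer. Next define $Q_Y^u = P_{Y|X} \circ Q_X^u,$ and notice that, due to the Markov structure, $Q_Y^u = P_{Y|U = u}$: indeed $P_{Y|U=u}(y) = \sum_x P_{X|U=u}(x)\,P_{Y|X,U}(y|x,u) = \sum_x Q_X^u(x)\,P_{Y|X}(y|x)$, since $Y$ is conditionally independent of $U$ given $X$. Likewise $P_Y = P_{Y|X}\circ P_X$.

But now, by definition, for each $u$, $$ D(Q_Y^u \|P_Y) \le \eta_{KL}(P_X) D(Q_X^u \|P_X), $$ where $\eta_{KL}(P_X) = \sup_{Q_X} \frac{D(Q_Y\|P_Y)}{D(Q_X\|P_X)}$ (with $P_X, P_{Y|X}$ both fixed). If $D(Q_X^u\|P_X) = 0$, then $Q_X^u = P_X$, so $Q_Y^u = P_Y$ and both sides vanish. So we have that \begin{align} D(P_{Y|U}\|P_Y|P_U) &= \sum_u P_U(u)D(Q_Y^u\|P_Y) \\ &\le \eta_{KL}(P_X) \sum_u P_U(u) D(Q_{X}^u \|P_X)\\ &= \eta_{KL}(P_X) D(P_{X|U}\|P_X|P_U),\end{align} where I've been a bit loose about the substitutions $Q_*^u = P_{*|U = u}$ for $* \in \{X,Y\}$. Dividing both sides by $D(P_{X|U}\|P_X|P_U) > 0$ gives exactly the inequality you asked about, since its right-hand side is $\eta_{KL}(P_X)$.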
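The argument can be sanity-checked numerically on finite alphabets. The sketch below uses NumPy; the function name `check_pointwise_bound` and the layout `W[x, y]` $= P_{Y|X}(y|x)$ are illustrative assumptions. It builds the joint law of $(U,X,Y)$, confirms $P_{Y|U=u} = P_{Y|X}\circ Q_X^u$, and checks that $I(U;Y) \le c\, I(U;X)$ with $c = \max_u D(Q_Y^u\|P_Y)/D(Q_X^u\|P_X)$. Since $c$ is a ratio evaluated at particular $Q_X$'s, it is at most $\eta_{KL}(P_X)$, so this is a finite instance of the averaging step above.

```python
import numpy as np

def kl(p, q):
    """D(p || q) in nats for discrete laws; assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def check_pointwise_bound(p_u, p_x_given_u, W):
    """Return (I(U;Y), I(U;X), c) with c = max_u D(Q_Y^u||P_Y) / D(Q_X^u||P_X)."""
    # Joint law P(u, x, y) = P_U(u) P_{X|U}(x|u) P_{Y|X}(y|x): a Markov chain U - X - Y.
    joint = p_u[:, None, None] * p_x_given_u[:, :, None] * W[None, :, :]
    p_x = joint.sum(axis=(0, 2))
    p_y = joint.sum(axis=(0, 1))
    p_y_given_u = joint.sum(axis=1) / p_u[:, None]

    q_x = p_x_given_u          # Q_X^u = P_{X|U=u}, one row per u
    q_y = q_x @ W              # Q_Y^u = P_{Y|X} o Q_X^u
    assert np.allclose(q_y, p_y_given_u)   # the Markov step: Q_Y^u = P_{Y|U=u}

    d_x = np.array([kl(row, p_x) for row in q_x])   # D(Q_X^u || P_X)
    d_y = np.array([kl(row, p_y) for row in q_y])   # D(Q_Y^u || P_Y)
    pos = d_x > 0              # rows with Q_X^u = P_X contribute zero to both sums
    c = float(np.max(d_y[pos] / d_x[pos]))
    return float(p_u @ d_y), float(p_u @ d_x), c

rng = np.random.default_rng(1)
W = rng.dirichlet(np.ones(3), size=4)            # random channel: 4 inputs, 3 outputs
p_u = rng.dirichlet(np.ones(5))                  # U takes 5 values
p_x_given_u = rng.dirichlet(np.ones(4), size=5)  # rows are Q_X^u
i_uy, i_ux, c = check_pointwise_bound(p_u, p_x_given_u, W)
print(i_uy, c * i_ux)   # the first value should not exceed the second
```

Note that only the finitely many laws $Q_X^u$ are ever used, which is why the per-$u$ bound suffices.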
Note: I've assumed discrete-valued $U$, but since the argument is pointwise in $u$, it extends directly to general $U$.
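Concretely, assuming regular conditional distributions $P_{X|U=u}$ exist (e.g. on standard Borel spaces), the same chain reads with integrals against $P_U$ in place of sums:
$$ D(P_{Y|U}\|P_Y|P_U) = \int D(P_{Y|U=u}\|P_Y)\,P_U(du) \le \eta_{KL}(P_X)\int D(P_{X|U=u}\|P_X)\,P_U(du) = \eta_{KL}(P_X)\, D(P_{X|U}\|P_X|P_U),$$
where the inequality is the per-$u$ bound integrated against $P_U$; both it and the Markov identity $P_{Y|U=u} = P_{Y|X}\circ P_{X|U=u}$ only need to hold for $P_U$-almost every $u$.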