I need help understanding the proofs of Theorems 9.6.1 and 9.6.2 in Cover and Thomas's Elements of Information Theory.
They first state the result that the capacity of the time-varying Gaussian channel with feedback is $C_{n,FB} = \max_{\frac{1}{n}tr(K^{(n)}_X) \leq P} \frac{1}{2n} \log \frac{|K^{(n)}_{X+Z}|}{|K^{(n)}_Z|}$, and that the distribution on $X^n + Z^n$ achieving the maximum is Gaussian (the entropy-maximizing distribution).
Theorem 9.6.1: In their proof, they say that $\frac{1}{2n} \log \frac{|K^{(n)}_{Y}|}{|K^{(n)}_Z|} + \epsilon_n \leq C_{n,FB} + \epsilon_n$ by the entropy-maximizing property of the normal distribution.
I am not certain about this step. Is it because the distribution on $X^n + Z^n$ achieving $C_{n,FB}$ is normal (according to the previous paragraph), and the normal distribution maximizes differential entropy?
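To make my reading concrete (writing $Y^n = X^n + Z^n$, and assuming I have reconstructed the intended chain correctly), the proof seems to be
$$\frac{1}{n} I(W; Y^n) \;\leq\; \frac{1}{2n} \log \frac{|K^{(n)}_{Y}|}{|K^{(n)}_Z|} \;\leq\; C_{n,FB},$$
where the first inequality uses $h(Y^n) \leq \frac{1}{2} \log \left( (2\pi e)^n |K^{(n)}_Y| \right)$, i.e. the normal maximizes differential entropy for a given covariance; it is the second inequality that I am unsure how to justify.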
Theorem 9.6.2: In the first inequality of their proof of Theorem 9.6.2, they state that $C_{n,FB} \leq \max_{tr(K_X) \leq nP} \frac{1}{2n} \log \frac{|K_{Y}|}{|K_Z|}$ because of Theorem 9.6.1. I don't understand how Theorem 9.6.1 can be applied to reach this conclusion. (They drop the superscript $(n)$ on $K_Y$ and $K_Z$ in this theorem, but they have the same meaning as the superscripted versions.)
Aren't these two quantities the same by definition? How/why does Theorem 9.6.1 give the first inequality, and why do the inequalities point in opposite directions in the proofs of the two theorems?
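In other words, since $K_Y = K_{X+Z}$ (as $Y^n = X^n + Z^n$) and $tr(K_X) \leq nP$ is the same constraint as $\frac{1}{n}tr(K_X) \leq P$, the right-hand side
$$\max_{tr(K_X) \leq nP} \frac{1}{2n} \log \frac{|K_{Y}|}{|K_Z|}$$
looks to me like literally the defining expression for $C_{n,FB}$ quoted above, so I would expect "=" rather than "$\leq$".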
Thanks everyone in advance for trying to help out.



(9.120) is due to the maximization in the definition of $C_{n,FB}$. (9.153) is a typo; it should be "=".
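Spelled out (a sketch in the question's notation, assuming (9.120) is the step $\frac{1}{2n}\log\frac{|K^{(n)}_Y|}{|K^{(n)}_Z|} \leq C_{n,FB}$): the covariance $K^{(n)}_Y = K^{(n)}_{X+Z}$ induced by any code satisfies the power constraint, so it is one feasible point of the maximization, and hence
$$\frac{1}{2n} \log \frac{|K^{(n)}_{Y}|}{|K^{(n)}_Z|} \;\leq\; \max_{\frac{1}{n}tr(K^{(n)}_X) \leq P} \frac{1}{2n} \log \frac{|K^{(n)}_{X+Z}|}{|K^{(n)}_Z|} \;=\; C_{n,FB}.$$
The entropy-maximizing property of the normal enters one step earlier, to bound $h(Y^n)$ by the entropy of a Gaussian with the same covariance; the step above is pure maximization. And since $tr(K_X) \leq nP$ is the same constraint as $\frac{1}{n}tr(K_X) \leq P$, the first "inequality" in the proof of Theorem 9.6.2 is indeed an equality by definition, which is why (9.153) should read "=".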