Let $X_G \sim \mathcal{CN}(\mathbf{0},\mathbf{\Sigma}_X)$ and let $Z$ be a random vector with unknown distribution and non-singular covariance $\mathbb{E}\{ ZZ^H \} = \mathbf{\Sigma}_Z$. Let $X_G$ and $Z$ be uncorrelated but not necessarily independent. Furthermore, let $$ \tag{1} Y = X_G + Z.$$ I'm interested in the mutual information $$ \tag{2} I(X_G;Y) = h(Y) - h(Y \vert X_G) = h(Y) - h( Z ) .$$
According to e.g. [1, Theorem 2], the differential entropy $h(Z)$ is upper bounded by the entropy of a complex Gaussian random vector $Z_G \sim \mathcal{CN}(\mathbf{0},\mathbf{\Sigma}_Z)$ with the same covariance, $$ \tag{3} h(Z) \leq h(Z_G) = \log \det \left( \pi e \mathbf{\Sigma}_Z \right), $$ because among all distributions with a given covariance the circularly symmetric complex Gaussian has the largest differential entropy. Using this, I obtain the following lower bound on (2): $$ \tag{4} I(X_G;Y_G) = h(Y_G) - h(Z_G) \leq h(Y) - h(Z) = I(X_G;Y), $$ where $Y_G$ denotes the case in which $Y$ is complex Gaussian.
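As a numerical sanity check of the Gaussian maximum-entropy bound (3), here is a scalar example of my own (not from the references): a complex $Z$ uniform on a disk of radius $r$ has covariance $\sigma^2 = \mathbb{E}\{|Z|^2\} = r^2/2$ and exact differential entropy $h(Z) = \log(\pi r^2)$ (the log of the support area), which we can compare against the Gaussian bound $\log(\pi e \sigma^2)$:

```python
import numpy as np

# Scalar sketch of the bound h(Z) <= h(Z_G) = log(pi e sigma^2).
# Z uniform on a complex disk of radius r:
#   covariance sigma2 = E|Z|^2 = r^2 / 2,
#   exact entropy h(Z) = log(pi r^2)  (log of the support area).
r = 2.0
sigma2 = r**2 / 2.0                   # E|Z|^2 of the uniform disk
h_Z = np.log(np.pi * r**2)            # exact differential entropy of Z
h_ZG = np.log(np.pi * np.e * sigma2)  # Gaussian upper bound (3)

print(h_Z <= h_ZG)   # the bound holds
print(h_ZG - h_Z)    # gap equals log(e/2), independent of r
```

The gap $\log(e/2) \approx 0.307$ nats is independent of $r$, consistent with the bound being tight only for the Gaussian itself.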
In contrast, the authors of [2, Lemma II.2] obtain the same lower bound (4); however, they require that $Z$ and $Z_G$ both be independent of $X_G$ (which they are not in my case). Their derivation is different and relies mainly on Jensen's inequality, but I do not see where it actually uses the independence of $Z$ and $Z_G$ from $X_G$.
In [3, Theorem 1], the authors likewise discuss a similar problem where $Z$ and $X_G$ are uncorrelated but not independent. However, the main step of their solution is again the Gaussian entropy upper bound (3). They additionally derive the worst-case covariance matrix $\mathbf{\Sigma}_Z^*$ (which I am not interested in).
In summary, I have the following questions:
- Are the upper bound (3) and the resulting lower bound (4) valid?
- Why does [2, Lemma II.2] require $Z$ and $Z_G$ both to be independent of $X_G$, which also seems to be the motivation for the "new" theorem in [3]?
References
[1] Neeser, Fredy D., and James L. Massey. "Proper complex random processes with applications to information theory." IEEE Transactions on Information Theory 39.4 (1993): 1293-1302.
[2] Diggavi, Suhas N., and Thomas M. Cover. "The worst additive noise under a covariance constraint." IEEE Transactions on Information Theory 47.7 (2001): 3072-3081.
[3] Hassibi, Babak, and Bertrand M. Hochwald. "How much training is needed in multiple-antenna wireless links?." IEEE Transactions on Information Theory 49.4 (2003): 951-963.
Eq. (2) is wrong: the last term should be $h(Z\mid X_G)$. In general,
$$h(Y\mid X_G)=h(X_G + Z\mid X_G)=h(Z\mid X_G),$$ and you can equate this to $h(Z)$ only if $Z$ and $X_G$ are independent.
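To see that uncorrelatedness alone is not enough, consider a scalar construction (my own illustration, not from the cited papers): let $X_G \sim \mathcal{CN}(0,1)$ and $Z = W X_G$, where $W \in \{-1,+1\}$ is equiprobable and independent of $X_G$. Then
$$\mathbb{E}\{Z X_G^*\} = \mathbb{E}\{W\}\,\mathbb{E}\{|X_G|^2\} = 0,$$
so $Z$ and $X_G$ are uncorrelated, and by circular symmetry $Z \sim \mathcal{CN}(0,1)$, hence $h(Z) = \log(\pi e)$. But given $X_G = x$, $Z$ takes only the two values $\pm x$, so $h(Z \mid X_G) = -\infty \neq h(Z)$.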