Rényi Entropy-like Maximization by Markov Chain

75 Views Asked by At

Given distributions $P_{X, Y}$ and $Q_{Y, U}$ over the respective finite alphabets $\mathcal{X}, \mathcal{Y}, \mathcal{U}$ and a positive constant $\rho$, consider the following maximization problem:

\begin{equation} \max_{Q_{X \mid Y, U}} \left[ \rho H(X \mid U) - D(Q_{X \mid Y, U} \mid\mid P_{X \mid Y}) \right] \end{equation}

The conditional Shannon entropy $H(X \mid U)$ is computed with respect to the distribution $Q_{Y, U} \cdot Q_{X \mid Y, U}$ and

\begin{equation} D(Q_{X \mid Y, U} \mid\mid P_{X \mid Y}) = \sum_{(x, y, u) \in \mathcal{X} \times \mathcal{Y} \times \mathcal{U}} (Q_{Y, U} \cdot Q_{X \mid Y, U})(x, y, u) \log \frac{Q_{X \mid Y, U}(x \mid y, u)}{P_{X \mid Y}(x \mid y)}. \end{equation}

My question is: Do we suffer no loss in generality in assuming that $Q_{X \mid Y, U} = Q_{X \mid Y}$? In other words, does the maximization over Markov chains $Q_{Y, U} \cdot Q_{X \mid Y}$ perform equally well?

\begin{equation} \max_{Q_{X \mid Y, U}} \left[ \rho H(X \mid U) - D(Q_{X \mid Y, U} \mid\mid P_{X \mid Y}) \right] \stackrel{?}{=} \max_{Q_{X \mid Y, U} : Q_{X \mid Y, U} = Q_{X \mid Y}} \left[ \rho H(X \mid U) - D(Q_{X \mid Y, U} \mid\mid P_{X \mid Y}) \right] \end{equation}

A few observations:

  • $\max_{Q_X} \left[ \rho H(X) - D(Q_X \mid\mid P_X) \right] = \rho H_\frac{1}{1 + \rho}(X)$ where $H(X)$ is computed with respect to $Q_X$ and $H_\frac{1}{1 + \rho}(X)$ is the Rényi Entropy of order $\frac{1}{1 + \rho}$, computed with respect to $P_X$.
  • $D(Q_{X \mid Y, U} \mid\mid P_{X \mid Y}) \geq D(Q_{X \mid Y} \mid\mid P_{X \mid Y})$ with equality iff $Q_{X \mid Y} = Q_{X \mid Y, U}$.
  • If, for some given distribution $Q_{Y, U} \cdot Q_{X \mid Y, U}$, we recover a distribution $R_{X, Y, U} = Q_{Y, U} \cdot Q_{X \mid Y}$, we can in general not conclude that the following holds ($H(X \mid U)$ on the left-hand side is computed with respect to $R$, on the right-hand side with respect to $Q$):

\begin{equation} \rho H(X \mid U) - D(R_{X \mid Y} \mid\mid P_{X \mid Y}) \geq \rho H(X \mid U) - D(Q_{X \mid Y, U} \mid\mid P_{X \mid Y}) \end{equation}