The task is exercise 33.7 from the book Information Theory, Inference, and Learning Algorithms.
The question asks us to find three distinct minima obtained by minimizing the reversed KL divergence $KL(q\|p)$.
The gist of the question is as follows. We are given the joint distribution $p(x,y)$, where rows represent $y$ and columns represent $x$:
$$ \begin{array}{l|llll} & 1 & 2 & 3 & 4 \\ \hline 1 & 1 / 8 & 1 / 8 & 0 & 0 \\ 2 & 1 / 8 & 1 / 8 & 0 & 0 \\ 3 & 0 & 0 & 1 / 4 & 0 \\ 4 & 0 & 0 & 0 & 1 / 4 \end{array} $$ If we approximate $p$ with $q$ by minimizing $KL(q\|p)$:
1) show that there are three distinct minima
2) what is the value of $KL(q||p)$ if we set $q(x,y) = p(x)p(y)$?
We may assume that $q$ factorizes, i.e. $q(x,y) = q(x)q(y)$.
My progress so far: $$ \begin{array}{l} \mathrm{KL}(q(x, y) \| p(x, y))=\sum_{x, y} q(x, y) \log \frac{q(x, y)}{p(x, y)}=\sum_{x, y} q(x) q(y) \log \frac{q(x) q(y)}{p(x, y)} \\ =\sum_{x} q(x) \log q(x)+\sum_{y} q(y) \log q(y)-\sum_{x, y} q(x) q(y) \log p(x, y) \end{array} $$
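(As a quick numerical sanity check of this decomposition, here is a short NumPy sketch; the particular values of $q(x)$ and $q(y)$ are arbitrary illustrative choices, not part of the exercise.)

```python
import numpy as np

# Joint distribution p(x, y) from the exercise (rows index y, columns index x).
p = np.array([[1/8, 1/8, 0, 0],
              [1/8, 1/8, 0, 0],
              [0,   0, 1/4, 0],
              [0,   0, 0, 1/4]])

# An arbitrary factorized q supported where p > 0 (illustrative values only).
qx = np.array([0.3, 0.7, 0.0, 0.0])
qy = np.array([0.6, 0.4, 0.0, 0.0])
q = np.outer(qy, qx)  # q(x, y) = q(x) q(y); rows index y, columns index x

# Direct KL(q || p), restricted to the support of q (0 log 0 = 0 convention).
mask = q > 0
kl_direct = np.sum(q[mask] * np.log(q[mask] / p[mask]))

# Decomposed form: sum_x q(x) log q(x) + sum_y q(y) log q(y)
#                  - sum_{x,y} q(x) q(y) log p(x, y)
ent_terms = sum(v * np.log(v) for v in qx if v > 0) \
          + sum(v * np.log(v) for v in qy if v > 0)
cross = np.sum(q[mask] * np.log(p[mask]))
kl_decomposed = ent_terms - cross

print(np.isclose(kl_direct, kl_decomposed))  # the two expressions agree
```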
Setting derivative wrt $q(x)$ to zero: $$ \begin{array}{l} \frac{\partial}{\partial q(x)} \operatorname{KL}(q(x, y) \| p(x, y))=0 \iff \\ \frac{\partial}{\partial q(x)} \sum_{x} q(x) \log q(x)+\frac{\partial}{\partial q(x)} \sum_{y} q(y) \log q(y)=\frac{\partial}{\partial q(x)}\sum_{x} q(x) \sum_{y} q(y) \log p(x, y) \end{array} $$
However, I am not sure how to compute these derivatives, which I suspect must be done to proceed, and then similarly with respect to $q(y)$. I am not very familiar with variational calculus. Any help is appreciated.
I'm going to change notation a bit - instead of writing $q(x)$ and $q(y),$ I'll write $q_X(x), q_Y(y).$ Similarly, instead of $q(x,y),$ I'll write $q_{XY}(x,y)$.
Instead of going straight through to the Lagrangians, let us stare at the structure of the question a bit.
First, note that since $q_{XY}({x,y})$ is a product distribution, if $q_X(x) > 0$ and $q_Y(y) > 0,$ then $q_{XY}({x,y}) = q_X(x) q_Y(y) > 0.$ Equivalently, if $q_X$ has support $S_X$ and $q_Y$ has support $S_Y$, then $q_{XY}$ has support $S_X \times S_Y$.
Next, observe that if there is a pair $(x,y)$ such that $p(x,y) = 0$ but $q_{XY}(x,y) > 0$, then $D(q\|p) = \infty$. In our problem, this means that the minimising $q_{XY}$ must be positive only on $(\{1,2\} \times \{1,2\}) \cup \{(3,3)\} \cup \{(4,4)\}$.
So, our candidate minimising $q_X$ and $q_Y$ must have supports such that $$S_X \times S_Y \subseteq (\{1,2\} \times \{1,2\}) \cup \{(3,3)\} \cup \{(4,4)\}.$$
This directly leaves us with three cases:

1) $S_X \subseteq \{1,2\}$ and $S_Y \subseteq \{1,2\}$;

2) $S_X = S_Y = \{3\}$;

3) $S_X = S_Y = \{4\}$.

(No mixing across cases is possible: if, say, $3 \in S_X$ and $S_Y$ contained anything other than $3$, the product support $S_X \times S_Y$ would hit a zero of $p$.)
Cases 2 and 3 are simple - these consist of only one distribution each, which I'll call $q^3$ and $q^4$, which are $q^3_{X} = q^3_Y = (0,0,1,0)$ and $q^4_{X} = q^4_{Y} = (0,0,0,1)$. Note that these are both local minima - if we increase $q^3(x,y)$ for any $(x,y) \neq (3,3)$ then the divergence becomes $\infty$, so all distributions close to $q^3$ (close defined according to, say, the usual $\ell_2$ distance) have larger divergence. Similarly for $q^4$. Finally, note that $$D(q^3\|p) = D(q^4\|p) = \log(4).$$
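These values can be confirmed numerically; the helper `kl_product` below is my own sketch (not from the book), and it also illustrates the earlier point that putting any mass where $p$ vanishes blows the divergence up:

```python
import numpy as np

# Joint distribution from the exercise (rows index y, columns index x).
p = np.array([[1/8, 1/8, 0, 0],
              [1/8, 1/8, 0, 0],
              [0,   0, 1/4, 0],
              [0,   0, 0, 1/4]])

def kl_product(qx, qy, p):
    """KL(q || p) for q(x, y) = q(x) q(y), with the 0 log 0 = 0 convention."""
    q = np.outer(qy, qx)            # rows index y, columns index x
    mask = q > 0
    if np.any(p[mask] == 0):        # q positive where p is zero => infinite divergence
        return float('inf')
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

q3 = np.array([0., 0., 1., 0.])     # point mass on 3
q4 = np.array([0., 0., 0., 1.])     # point mass on 4
print(kl_product(q3, q3, p))        # log(4)
print(kl_product(q4, q4, p))        # log(4)

# Mixing 3 and 4 puts mass on (3, 4) and (4, 3), where p = 0:
print(kl_product(np.array([0., 0., .5, .5]),
                 np.array([0., 0., .5, .5]), p))  # inf
```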
This leaves case $1$. Let $q_X$ and $q_Y$ each be supported on $\{1,2\}$. Note that \begin{align} D(q\|p) &= \sum_{x= 1}^2 \sum_{y = 1}^2 q_X(x)q_Y(y) \log \frac{q_X(x) q_Y(y)}{{p(x,y)}} \\ &= \sum_{x= 1}^2 \sum_{y = 1}^2 q_X(x)q_Y(y) \log {q_X(x) q_Y(y)} + \log(8) \\ &= \log(8) - H(q_X) - H(q_Y) .\end{align}
Notice immediately that if we take $q_X$ and $q_Y$ to be the max-entropy distributions satisfying the given constraints, then we'll minimise the above. The constraint is just that both are supported on $\{1,2\}.$ So, the minimisers here are $q_X = q_Y = (1/2, 1/2, 0,0)$, and we attain the value $\log(8) - 2\log(2) = \log(2).$ Importantly, this case is convex, so the minimum above is unique.
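One can also check the case-1 minimum numerically with a crude grid search over the free parameters $a = q_X(1)$ and $b = q_Y(1)$ (again just an illustrative sketch, reusing the `kl_product` helper defined above in spirit):

```python
import numpy as np

# Joint distribution from the exercise (rows index y, columns index x).
p = np.array([[1/8, 1/8, 0, 0],
              [1/8, 1/8, 0, 0],
              [0,   0, 1/4, 0],
              [0,   0, 0, 1/4]])

def kl_product(qx, qy, p):
    """KL(q || p) for q(x, y) = q(x) q(y), with the 0 log 0 = 0 convention."""
    q = np.outer(qy, qx)
    mask = q > 0
    if np.any(p[mask] == 0):
        return float('inf')
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

# Case 1: q_X = (a, 1-a, 0, 0), q_Y = (b, 1-b, 0, 0); scan a grid of (a, b).
grid = np.linspace(0.01, 0.99, 99)
vals = np.array([[kl_product(np.array([a, 1 - a, 0, 0]),
                             np.array([b, 1 - b, 0, 0]), p)
                  for a in grid] for b in grid])
i, j = np.unravel_index(vals.argmin(), vals.shape)
print(grid[j], grid[i], vals[i, j])  # minimum at a = b = 1/2, value log(2)
```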
So, we have found exactly three distinct local minima, of which the case 1 minimum is the unique global minimum.
Some comments on the line of attack you took, and comments you made, in the question: