Minimize KL divergence + linear function


I am looking at the following problem: $$ \min_q \underbrace{KL \left[ q(x) ~||~ p(x) \right]}_{=: A} + \underbrace{\mathbb{E}_{x \sim q} \left [ f(x) \right ]}_{=: B}. $$ For $A$ alone, the minimizer is $q(x) = p(x)$. For $B$ alone, the minimizer is a point mass (Dirac delta) placing all its mass at $\arg\min_x f(x)$. Further, I know that $A$ is strictly convex in $q$ (according to a comment on this question).

My questions are the following:

Is there any hope of finding the minimizer in closed form, making use of the two respective solutions? Something like "the minimizer lies on a line between the two respective solutions." ?


BEST ANSWER

This is a case of a variational problem with a subsidiary condition (constraint). The objective is to minimize the functional $$ J[q]\triangleq \int_x F(q) dx, $$

where $F(q) \triangleq q(x)\left(\log \frac{q(x)}{p(x)} + f(x)\right)$, under the condition $$ \tag{1} \int_x q(x)dx =1. $$

From calculus of variations, a necessary condition for the function $q(x)$ to be an extremal of $J[q]$ is

$$ \tag{2} \frac{\partial}{\partial q} F + \lambda =0, $$ for some constant $\lambda$ (the Lagrange multiplier enforcing (1)). Here $\frac{\partial}{\partial q} F = \log \frac{q(x)}{p(x)} + 1 + f(x)$, so solving (2) for $q(x)$ gives

$$ q(x) = p(x) e^{-f(x) - \lambda -1}. $$

Plugging this into (1), shows that $\lambda= \log \mathbb{E}_{x\sim p}\left(e^{-f(x)}\right) - 1$, finally resulting in

$$ q(x) = \frac{1}{\mathbb{E}_{x\sim p}\left(e^{-f(x)}\right)} p(x) e^{-f(x)}. $$
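As a quick numerical sanity check (a sketch using an arbitrary toy discrete distribution, not part of the original derivation), we can verify that $q(x) \propto p(x)e^{-f(x)}$ attains the minimum of $KL(q\Vert p) + \mathbb{E}_q[f]$, and that no other distribution does better:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete setting: arbitrary distribution p and arbitrary values f(x).
n = 5
p = rng.random(n)
p /= p.sum()
f = rng.normal(size=n)

def objective(q):
    """KL(q || p) + E_q[f], using the convention 0 * log 0 = 0."""
    mask = q > 0
    return np.sum(q[mask] * np.log(q[mask] / p[mask])) + np.sum(q * f)

# Claimed minimizer: q*(x) proportional to p(x) * exp(-f(x))
w = p * np.exp(-f)
Z = w.sum()          # Z = E_{x~p}[exp(-f(x))]
w /= Z

# The minimum value should be -log Z.
assert np.isclose(objective(w), -np.log(Z))

# Random alternative distributions should never score lower.
for _ in range(1000):
    q = rng.dirichlet(np.ones(n))
    assert objective(q) >= objective(w) - 1e-9
```

The check relies only on the closed form above; the margin `1e-9` just absorbs floating-point noise, since mathematically the gap is $KL(q\Vert q^*) \ge 0$.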

Edit 1: corrected link.

Edit 2: fixed algebraic error, identified by @Artemy.


There is also a way to derive the answer without calculus of variations. First, rewrite \begin{align} KL(q(X) \Vert p(X)) + \mathbb{E}_q[f(X)] &= \sum_x q(x) \ln \frac{q(x)}{p(x)} + \sum_x q(x) f(x) \\ & = \sum_x q(x) \ln \frac{q(x)}{p(x)e^{-f(x)}} \tag{1} \end{align}

Now define the probability distribution $w(x) := \frac{1}{Z} p(x)e^{-f(x)}$, where $Z=\sum_x p(x)e^{-f(x)}$ is the normalization constant. We can then rewrite $(1)$ as \begin{align} \sum_x q(x) \ln \frac{q(x)}{w(x)Z} = KL(q(X)\Vert w(X))-\ln Z\tag{2} \end{align} Note that $\ln Z$ is a constant that doesn't depend on $q$. Thus, Eq. $(2)$ is minimized by taking $q(x)=w(x)$, at which point it reaches its minimum value of $-\ln Z$.
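The algebraic identity behind this argument can also be checked numerically (a sketch with arbitrary toy values for $p$ and $f$): for every distribution $q$, $KL(q\Vert p) + \mathbb{E}_q[f] = KL(q\Vert w) - \ln Z$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary toy distribution p and values f(x) on a discrete support.
n = 6
p = rng.random(n)
p /= p.sum()
f = rng.normal(size=n)

# Tilted distribution w(x) = p(x) exp(-f(x)) / Z from the answer above.
w = p * np.exp(-f)
Z = w.sum()
w /= Z

def kl(a, b):
    """KL divergence between discrete distributions with full support."""
    return np.sum(a * np.log(a / b))

# Identity (1)-(2) holds for every q, not just at the minimizer.
for _ in range(100):
    q = rng.dirichlet(np.ones(n))
    lhs = kl(q, p) + np.sum(q * f)      # KL(q || p) + E_q[f]
    rhs = kl(q, w) - np.log(Z)          # KL(q || w) - ln Z
    assert np.isclose(lhs, rhs)
```

Since $KL(q\Vert w)$ vanishes exactly at $q = w$, this confirms both the minimizer and the minimum value $-\ln Z$.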