I would like to
- evaluate the Kullback-Leibler divergence between a target distribution $f(x)$ and a smoothed version $g(x)=\int K_h(x-x')f(x') dx'$, where $K_h$ is a symmetric kernel with width $h$.
- minimize the KL divergence with respect to the smoothing kernel $K$ using variational calculus subject to $h>0$ (using the delta function would be a trivial solution with zero width).
The motivation is to find the functional form of the smoothing kernel (e.g., top-hat, Gaussian, Epanechnikov) that makes $g(x)$ converge to $f(x)$ as fast as possible as we reduce the width $h$ (I appreciate this is not a very precise statement). I would like to avoid comparing a finite set of kernels because the best kernel may not be in my candidate set.
What I've tried...
The explicit form of the KL divergence is $$\begin{aligned} D_{KL}\left(f\Vert g\right) &= \int f(x)\left(\log\frac{f(x)}{g(x)}\right) dx\\ &= \int f(x)\log\left(\frac{f(x)}{\int K_h(x-x')f(x') dx'}\right) dx. \end{aligned}$$ We can expand the log and drop the $f(x)\log f(x)$ term because it does not depend on the kernel. We now seek to minimize $$ -\int f(x) \log \left(\int K_h(x-x') f(x') dx'\right)dx $$ The standard Euler-Lagrange equations don't seem to apply here due to the nested integral inside the $\log$. I've contemplated going through the steps of adding a variational function and deriving a variant of the EL equations for this problem. But I thought I'd check whether this is a standard problem before opening that can of worms.
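For a quick sanity check before attempting the variational problem, the divergence can be evaluated numerically for one fixed kernel. The following is only a minimal sketch under my own assumptions (none of them come from the problem statement): $f$ is a standard normal, $K_h$ is a Gaussian of width $h$, and the integrals are replaced by Riemann sums on a uniform grid. In that special case $g$ is a normal with variance $1+h^2$, so the closed form $\tfrac12\left[\log(1+h^2) + \tfrac{1}{1+h^2} - 1\right]$ is available for comparison. The helper name `kl_to_smoothed` is hypothetical.

```python
import numpy as np

def kl_to_smoothed(f, kernel, dx):
    """Approximate D_KL(f || g) on a uniform grid, with g = (K_h * f)(x)."""
    # Discrete convolution times dx approximates the integral defining g.
    # mode="same" keeps g aligned with f when both arrays have the same odd
    # length and the grid is centred on zero.
    g = np.convolve(f, kernel, mode="same") * dx
    mask = f > 0  # skip points where f vanishes (they contribute nothing)
    return np.sum(f[mask] * np.log(f[mask] / g[mask])) * dx

# Assumed example: standard normal target and Gaussian kernel of width h.
x = np.linspace(-12.0, 12.0, 2401)  # odd length, symmetric around 0
dx = x[1] - x[0]
f = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
h = 0.5
kernel = np.exp(-0.5 * (x / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

kl_numeric = kl_to_smoothed(f, kernel, dx)
# Closed form for this Gaussian-Gaussian case only.
kl_exact = 0.5 * (np.log(1.0 + h**2) + 1.0 / (1.0 + h**2) - 1.0)
print(kl_numeric, kl_exact)
```

Swapping in a different `kernel` array (top-hat, Epanechnikov) lets one compare candidates at a fixed $h$, although that is exactly the finite-candidate comparison I would like to avoid in the end.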
I am also happy to consider $D_{KL}\left(g\Vert f\right)$ if that simplifies things.
Let's rename the "problematic" expression as $$ S[k] := \int\mathrm{d}x\ f(x)\ln\left(\int\mathrm{d}y\ f(y)k(x-y)\right) $$ for the sake of readability (the quantity to be minimized is $-S[k]$, which has the same stationary points). In order to find the stationary "points" of this functional, one has to set its functional derivative to zero. The definition found in (purely) mathematical books, with a small "perturbation" term, is usually impractical, which is why it is preferable to use $$ \frac{\delta}{\delta k} \equiv \frac{\partial}{\partial k} - \frac{\mathrm{d}}{\mathrm{d}x} \frac{\partial}{\partial k'} + \frac{\mathrm{d}^2}{\mathrm{d}x^2} \frac{\partial}{\partial k''} - \frac{\mathrm{d}^3}{\mathrm{d}x^3} \frac{\partial}{\partial k'''} + \ldots, $$ which is nothing other than a generalized Euler-Lagrange operator. In the present case it simplifies to $\frac{\delta}{\delta k} \equiv \frac{\partial}{\partial k}$, because no derivatives of $k$ appear in $S[k]$. Moreover, note that $\frac{\delta k(x)}{\delta k(x')} = \delta(x-x')$, where the last $\delta$ represents a Dirac delta. One then has: $$ \begin{array}{rcl} \displaystyle \frac{\delta S[k]}{\delta k(z)} &=& \displaystyle \int\mathrm{d}x\ f(x)\frac{\int\mathrm{d}y\ f(y)\delta(x-y-z)}{\int\mathrm{d}y\ f(y)k(x-y)} \\ &=& \displaystyle \int\mathrm{d}x\ \frac{f(x)f(x-z)}{\int\mathrm{d}y\ f(y)k(x-y)} \\ \end{array} $$ Setting this to zero (together with Lagrange-multiplier terms for any constraints imposed on $k$, such as normalization) gives the Euler-Lagrange equation, which may still be quite hard to solve nonetheless.
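The expression for $\delta S/\delta k(z)$ can be checked numerically. Below is a minimal sketch under assumptions of my own (a standard normal $f$, a Gaussian starting kernel of width $h=0.7$, and a uniform grid; none of these come from the question). Perturbing the kernel at a single grid point $z_m$ by $\varepsilon$ corresponds to $\delta k \approx \varepsilon\,\Delta x\,\delta(z-z_m)$, so the finite difference of $S$ divided by $\varepsilon\,\Delta x$ should match the integral formula. Note that a single-point perturbation ignores the symmetry constraint on $k$; it only tests the unconstrained derivative.

```python
import numpy as np

n = 201
x = np.linspace(-5.0, 5.0, n)
dx = x[1] - x[0]
# Offsets z = x_i - x_j take 2n-1 distinct values on this grid.
z = (np.arange(2 * n - 1) - (n - 1)) * dx

# Assumed example: standard normal f and a Gaussian starting kernel.
f = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
h = 0.7
k = np.exp(-0.5 * (z / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

# idx[i, j] is the position in z of the offset x_i - x_j.
idx = np.arange(n)[:, None] - np.arange(n)[None, :] + (n - 1)

def smoothed(k):
    """g(x_i) = sum_j f(x_j) k(x_i - x_j) dx."""
    return (f[None, :] * k[idx]).sum(axis=1) * dx

def S(k):
    """Discretized S[k] = int f(x) ln g(x) dx."""
    return np.sum(f * np.log(smoothed(k))) * dx

def functional_derivative(k, m):
    """Discretized int f(x) f(x - z_m) / g(x) dx."""
    # f(x_i - z_m), taken as zero where x_i - z_m falls off the grid.
    shifted = (f[None, :] * (idx == m)).sum(axis=1)
    return np.sum(f * shifted / smoothed(k)) * dx

m = (n - 1) + 10  # grid index of the offset z_m = 10 * dx
eps = 1e-6
k_plus, k_minus = k.copy(), k.copy()
k_plus[m] += eps
k_minus[m] -= eps

# Central difference, divided by dx to convert to a functional derivative.
fd = (S(k_plus) - S(k_minus)) / (2.0 * eps * dx)
an = functional_derivative(k, m)
print(fd, an)
```

Because $f \ge 0$ and $k \ge 0$, the computed derivative is positive, which illustrates why the constraints on $k$ have to enter before a stationary point can exist.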