Problem: I have two kernel density estimates, $p(\textbf x)=\displaystyle \frac{1}{|X|} \sum_{\textbf x_i \in X} K_H(\textbf{x}-\textbf{x}_i)$ and $q(\textbf x)=\displaystyle\frac{1}{|W \cup V|}\left( \sum_{\textbf w_i \in W} K_H(\textbf{x}-\textbf{w}_i)+\sum_{\textbf v_i \in V} K_H(\textbf{x}-\textbf{v}_i)\right)$, i.e. $q$ is simply the kernel density estimate built on the pooled set $W \cup V$.
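To make the setup concrete, here is a minimal 1-D numerical sketch with Gaussian kernels (the function name `kde` and the sample sets are illustrative, not from any particular library):

```python
import numpy as np

def kde(x, centers, h=0.5):
    """Gaussian KDE: average of kernels K_h centred at `centers`, evaluated at x."""
    x = np.asarray(x)[:, None]           # shape (n_eval, 1)
    c = np.asarray(centers)[None, :]     # shape (1, n_centers)
    k = np.exp(-0.5 * ((x - c) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return k.mean(axis=1)                # 1/|centers| * sum of kernels

X = np.array([-1.0, 0.0, 1.0])           # centres defining p
W = np.array([-0.5, 0.5])                # fixed centres of q
V = np.array([1.5])                      # free centres of q
grid = np.linspace(-5.0, 5.0, 1001)
p = kde(grid, X)
q = kde(grid, np.concatenate([W, V]))    # 1/|W ∪ V| * (sum over W + sum over V)
```

Averaging the kernels over the concatenated centres is exactly the $\frac{1}{|W\cup V|}$-weighted sum in the definition of $q$ above.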
I want to show that the KL divergence between these two, $\displaystyle \int_{\chi}p(\textbf x) \log\frac{p(\textbf x)}{q(\textbf x)}\,d\textbf x$, is convex with respect to the points $\textbf v_i \in V$ appearing in $q$, holding all the other points fixed.
In this light, we can rewrite $q$ as a function of $\textbf x$ and $V$, $q\left(\textbf x, V\right)$, and the integral becomes $\displaystyle \int_{\chi}p(\textbf x) \log\frac{p(\textbf x)}{q\left(\textbf x, V\right)}\,d\textbf x$, which implicitly defines a function of $V$ alone, $F(V)$. I then need to show that $F$ is convex.
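$F(V)$ is easy to evaluate numerically, which at least lets me inspect a one-dimensional slice of it before trying to prove anything (again a hypothetical sketch: Gaussian kernels, grid quadrature, illustrative sample sets):

```python
import numpy as np

def kde(x, centers, h=0.5):
    """Gaussian KDE: average of kernels centred at `centers`."""
    x = np.asarray(x)[:, None]
    c = np.asarray(centers)[None, :]
    k = np.exp(-0.5 * ((x - c) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return k.mean(axis=1)

def F(V, X, W, grid):
    """KL(p || q) as a function of the free centres V, via grid quadrature."""
    p = kde(grid, X)
    q = kde(grid, np.concatenate([np.asarray(W), np.atleast_1d(V)]))
    dx = grid[1] - grid[0]
    return np.sum(p * np.log(p / q)) * dx

grid = np.linspace(-8.0, 8.0, 2001)
X = np.array([-1.0, 0.0, 1.0])
W = np.array([-0.5, 0.5])
# one-dimensional slice of F: V = {v} with a single moving point v
vs = np.linspace(-3.0, 3.0, 61)
vals = np.array([F(v, X, W, grid) for v in vs])
```

Plotting `vals` against `vs` gives a quick sanity check of the convexity claim on a slice, before committing to a proof.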
It could be as simple as arguing that, since the KL divergence is convex in its second argument, it is convex in $q$ whatever form $q$ takes, and therefore convex with respect to the set $V$ that parametrises $q$. Is that a correct argument?
What about variational calculus? I could phrase the problem as minimising the integral with respect to the function $q(\textbf x, V)$ determined by the set $V$, and then take functional derivatives:
$\dfrac{\delta J}{\delta q}=- \dfrac{p(\textbf x)}{q(\textbf x,V)}$ and $\dfrac{\delta^2 J}{\delta q^2}=\dfrac{p(\textbf x)}{q(\textbf x,V)^2} \geq 0 \;\; \forall V$, since $p$ and $q$ are densities and hence nonnegative (with $q>0$ wherever the integrand is defined). The second derivative is therefore always nonnegative, so the functional is convex in $q$ for any $V$. Can it be argued this way?
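Spelled out, those two quantities are the first and second variations of $J[q]=\displaystyle\int_{\chi}p(\textbf x)\log\frac{p(\textbf x)}{q(\textbf x,V)}\,d\textbf x$ along a perturbation $\eta$ (the $\eta$ notation is mine, introduced for this expansion):

$$\frac{d}{d\varepsilon}J[q+\varepsilon\eta]\Big|_{\varepsilon=0}=-\int_{\chi}\frac{p(\textbf x)}{q(\textbf x,V)}\,\eta(\textbf x)\,d\textbf x, \qquad \frac{d^2}{d\varepsilon^2}J[q+\varepsilon\eta]\Big|_{\varepsilon=0}=\int_{\chi}\frac{p(\textbf x)}{q(\textbf x,V)^2}\,\eta(\textbf x)^2\,d\textbf x \geq 0,$$

so $J$ is indeed convex as a functional of $q$.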
At the end of the day, the reason I need this is to support the following argument: given a set of candidate points $G$, construct $V$ greedily by, at each step, adding to $q$ the point of $G$ that yields the minimum KL divergence and updating $q$; then the resulting set $V \subseteq G$ is the overall best subset minimising the KL divergence (greedy optimality implies global optimality).
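For reference, the greedy procedure I have in mind looks like this (a minimal 1-D sketch with Gaussian kernels and grid quadrature; all names and sample values are illustrative):

```python
import numpy as np

def kde(x, centers, h=0.5):
    """Gaussian KDE: average of kernels centred at `centers`."""
    x = np.asarray(x)[:, None]
    c = np.asarray(centers)[None, :]
    k = np.exp(-0.5 * ((x - c) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return k.mean(axis=1)

def kl(p, q, dx):
    """Grid approximation of KL(p || q)."""
    return np.sum(p * np.log(p / q)) * dx

def greedy_select(X, W, G, k, grid):
    """Build V by repeatedly adding the candidate that minimises KL(p || q)."""
    dx = grid[1] - grid[0]
    p = kde(grid, X)
    V, candidates = [], list(G)
    for _ in range(k):
        best = min(candidates,
                   key=lambda g: kl(p, kde(grid, W + V + [g]), dx))
        V.append(best)
        candidates.remove(best)
    return V

grid = np.linspace(-8.0, 8.0, 2001)
X = [-1.0, 0.0, 1.0]               # centres defining p
W = [-0.5]                         # fixed centres of q
G = [-2.0, -1.0, 0.0, 1.0, 2.0]    # candidate pool
V = greedy_select(X, W, G, 2, grid)
```

Each iteration re-scores every remaining candidate against the current $q$, so the selection is greedy in exactly the sense described above.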