Suppose we have a point cloud $X = \{x_1, \dots, x_n \mid x_i \in \mathbb{R}^D,\; x_i \neq x_j \text{ if } i \neq j\}$, which I map to a PDF using a kernel density estimator. As we know, the second-order derivative of the KL divergence will give a metric, and that is what I am trying to obtain here.
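To make the setup concrete, here is a minimal sketch of the KDE map from point cloud to PDF (the isotropic Gaussian kernel and the bandwidth `h` are my own illustrative choices, not fixed by the question):

```python
import numpy as np

def kde_density(x, X, h=0.5):
    """Gaussian kernel density estimate p(x; X) at a single evaluation point x.

    X : (n, D) array of distinct points; h : bandwidth (assumed isotropic).
    """
    X = np.asarray(X, dtype=float)
    n, D = X.shape
    sq_dists = np.sum((x - X) ** 2, axis=1)        # squared distance to each x_i
    kernels = np.exp(-sq_dists / (2 * h**2))       # unnormalized Gaussian kernels
    norm = (2 * np.pi * h**2) ** (D / 2)           # Gaussian normalizing constant
    return kernels.mean() / norm                   # average over the n points

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # toy point cloud, n=3, D=2
p_near = kde_density(np.array([0.0, 0.0]), X)       # at a data point: high density
p_far = kde_density(np.array([5.0, 5.0]), X)        # far away: near-zero density
```

The density $p(x; X)$ is then a smooth function of the point coordinates $X^{ij}$, which is what makes the derivatives below well defined.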
- we have the KL divergence expression for $p(x; X)$ and $q(x; Y)$:
$$D(p(x; X) \Vert q(x; Y)) = \int p(x; X) \log \frac{p(x; X)}{q(x; Y)} \, dx$$
- Now we will compute the first-order partial derivative of the KL divergence with respect to each element of $X$. Since the logarithm involves a quotient, it is easier to first rewrite the KL divergence as:
$$D(p(x; X) \Vert q(x; Y)) = \int p(x; X) (\log p(x; X) - \log q(x; Y)) \, dx$$
Let's compute the first-order partial derivative of the KL divergence with respect to $X^{ij}$:
$\frac{\partial(D(p(x; X) \Vert q(x; Y)))}{\partial X^{ij}} = \int \left[\frac{\partial p(x; X)}{\partial X^{ij}} (\log p(x; X) - \log q(x; Y)) + p(x; X) \frac{\partial(\log p(x; X))}{\partial X^{ij}}\right] dx$
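To gauge whether this first-order expression is right, here is a small numerical sketch (1-D, Gaussian kernel; the bandwidth `h`, the quadrature grid, and the toy point sets are my own illustrative choices). It evaluates the integrand above by quadrature and compares it with a central finite difference of the KL divergence:

```python
import numpy as np

h = 0.7                                     # assumed kernel bandwidth
grid = np.linspace(-8.0, 8.0, 4001)         # quadrature grid for the dx integral
dx = grid[1] - grid[0]

def trapz(y):
    """Trapezoidal rule on the fixed grid."""
    return (y[1:] + y[:-1]).sum() * dx / 2.0

def density(x, pts):
    """1-D Gaussian KDE p(x; pts) evaluated on an array x."""
    k = np.exp(-(x[:, None] - pts[None, :]) ** 2 / (2 * h**2))
    return k.mean(axis=1) / (np.sqrt(2 * np.pi) * h)

def kl(X, Y):
    p, q = density(grid, X), density(grid, Y)
    return trapz(p * np.log(p / q))

X = np.array([0.0, 1.0, 2.5])               # toy 1-D point clouds
Y = np.array([0.3, 1.5])
i = 1                                       # differentiate w.r.t. X[i]

# Integrand from the formula, using the closed form for the Gaussian kernel:
# dp/dX_i = K(x - X_i) (x - X_i) / (n h^2), since only the i-th kernel moves.
p, q = density(grid, X), density(grid, Y)
kern = np.exp(-(grid - X[i]) ** 2 / (2 * h**2)) / (np.sqrt(2 * np.pi) * h)
dp_i = kern * (grid - X[i]) / (len(X) * h**2)
rhs = trapz(dp_i * (np.log(p) - np.log(q)) + p * (dp_i / p))

# Central finite difference of D(p || q) in X[i]:
eps = 1e-4
Xp, Xm = X.copy(), X.copy()
Xp[i] += eps
Xm[i] -= eps
lhs = (kl(Xp, Y) - kl(Xm, Y)) / (2 * eps)
```

Because the quadrature grid is fixed, differentiation commutes with the discretization, so `lhs` and `rhs` should agree up to the finite-difference truncation error.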
- Next, we will compute the second-order partial derivative by differentiating the first-order partial derivatives with respect to each element of $X$ again. For the second-order partial derivative with respect to $X^{kl}$, we have:
$\frac{\partial^2(D(p(x; X) \Vert q(x; Y)))}{\partial X^{ij} \partial X^{kl}} = \int \left[\frac{\partial^2 p(x; X)}{\partial X^{ij} \partial X^{kl}} (\log p(x; X) - \log q(x; Y)) + \frac{\partial p(x; X)}{\partial X^{ij}} \frac{\partial(\log p(x; X))}{\partial X^{kl}} + \frac{\partial p(x; X)}{\partial X^{kl}} \frac{\partial(\log p(x; X))}{\partial X^{ij}} + p(x; X) \frac{\partial^2(\log p(x; X))}{\partial X^{ij} \partial X^{kl}}\right] dx$
The second-order partial derivative of the KL divergence with respect to $X^{ij}$ and $X^{kl}$ will be the $H_{ijkl}(X)$ component of the info-Riemannian metric tensor $H(X)$.
First, I want to know whether my calculations are correct; secondly, if they are, can the second-order derivative expression above be simplified further?
You could have simplified the first- and second-order derivatives by applying the chain rule $\frac{\partial \log p}{\partial X^{ij}}=\frac{1}{p}\,\frac{\partial p}{\partial X^{ij}}$; this yields
\begin{align*} \frac{\partial D(p\| q)}{\partial X^{ij}}=\int \frac{\partial p}{\partial X^{ij}} \left(\log p-\log q + 1\right) dx. \end{align*} (In fact, since $\int \frac{\partial p}{\partial X^{ij}}\, dx = \frac{\partial}{\partial X^{ij}}\int p \, dx = 0$, the $+1$ term integrates to zero.) Differentiating once more yields \begin{align*} \frac{\partial^2 D(p\| q)}{\partial X^{ij}\,\partial X^{k\ell}}&=\frac{\partial}{\partial X^{k\ell}}\int \frac{\partial p}{\partial X^{ij}} \left(\log p-\log q + 1\right) dx\\ &=\int \left[ \frac{\partial^2 p}{\partial X^{ij}\partial X^{k\ell}} \left(\log p-\log q + 1\right) + \frac{\partial p}{\partial X^{ij}}\frac{\partial \log p}{\partial X^{k\ell}} \right] dx\\ &=\int \left[ \frac{\partial^2 p}{\partial X^{ij}\partial X^{k\ell}} \left(\log \frac pq + 1\right)+\frac1p\,\frac{\partial p}{\partial X^{ij}}\frac{\partial p}{\partial X^{k\ell}} \right] dx. \end{align*}
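A quick numerical sanity check of the second-order expression (a 1-D sketch; the Gaussian kernel, bandwidth `h`, quadrature grid, and toy point sets are all illustrative assumptions): for a cross term between two different points $i \neq k$, each kernel in the KDE depends on a single point, so $\frac{\partial^2 p}{\partial X^{i}\partial X^{k}} = 0$ and only the $\frac1p\frac{\partial p}{\partial X^{i}}\frac{\partial p}{\partial X^{k}}$ term survives. This can be compared against a mixed central finite difference of the KL divergence:

```python
import numpy as np

h = 0.7                                     # assumed kernel bandwidth
grid = np.linspace(-8.0, 8.0, 4001)         # quadrature grid for the dx integral
dx = grid[1] - grid[0]

def trapz(y):
    """Trapezoidal rule on the fixed grid."""
    return (y[1:] + y[:-1]).sum() * dx / 2.0

def density(x, pts):
    """1-D Gaussian KDE p(x; pts) evaluated on an array x."""
    k = np.exp(-(x[:, None] - pts[None, :]) ** 2 / (2 * h**2))
    return k.mean(axis=1) / (np.sqrt(2 * np.pi) * h)

def kl(X, Y):
    p, q = density(grid, X), density(grid, Y)
    return trapz(p * np.log(p / q))

def dp(X, i):
    """dp/dX_i = K(x - X_i) (x - X_i) / (n h^2); only the i-th kernel moves."""
    kern = np.exp(-(grid - X[i]) ** 2 / (2 * h**2)) / (np.sqrt(2 * np.pi) * h)
    return kern * (grid - X[i]) / (len(X) * h**2)

X = np.array([0.0, 1.0, 2.5])               # toy 1-D point clouds
Y = np.array([0.3, 1.5])
i, k = 0, 1                                 # cross term: d^2 p / dX_i dX_k = 0

p = density(grid, X)
analytic = trapz(dp(X, i) * dp(X, k) / p)   # only the (1/p) dp_i dp_k term

def shift(a, j, s):
    b = a.copy()
    b[j] += s
    return b

eps = 1e-3                                  # mixed central finite difference
fd = (kl(shift(shift(X, i, eps), k, eps), Y)
      - kl(shift(shift(X, i, eps), k, -eps), Y)
      - kl(shift(shift(X, i, -eps), k, eps), Y)
      + kl(shift(shift(X, i, -eps), k, -eps), Y)) / (4 * eps**2)
```

For diagonal terms ($i = k$) the $\frac{\partial^2 p}{\partial X^i \partial X^k}$ contribution is nonzero and would have to be included as well.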
I am not sure what you mean by "the second-order derivative of the KL divergence will give a metric"; perhaps this depends on the kernel you choose for $p$ and $q$.