Second-order derivative of the KL divergence as a metric: can you verify my calculations?


Suppose we have a point cloud $X = \{x_1, \dots, x_n \mid x_i \in \mathbb{R}^D,\ x_i \neq x_j \text{ if } i \neq j\}$. I map this point cloud to a PDF using a kernel density estimator. As is well known, the second-order derivative of the KL divergence gives a metric, and that is what I am trying to compute here.
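For concreteness, here is a minimal sketch of the KDE map, assuming an isotropic Gaussian kernel with bandwidth $h$ (both the kernel and the bandwidth value are my choices; the question does not specify them):

```python
import numpy as np

def kde_density(x, X, h=0.5):
    """Gaussian KDE: p(x; X) = (1/n) * sum_i N(x; x_i, h^2 I).
    x: evaluation points, shape (m, D); X: point cloud, shape (n, D)."""
    n, D = X.shape
    sq_dist = ((x[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # (m, n)
    norm = (2.0 * np.pi * h**2) ** (D / 2.0)
    return np.exp(-sq_dist / (2.0 * h**2)).sum(axis=1) / (n * norm)

# sanity check in 1-D: the density should integrate to ~1
X = np.array([[-1.0], [0.5], [2.0]])            # n = 3 points, D = 1
grid = np.linspace(-8.0, 8.0, 2001)[:, None]
p = kde_density(grid, X)
print(p.sum() * (grid[1, 0] - grid[0, 0]))      # ~ 1.0
```

Everything below treats the entries of $X$ as the parameters of this density.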

  1. We have the KL divergence expression for $p(x; X)$ and $q(x; Y)$:

$$D(p(x; X) \Vert q(x; Y)) = \int p(x; X) \log \frac{p(x; X)}{q(x; Y)} \, dx$$

  2. Now we compute the first-order partial derivative of the KL divergence with respect to each element of $X$. Since the logarithm term involves a quotient, it is easier to first rewrite the KL divergence as:

$$D(p(x; X) \Vert q(x; Y)) = \int p(x; X) (\log p(x; X) - \log q(x; Y)) \, dx$$

Let's compute the first-order partial derivative of the KL divergence with respect to $X^{ij}$:

$\frac{\partial(D(p(x; X) \Vert q(x; Y)))}{\partial X^{ij}} = \int \left[\frac{\partial p(x; X)}{\partial X^{ij}} (\log p(x; X) - \log q(x; Y)) + p(x; X) \frac{\partial(\log p(x; X))}{\partial X^{ij}}\right] dx$
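This first-order formula can be checked numerically. Below is a minimal sketch, assuming a 1-D Gaussian KDE (bandwidth $h = 0.4$, my choice) and a fixed grid for the integrals, comparing the analytic derivative with a central finite difference:

```python
import numpy as np

h = 0.4
grid = np.linspace(-10.0, 10.0, 4001)
dx = grid[1] - grid[0]

def kde(X):
    """1-D Gaussian KDE p(.; X) evaluated on `grid`."""
    K = np.exp(-(grid[None, :] - X[:, None]) ** 2 / (2 * h**2))
    return K.mean(axis=0) / np.sqrt(2 * np.pi * h**2)

def dkde(X, i):
    """∂p/∂X_i: only the i-th kernel depends on X_i."""
    Ki = np.exp(-(grid - X[i]) ** 2 / (2 * h**2)) / np.sqrt(2 * np.pi * h**2)
    return Ki * (grid - X[i]) / (h**2 * len(X))

def kl(p, q):
    return np.sum(p * np.log(p / q)) * dx

X = np.array([-1.0, 0.3, 1.5])
Y = np.array([-0.5, 0.8, 2.0])
p, q = kde(X), kde(Y)

i = 1
# analytic: ∫ [∂p/∂X_i (log p − log q) + p ∂(log p)/∂X_i] dx,
# where p ∂(log p)/∂X_i = ∂p/∂X_i
dp = dkde(X, i)
analytic = np.sum(dp * (np.log(p) - np.log(q)) + dp) * dx

# central finite difference in X_i
eps = 1e-5
Xp, Xm = X.copy(), X.copy()
Xp[i] += eps
Xm[i] -= eps
fd = (kl(kde(Xp), q) - kl(kde(Xm), q)) / (2 * eps)
print(analytic, fd)  # the two values should agree to several decimals
```

The second term of the integrand, $\int \frac{\partial p}{\partial X_i}\,dx$, is analytically zero because $\int p\,dx = 1$; it is kept above only to mirror the formula.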

  3. Next, we compute the second-order partial derivative by differentiating the first-order partial derivative with respect to each element of $X$ again. For the second-order partial derivative with respect to $X^{kl}$, we have:

$\frac{\partial^2(D(p(x; X) \Vert q(x; Y)))}{\partial X^{ij} \partial X^{kl}} = \int \left[\frac{\partial^2 p(x; X)}{\partial X^{ij} \partial X^{kl}} (\log p(x; X) - \log q(x; Y)) + \frac{\partial p(x; X)}{\partial X^{ij}} \frac{\partial(\log p(x; X))}{\partial X^{kl}} + \frac{\partial p(x; X)}{\partial X^{kl}} \frac{\partial(\log p(x; X))}{\partial X^{ij}} + p(x; X) \frac{\partial^2(\log p(x; X))}{\partial X^{ij} \partial X^{kl}}\right] dx$

The second-order partial derivative of the KL divergence with respect to $X^{ij}$ and $X^{kl}$ will be the $H_{ijkl}(X)$ component of the info-Riemannian metric tensor $H(X)$.

First, I want to know whether my calculations are correct; second, if they are, can the second-order derivative expression above be simplified further?


You could have simplified the first- and second-order derivatives using the chain rule on $\frac{\partial \log p}{\partial X^{ij}}=\frac{\partial p}{\partial X^{ij}}\cdot \frac1{p}$. This yields

\begin{align*} \frac{\partial D(p\| q)}{\partial X^{ij}}=\int \frac{\partial p}{\partial X^{ij}} (\log p-\log q + 1) ~ dx \end{align*} and therefore, differentiating once more (only $\frac{\partial p}{\partial X^{ij}}$ and $\log p$ depend on $X^{k\ell}$), \begin{align*} \frac{\partial^2 D(p\| q)}{\partial X^{ij}\partial X^{k\ell}}&=\int \left[ \frac{\partial^2 p}{\partial X^{ij}\partial X^{k\ell}} (\log p-\log q + 1) + \frac{\partial p}{\partial X^{ij}}\frac{\partial \log p}{\partial X^{k\ell}} \right]~ dx\\ &=\int \left[ \frac{\partial^2 p}{\partial X^{ij}\partial X^{k\ell}} \left(\log \frac pq + 1\right)+\frac1p\,\frac{\partial p}{\partial X^{ij}}\frac{\partial p}{\partial X^{k\ell}} \right]~ dx. \end{align*} Note that $\frac1p$ multiplies the product of first derivatives, not the second derivative, so it cannot be folded into the first integrand. Moreover, since $\int p~dx = 1$, we have $\int \frac{\partial p}{\partial X^{ij}}~dx = 0$ and $\int \frac{\partial^2 p}{\partial X^{ij}\partial X^{k\ell}}~dx = 0$, so the $+1$ terms integrate to zero and can be dropped.

As for "the second order derivative of the KL divergence will give metric": this presumably refers to the standard fact that, evaluated at $Y = X$ (so $q = p$), the $\log \frac pq$ term vanishes and the Hessian reduces to the Fisher-information-type term $\int \frac1p \frac{\partial p}{\partial X^{ij}}\frac{\partial p}{\partial X^{k\ell}}~dx$, which is positive semi-definite. Whether it is actually non-degenerate (hence a Riemannian metric) may depend on the kernel you choose for $p$ and $q$.
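The "metric" claim can also be checked numerically. A sketch under the same hypothetical setup as before (1-D Gaussian KDE, bandwidth $h = 0.4$, fixed grid): at $Y = X$ the finite-difference Hessian of the KL divergence should match the Fisher-information-type term $\int \frac1p \frac{\partial p}{\partial X_i}\frac{\partial p}{\partial X_j}\,dx$.

```python
import numpy as np

h = 0.4
grid = np.linspace(-10.0, 10.0, 4001)
dx = grid[1] - grid[0]

def kde(X):
    """1-D Gaussian KDE p(.; X) evaluated on `grid`."""
    K = np.exp(-(grid[None, :] - X[:, None]) ** 2 / (2 * h**2))
    return K.mean(axis=0) / np.sqrt(2 * np.pi * h**2)

def dkde(X, i):
    """∂p/∂X_i: only the i-th kernel depends on X_i."""
    Ki = np.exp(-(grid - X[i]) ** 2 / (2 * h**2)) / np.sqrt(2 * np.pi * h**2)
    return Ki * (grid - X[i]) / (h**2 * len(X))

def kl(p, q):
    return np.sum(p * np.log(p / q)) * dx

X = np.array([-1.0, 0.3, 1.5])
p = kde(X)

# Fisher-information-type term ∫ (1/p) ∂p/∂X_i ∂p/∂X_j dx at Y = X
i, j = 0, 1
fisher = np.sum(dkde(X, i) * dkde(X, j) / p) * dx

# finite-difference mixed Hessian of X' ↦ D(p(.; X') || p(.; X)) at X' = X
eps = 1e-4
def perturbed(a, b):
    Z = X.copy()
    Z[i] += a * eps
    Z[j] += b * eps
    return kl(kde(Z), p)

hess = (perturbed(1, 1) - perturbed(1, -1)
        - perturbed(-1, 1) + perturbed(-1, -1)) / (4 * eps**2)
print(fisher, hess)  # the two values should agree closely
```

This is only a consistency check on one mixed component; it does not by itself establish non-degeneracy of the full tensor.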