Relationship between symmetrized KL-divergence and geodesics

I am currently working through Amari's Information Geometry and its Applications. In chapter 3, Theorem 3.2 states that for distributions $p, q$ on discrete symbols, the following relationship holds: $$\frac12\left(D_{KL}[p:q] + D_{KL}[q:p]\right) = \int_0^1 g_e(t)\,\mathrm dt = \int_0^1 g_m(t)\,\mathrm dt$$ where $$g_e(t) = g_{ij}\dot\xi_e^i(t)\dot\xi_e^j(t)$$ $$g_m(t) = g_{ij}\dot\xi_m^i(t)\dot\xi_m^j(t)$$ $$\xi_e(t) = \exp\{(1-t)\log p + t\log q - \psi(t)\}$$ $$\xi_m(t) = (1-t)p + tq$$ $$\psi(t)= \log \sum_i \exp\{(1-t)\log p_i + t\log q_i\}$$ and $g_{ij}$ are the components of the Fisher information matrix. In words, $\xi_e$ and $\xi_m$ are the geodesic curves from $p$ to $q$ in the dually flat geometry on spaces of distributions introduced in the previous chapter, and $g_e, g_m$ are, respectively, the second-order expansions of $D_{KL}[\xi_e(t):\xi_e(t+dt)]$ and $D_{KL}[\xi_m(t+dt):\xi_m(t)]$ at $t$.

After stating this theorem, Amari writes that "The proof is technical and is omitted" without giving any references. I would like to get some intuition for the main idea behind the proof, and am looking either for a direct explanation or for pointers to references where I could find a proof myself.

There is 1 best solution below

The first step is to differentiate $\log\xi_e(t)$, which gives $\dot\xi_e(t)=\xi_e(t)\left(\log\frac{q}{p}-\dot\psi(t)\right)$. Since $\sum\dot\xi_e(t)=0$, summing this identity yields $$\dot\psi(t)=\sum\xi_e(t)\log\frac{q}{p}.$$ Then we observe (e.g. by expanding the KL-divergences as you suggest) that \begin{align} g_e(t)=\sum \left(\frac{\dot\xi_e(t)}{\xi_e(t)}\right)^2\xi_e(t)=\sum \left(\log\frac{q}{p}-\dot\psi(t)\right)^2\xi_e(t)=\ddot\psi(t), \end{align} where the last equality follows by differentiating the expression for $\dot\psi(t)$ once more. This implies that $$\int_0^1g_e(t)\,\mathrm dt=\dot\psi(1)-\dot\psi(0)=\sum (q-p)\log\frac{q}{p}=D_{KL}(q||p)+D_{KL}(p||q),$$ so a factor of $1/2$ seems to be missing from Amari's claim as stated.
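As a sanity check on the e-geodesic computation, here is a minimal numerical sketch. The example distributions `p` and `q`, the helper names, and the trapezoidal grid size are all illustrative assumptions, not anything taken from Amari's book. It evaluates $g_e(t)$ as the variance of $\log(q/p)$ under $\xi_e(t)$ and integrates it over $[0,1]$.

```python
import numpy as np

def e_geodesic(p, q, t):
    """Point xi_e(t): normalized geometric mixture of p and q."""
    w = np.exp((1.0 - t) * np.log(p) + t * np.log(q))
    return w / w.sum()  # dividing by the sum is the exp(-psi(t)) factor

def g_e(p, q, t):
    """Variance of log(q/p) under xi_e(t), i.e. psi''(t)."""
    xi = e_geodesic(p, q, t)
    r = np.log(q / p)
    mean = np.sum(xi * r)  # this is psi'(t)
    return np.sum(xi * (r - mean) ** 2)

def integrate_unit_interval(f, n=2001):
    """Composite trapezoidal rule on [0, 1] with n grid points."""
    ts = np.linspace(0.0, 1.0, n)
    vals = np.array([f(t) for t in ts])
    h = ts[1] - ts[0]
    return h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

# Hypothetical example distributions with full support.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.2, 0.6])

sym_kl = np.sum((q - p) * np.log(q / p))  # D_KL(p||q) + D_KL(q||p)
integral_e = integrate_unit_interval(lambda t: g_e(p, q, t))

print(integral_e, sym_kl)
```

On this example the two printed numbers should agree up to quadrature error, which is consistent with the integral matching the full symmetrized divergence rather than half of it.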

For the second claim, observe that $\dot\xi_m(t)=q-p$, so \begin{align} g_m(t)=\sum \left(\frac{\dot\xi_m(t)}{\xi_m(t)}\right)^2\xi_m(t)=\sum \frac{(q-p)^2}{(1-t)p+tq}. \end{align} Hence, integrating term by term, $$\int_0^1g_m(t)\,\mathrm dt=\sum (q-p)^2\int_0^1\frac{\mathrm dt}{p+(q-p)t}=\sum (q-p)\log\frac{q}{p}=D_{KL}(q||p)+D_{KL}(p||q).$$
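The m-geodesic integral can be checked numerically in the same spirit. The sketch below again uses made-up example distributions and hypothetical helper names. It integrates $g_m(t)=\sum (q-p)^2/((1-t)p+tq)$ with a trapezoidal rule and compares the result with the sum of the two KL-divergences computed directly from their definitions.

```python
import numpy as np

def kl(a, b):
    """Discrete KL-divergence D_KL(a||b) for strictly positive a, b."""
    return np.sum(a * np.log(a / b))

def g_m(p, q, t):
    """Fisher speed along the mixture geodesic xi_m(t) = (1-t)p + tq."""
    xi = (1.0 - t) * p + t * q
    return np.sum((q - p) ** 2 / xi)

def trapezoid_01(f, n=2001):
    """Composite trapezoidal rule on [0, 1] with n grid points."""
    ts = np.linspace(0.0, 1.0, n)
    vals = np.array([f(t) for t in ts])
    h = ts[1] - ts[0]
    return h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

# Hypothetical example distributions with full support.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.2, 0.6])

kl_sum = kl(p, q) + kl(q, p)
integral_m = trapezoid_01(lambda t: g_m(p, q, t))

print(integral_m, kl_sum)
```

Computing `kl_sum` from the two divergences separately, rather than from $\sum (q-p)\log\frac{q}{p}$, also confirms the last equality in the display above on this example.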