Amari's Pythagorean theorem


Recently I started reading Amari's "Information Geometry and Its Applications" and quickly stumbled upon some (apparent) inconsistencies. It starts with the "generalized Pythagorean theorem" on a dually flat manifold $(M, g, \nabla, \nabla')$ with (Bregman) divergence $D$: $$ D(p||r) = D(p||q) + D(q||r). $$ In this book Amari claims that the above equation holds if the $\nabla'$-geodesic $PQ$ and the $\nabla$-geodesic $QR$ are orthogonal at $Q$. He makes the same claim in other books and papers. However, when I work through his proof I come to the conclusion that it should be the other way around, i.e. the $\nabla$-geodesic $PQ$ should be orthogonal to the $\nabla'$-geodesic $QR$ (so it seems as if the "duals" are interchanged). I started looking in the literature and found an apparent dichotomy: half of the papers claim one version and the other half claim the other (even Amari does not seem to be consistent).

What makes this even more confusing is that proofs such as the following one, from Amari and Cichocki's paper "Information geometry of divergence functions", seem to claim one version but prove the other.

[Image: example proof, an excerpt from the paper]

Can somebody please show where I messed up during my reasoning? (Or, if not, perhaps explain why the literature is so confusing on this point.)

Best answer

This is indeed confusing. My guess is that Eqs. (35) and (36) are mixed up in the paper excerpt and similarly there is a typo in Amari's "Information Geometry and Its Applications". On the other hand, Theorem 3.8 in Amari's "Methods of Information Geometry" seems to be correct. I'm giving a correct version of the theorem below.


Theorem: Let $(M,g,\nabla,\nabla^*)$ be a dually flat manifold and let $\psi,\varphi$ be the $\nabla$- and $\nabla^*$-potentials, i.e. $\text{Hess}^{\nabla}\psi=\text{Hess}^{\nabla^*}\varphi=g$. Furthermore, let $\theta,\eta$ be $\nabla$- and $\nabla^*$-affine charts, respectively, normalized to be dual to each other, so that $\psi+\varphi=\theta\cdot\eta$ on $M$. Let $p,q,r\in M$ be points such that $pq$ is a $\nabla$-geodesic and $qr$ is a $\nabla^*$-geodesic, and these two geodesics form a right angle at $q$. Then $D(p||r)=D(p||q)+D(q||r)$, where $$ D(x||y)=\psi(x)+\varphi(y)-\theta(x)\cdot\eta(y), \quad x,y\in M,$$ is the canonical $\nabla$-divergence (remark: $D^*(x||y):=D(y||x)$ is the corresponding $\nabla^*$-divergence).

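Proof: We have \begin{align} D(p||q)+D(q||r)&=\quad\psi(p)+\varphi(q)-\theta(p)\cdot\eta(q) \\ &\quad +\psi(q)+\varphi(r)-\theta(q)\cdot\eta(r) \\ &=\psi(p)+\varphi(r)-\theta(p)\cdot\eta(r)+\theta(p)\cdot\eta(r) \\ &\quad +\theta(q)\cdot\eta(q)-\theta(p)\cdot\eta(q)-\theta(q)\cdot\eta(r)\\ &=D(p||r)+(\theta(q)-\theta(p))\cdot(\eta(q)-\eta(r)). \end{align} The second equality adds and subtracts $\theta(p)\cdot\eta(r)$ and uses the Legendre identity $\psi(q)+\varphi(q)=\theta(q)\cdot\eta(q)$, which holds for dual potentials and dual affine charts. The residual term is, up to sign, the inner product of the tangent vectors of $pq$ and $qr$ at $q$: in the $\theta$-chart the $\nabla$-geodesic $pq$ is a straight line with tangent proportional to $\theta(q)-\theta(p)$, in the $\eta$-chart the $\nabla^*$-geodesic $qr$ is a straight line with tangent proportional to $\eta(r)-\eta(q)$, and $g(\partial_{\theta^i},\partial_{\eta_j})=\delta_i^j$. Hence it is zero by assumption. (Let me know if this is unclear; I'm happy to elaborate.) $\square$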
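For anyone who wants to convince themselves numerically, here is a minimal sketch that checks both assignments of the geodesics on a concrete dually flat example. The choice of potential $\psi(\theta)=\sum_i e^{\theta_i}$ (so $\eta_i=e^{\theta_i}$ and $\varphi(\eta)=\sum_i(\eta_i\log\eta_i-\eta_i)$), the point $q$, the directions `u`, `v` and the step sizes are all my own assumptions for illustration; none of them come from Amari's books or the paper. In both configurations the two geodesics are orthogonal at $q$ because `u` and `v` have zero dot product.

```python
import math

# Assumed example (not from the source): psi(theta) = sum(exp(theta_i)),
# whose Legendre dual is phi(eta) = sum(eta_i * log(eta_i) - eta_i).
def psi(theta):
    return sum(math.exp(t) for t in theta)

def phi(eta):
    return sum(e * math.log(e) - e for e in eta)

def to_eta(theta):
    # eta = grad psi(theta)
    return [math.exp(t) for t in theta]

def to_theta(eta):
    # inverse of to_eta, valid for positive eta
    return [math.log(e) for e in eta]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def divergence(theta_x, eta_y):
    # D(x||y) = psi(x) + phi(y) - theta(x) . eta(y)
    return psi(theta_x) + phi(eta_y) - dot(theta_x, eta_y)

def pythagoras_gap(theta_p, theta_q, theta_r):
    # D(p||q) + D(q||r) - D(p||r); zero iff the Pythagorean relation holds
    eta_q, eta_r = to_eta(theta_q), to_eta(theta_r)
    return (divergence(theta_p, eta_q) + divergence(theta_q, eta_r)
            - divergence(theta_p, eta_r))

theta_q = [0.1, -0.3, 0.5]
eta_q = to_eta(theta_q)
u = [1.0, 2.0, -1.0]   # direction in theta-coordinates
v = [1.0, 0.0, 1.0]    # direction in eta-coordinates, chosen so dot(u, v) == 0

# Endpoint of a theta-straight line (nabla-geodesic) through q
theta_end = [t + 0.7 * a for t, a in zip(theta_q, u)]
# Endpoint of an eta-straight line (nabla*-geodesic) through q; stays positive
eta_end = [e + 0.4 * b for e, b in zip(eta_q, v)]

# Version in the theorem above: pq is a nabla-geodesic, qr a nabla*-geodesic
gap_stated = pythagoras_gap(theta_end, theta_q, to_theta(eta_end))
# Swapped version: pq is a nabla*-geodesic, qr a nabla-geodesic
gap_swapped = pythagoras_gap(to_theta(eta_end), theta_q, theta_end)

print("stated version gap: ", gap_stated)
print("swapped version gap:", gap_swapped)
```

With $D$ defined as above, the first gap vanishes up to rounding while the second does not, even though the geodesics are orthogonal at $q$ in both cases. Which version a given reference states therefore depends on which of $D$ and $D^*$ it calls "the" divergence.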