Intuitive difference between optimal transport distance and Fisher information distance


Let me start by saying I'm not a mathematician but a biologist with an interest in mathematics.

I have a set of covariance matrices and I am interested in studying their geometry in the Symmetric Positive Definite Matrices (SPDM) manifold. I'm particularly interested in relating the geometry to statistical properties of the centered Gaussians defined by those matrices (e.g. how 'discriminable' the data generated by the different Gaussians is).

There exists a plethora of metrics that are used with this manifold. At least two of these metrics are related to probabilistic/statistical concepts. One is the Affine Invariant (AI) distance, which, according to this source, "up to a constant, is known as the Fisher information metric". The other is the Wasserstein distance, i.e. the optimal transport distance.

The AI distance between $A$ and $B$ is given by: $$d(A,B) = \left\|\log\left(A^{-1/2} B A^{-1/2}\right)\right\|_F$$
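For concreteness, here is a minimal sketch of how the AI distance can be computed in Python (the function name is my own). It uses the standard equivalence $d(A,B) = \sqrt{\sum_i \log^2 \lambda_i}$, where $\lambda_i$ are the generalized eigenvalues of the pencil $(B, A)$, i.e. the eigenvalues of $A^{-1/2} B A^{-1/2}$:

```python
import numpy as np
from scipy.linalg import eigh

def affine_invariant_distance(A, B):
    """Affine-invariant distance d(A, B) = ||log(A^{-1/2} B A^{-1/2})||_F
    between SPD matrices A and B.

    Computed via the equivalent form sqrt(sum_i log(lambda_i)^2), where the
    lambda_i are the generalized eigenvalues of B v = lambda A v.
    """
    # eigh(B, A) solves B v = lambda A v; these lambdas coincide with the
    # eigenvalues of A^{-1/2} B A^{-1/2}, so no explicit matrix square
    # roots or logarithms are needed.
    lam = eigh(B, A, eigvals_only=True)
    return np.sqrt(np.sum(np.log(lam) ** 2))
```

The eigenvalue form is numerically more stable than forming $A^{-1/2}$ and a matrix logarithm explicitly.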

The (squared) Wasserstein distance between the centered Gaussians $\mathcal{N}(0, A)$ and $\mathcal{N}(0, B)$ is given by: $$d(A,B)^2 = \operatorname{tr}(A) + \operatorname{tr}(B) - 2\operatorname{tr}\left(\left(A^{1/2} B A^{1/2}\right)^{1/2}\right)$$
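This one can likewise be sketched directly from the formula (again, the function name is my own), using `scipy.linalg.sqrtm` for the matrix square roots:

```python
import numpy as np
from scipy.linalg import sqrtm

def bures_wasserstein_distance(A, B):
    """Wasserstein-2 distance between N(0, A) and N(0, B):
    d^2 = tr(A) + tr(B) - 2 tr((A^{1/2} B A^{1/2})^{1/2}).
    """
    A_sqrt = sqrtm(A)
    cross = sqrtm(A_sqrt @ B @ A_sqrt)
    d2 = np.trace(A) + np.trace(B) - 2.0 * np.trace(cross)
    # sqrtm can leave tiny imaginary/negative round-off; clip it away
    return np.sqrt(max(np.real(d2), 0.0))
```

In the SPD setting this is also known as the Bures (or Bures-Wasserstein) distance.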

These two metrics behave very differently. For one, it is my understanding that the SPD manifold has negative curvature under the AI metric, but non-negative curvature under the Wasserstein metric. However, I'm still not clear on the difference between the two from a probabilistic/statistical point of view. What do the two distances tell me about the statistical problem of discriminating between classes that generate data according to the Gaussians defined by those SPD matrices? What complementary information about this problem do the two metrics provide? This gives some intuition about the difference between KL divergence and the optimal transport distance, but I'm not sure how much of it carries over to the AI/Fisher information metric.