I have a statistics/learning problem in which I compute the covariance matrix of a random variable $Y \in \mathbb{R}^n$ conditional on the value of a second variable $X \in \mathbb{R}$. I refer to the covariance of $Y$ conditional on $X = X_i$ as $\Sigma(X_i)$. Thus, I have empirical measurements of pairs $\left( X_i, \Sigma(X_i) \right)$ for several values $X_i$.
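For concreteness, here is a minimal sketch of how such pairs can be estimated (the function name is mine, and it assumes $X$ takes a finite set of values in the data, which is my situation):

```python
import numpy as np

def conditional_covariances(X, Y):
    """For each distinct value x of X, return (x, sample covariance of Y | X = x)."""
    pairs = []
    for x in np.unique(X):
        Yx = Y[X == x]  # rows of Y observed together with X = x
        pairs.append((x, np.cov(Yx, rowvar=False)))
    return pairs
```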
We can also think of these pairs $\left( X, \Sigma(X) \right)$ as a function $\gamma : \mathbb{R} \to \mathbb{S}^+(n)$ that traces a curve in the manifold $\mathbb{S}^+(n)$ of $n \times n$ symmetric positive definite matrices.
$\mathbb{S}^+(n)$ can be endowed with many different Riemannian metrics. However, in some empirical analyses I find that the curve $\gamma$ is relatively well approximated by geodesics of the optimal transport (Bures-Wasserstein) metric, better than by geodesics of the other metrics I have tried (e.g. the Affine Invariant metric, which is the Fisher-Rao metric for zero-centered Gaussians, or the Log-Euclidean metric).
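In case it helps to reproduce the comparison: the three candidate geodesics have closed forms, which I sketch below in numpy (function names are mine; all helpers assume symmetric positive definite inputs):

```python
import numpy as np

def _sym_fun(M, f):
    """Apply f to the eigenvalues of a symmetric matrix: V diag(f(w)) V^T."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(f(w)) @ V.T

def bures_wasserstein_geodesic(A, B, t):
    """Optimal transport (Bures-Wasserstein) geodesic between zero-mean
    Gaussian covariances: ((1-t)I + tT) A ((1-t)I + tT)^T, where T is the
    OT map from N(0, A) to N(0, B)."""
    A_half = _sym_fun(A, np.sqrt)
    A_half_inv = np.linalg.inv(A_half)
    T = A_half_inv @ _sym_fun(A_half @ B @ A_half, np.sqrt) @ A_half_inv
    M = (1 - t) * np.eye(A.shape[0]) + t * T
    return M @ A @ M.T

def log_euclidean_geodesic(A, B, t):
    """exp((1-t) log A + t log B), with matrix log/exp via eigendecomposition."""
    return _sym_fun((1 - t) * _sym_fun(A, np.log) + t * _sym_fun(B, np.log), np.exp)

def affine_invariant_geodesic(A, B, t):
    """A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}."""
    A_half = _sym_fun(A, np.sqrt)
    A_half_inv = np.linalg.inv(A_half)
    inner = A_half_inv @ B @ A_half_inv
    return A_half @ _sym_fun(inner, lambda w: w ** t) @ A_half
```

The comparison I run is then roughly: for consecutive measured covariances, evaluate each geodesic at intermediate $t$ and check which one lands closest to the covariances actually observed in between.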
My question comes from wanting an explanation for this observation. That is, I have no a priori reason to think that the changes in $P(Y|X)$ with $X$ that I measure empirically should follow optimal transport trajectories, as opposed to, say, Log-Euclidean geodesics, or no geodesic structure at all. What properties of my random variable $Y$, or of the way it changes with $X$, could result in these changes following optimal transport trajectories? Any ideas would be very appreciated.
Edit: An important additional detail is that $Y$ is actually obtained by linearly transforming another variable $Z \in \mathbb{R}^k$, with $k > n$. The linear map $l : \mathbb{R}^k \to \mathbb{R}^n$ is learned with the goal of maximizing the discriminability of the distributions $P(Y|X)$.
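A consequence of this edit, in case it matters for an answer: each $\Sigma(X_i)$ is the image of the conditional covariance of $Z$ under the fixed congruence $\Sigma \mapsto L \Sigma L^\top$, since $Y = LZ$ implies $\mathrm{Cov}(Y) = L\,\mathrm{Cov}(Z)\,L^\top$. A quick illustrative check (all names and sizes here are placeholders, not my actual data):

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 5, 2
L = rng.standard_normal((n, k))   # stand-in for the learned projection l
A = rng.standard_normal((k, k))
Sigma_Z = A @ A.T + np.eye(k)     # an arbitrary SPD covariance for Z

# Sample Z ~ N(0, Sigma_Z) and push it through L; the sample covariance of
# Y = L Z should approach L Sigma_Z L^T as the sample grows.
Z = rng.multivariate_normal(np.zeros(k), Sigma_Z, size=200_000)
Y = Z @ L.T

Sigma_Y_empirical = np.cov(Y, rowvar=False)
Sigma_Y_exact = L @ Sigma_Z @ L.T
```

So the curve I observe in $\mathbb{S}^+(n)$ is really the projection of a curve in $\mathbb{S}^+(k)$, which may be relevant to why one geometry fits better than another.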