Variational inference: Does the natural gradient follow geodesics locally?


Amari's natural gradient descent is a well-known optimisation algorithm from information geometry, well suited to finding optima of objective functions defined on statistical manifolds. It consists of preconditioning the gradient descent update rule with the inverse of the Fisher information metric tensor.
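As a concrete sketch of the update rule (a toy one-parameter Bernoulli model; the helper name is my own, not from any particular library), the ordinary gradient is preconditioned by the inverse Fisher matrix:

```python
import numpy as np

def natural_gradient_step(theta, grad, fisher, lr):
    """One natural-gradient update: theta <- theta - lr * F(theta)^{-1} grad(theta).
    Solve the linear system rather than forming the inverse explicitly."""
    return theta - lr * np.linalg.solve(fisher(theta), grad(theta))

# Toy example: fit a Bernoulli parameter p to an empirical mean m by
# minimising the average negative log-likelihood.
m = 0.7
grad = lambda p: np.array([(p[0] - m) / (p[0] * (1 - p[0]))])   # d/dp of the NLL
fisher = lambda p: np.array([[1.0 / (p[0] * (1 - p[0]))]])      # Fisher info of Bernoulli(p)

p = np.array([0.2])
for _ in range(100):
    p = natural_gradient_step(p, grad, fisher, lr=0.5)
# The natural gradient here is simply (p - m): the 1/(p(1-p)) curvature of the
# parametrisation cancels, and p converges to m.
```

Note how the Fisher preconditioning makes the effective step independent of the parametrisation's curvature, which is the usual motivation for the method.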

The objective function I am concerned with is the variational free energy, or evidence lower bound, in the context of approximate Bayesian inference (i.e. variational Bayes). The metric on the statistical manifold is the Fisher information metric. The particular problem I am working on is categorical inference: the statistical manifold is the standard simplex, and under a suitable change of coordinates (a diffeomorphism onto a quadrant of the sphere) the Fisher metric becomes proportional to the round Euclidean metric. This might be a special case where my question holds true, but I would also like to know the answer in general.

My intuition for the natural gradient is that it follows (locally) the direction of greatest change of the objective function in information space, just as ordinary gradient descent follows the direction of greatest change in Euclidean space. This leads me to ask whether, in the limit where the step size tends to zero, natural gradient descent follows geodesics locally on the manifold, according to the Fisher information metric.

If this is the case, could you explain how? If not, could you explain why not to help me improve my understanding?

There are 2 answers below.

BEST ANSWER

In order for gradient flow lines on a Riemannian manifold $(M,g)$ to follow geodesics, the gradient field $\text{grad}\,\varphi$ has to be proportional to a vector field which is constant along itself, i.e. there must be a positive function $\sigma$ such that $X=\sigma\cdot\text{grad}\,\varphi$ satisfies $\nabla_X X=0$. The functions $\varphi$ for which this is true are very rare.
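To see where this condition comes from, write $G = \text{grad}\,\varphi$ and expand $\nabla_X X$ using the Leibniz rule for the connection:

$$\nabla_X X = \sigma\,\nabla_G(\sigma G) = \sigma\,G(\sigma)\,G + \sigma^2\,\nabla_G G,$$

so $\nabla_X X = 0$ forces $\nabla_G G = -\frac{G(\sigma)}{\sigma}\,G$, i.e. the covariant acceleration of $\text{grad}\,\varphi$ must be everywhere proportional to $\text{grad}\,\varphi$ itself. This is exactly the condition for the integral curves to be pregeodesics, and it fails for a generic $\varphi$.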

Regarding the specific question about the "natural gradient" flow: on a statistical manifold, the geodesics with respect to the exponential and mixture connections (the dually flat connections of information geometry) are more fundamental than those of the Levi-Civita connection. It turns out that the gradient flow of the KL divergence (respectively, the dual KL divergence) is a time-changed exponential (respectively, mixture) geodesic, and this comes down to the fundamental importance of these divergences and their relation with the corresponding connections.
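For a categorical family, the two geodesic families mentioned here have simple closed forms (a sketch assuming NumPy; the function names are mine): the m-geodesic interpolates linearly in the probabilities, while the e-geodesic interpolates linearly in the log-probabilities and renormalises.

```python
import numpy as np

def m_geodesic(p, q, t):
    """Mixture (m-)geodesic: linear interpolation of the probability vectors."""
    return (1 - t) * p + t * q

def e_geodesic(p, q, t):
    """Exponential (e-)geodesic: linear interpolation of the log-probabilities,
    renormalised back onto the simplex."""
    r = p ** (1 - t) * q ** t
    return r / r.sum()

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])
# Both curves share endpoints but trace different paths through the simplex,
# and neither coincides with the Levi-Civita (Fisher-Rao) geodesic in general.
```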

For further intuition on steepest descent and gradients vs. differentials in general, please also take a look at this related CrossValidated answer.

SECOND ANSWER

No, in general this is not true, although it does hold for a surface of revolution (symmetric about the $z$-axis). Indeed, one can design surfaces on which curves of steepest ascent spiral for an arbitrarily long distance before reaching the top of a mountain.

See Exercise 28 on p. 78 of my differential geometry text or the article "When Does Water Find the Shortest Path Downhill? The Geometry of Steepest Descent Curves," in The American Mathematical Monthly, December, 2003.