Understanding gradients on Riemannian Manifolds


I am trying to understand how gradients are defined on a Riemannian manifold, at least in a shallow way. The Wikipedia definition is the following:

[Image of the Wikipedia definition: for a smooth function $f$ on a Riemannian manifold $(M, g)$, the gradient $\nabla f$ is the vector field satisfying $g_x((\nabla f)_x, X_x) = (\partial_X f)(x)$ for every $x \in M$ and $X_x \in T_xM$, where $\partial_X f$ is the directional derivative of $f$ along $X$, expressed via a coordinate chart $\phi$.]

What I understand from this is: given a manifold $M$ with the metric $g$, the gradient of a function $f$ on the manifold $M$ is the vector field $\nabla f$ which, for every $x \in M$ and $X_x \in T_xM$, produces the scalar $(\partial_{X}f)(x)$. According to the definition here, $(\partial_{X}f)(x)$ is equal to the directional derivative of $f$ in the direction of $X$, both evaluated at $x$. So this should be $\nabla f(x)^T X_x$, if I understand correctly. But this understanding seems trivial: then the required $\nabla f$ would just be the evaluation of the function $f$'s gradient at each $x \in M$. Moreover, the last definition of $\partial_{X}f$ via the coordinate chart $\phi$ seems to say something else, so I am confused. What is the actual, correct interpretation of this definition?


There are 3 answers below.

---

Think of it in terms of properties we want the gradient to have. Properties come first, and formulas are then written out so that they satisfy those properties.

The essential properties of the gradient vector field $\nabla f$ are:

  • $\nabla f$ is perpendicular to the level sets of $f$
  • $|\nabla f|$ is proportional to the rate of increase of $f$

The differential $df$ takes a vector field $X$ and returns a measurement of how quickly $f$ is increasing along the flow line of $X$. If $\nabla f$ has the properties above, then we should be able to obtain this measurement by taking the dot product of $X$ with the gradient field $\nabla f$. So the defining property of the vector field $\nabla f$ is that, for all vector fields $X$, we have: $$ g(\nabla f, X) = df(X) = \partial_Xf $$
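This defining property can be checked numerically. Here is a minimal sketch in coordinates (the function $f$, the point $p$, and the metric matrix $G$ below are made up for illustration): requiring $g(\nabla f, X) = df(X)$ for all $X$ forces the gradient components to be $G^{-1}$ applied to the partial derivatives.

```python
import numpy as np

# Made-up example: f(x, y) = x^2 + 3y on R^2, with the metric at a point p
# given by an SPD matrix G.  The defining property g(grad f, X) = df(X)
# for all X forces grad f = G^{-1} df in coordinates.

def f(x):
    return x[0] ** 2 + 3 * x[1]

p = np.array([1.0, 2.0])
df = np.array([2 * p[0], 3.0])      # components of df at p
G = np.array([[2.0, 0.5],           # metric tensor at p (any SPD matrix works)
              [0.5, 1.0]])

grad = np.linalg.solve(G, df)       # grad f = G^{-1} df

# Compare g(grad f, X) with a finite-difference directional derivative df(X)
X = np.array([0.3, -0.7])           # an arbitrary tangent vector at p
h = 1e-6
dir_deriv = (f(p + h * X) - f(p - h * X)) / (2 * h)

print(grad @ G @ X, dir_deriv)      # the two numbers agree
```

Note that `grad` itself depends on the chosen metric $G$, but the pairing $g(\nabla f, X)$ does not: it always reproduces the directional derivative.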

---

The following is another, equivalent, point of view: the idea that a gradient gives you the direction of "steepest ascent".

The gradient applied to a function $f$ at $p$ should produce a tangent vector that, in some sense, maximizes the local change in $f$ when walking in the direction of that vector. For this to make sense, we have to constrain the length of the vector (otherwise the maximization is unbounded). But this is exactly what the Riemannian metric provides: a way of specifying at each point a scalar product $\langle v,w\rangle_p=g_p(v,w)$ for any two vectors $v,w$ at $p$, depending smoothly on the point. We may then define a steepest direction of $f$ at $p$ as the unit vector that maximizes the directional derivative of $f$ at $p$

$$sf_p = \arg \max_{\langle v,v\rangle_p=1} vf = \arg \max_{\langle v,v\rangle_p=1} df_p(v), \tag1$$ and then we define the gradient as a scaled version that also takes into account how fast $f$ actually changes in that direction. We therefore multiply $sf_p$ by the directional derivative of $f$ in the direction of $sf_p$: $$\operatorname{grad} f_p=df_p(sf_p)\;sf_p.$$

It may then be shown that $\operatorname{grad} f_p$ satisfies $$\langle \operatorname{grad} f_p, v \rangle_p = df_p(v) \tag2$$ for every tangent vector $v$. By the non-degeneracy of the scalar product, this uniquely defines $\operatorname{grad} f_p$ in terms of the inverse metric tensor. In order to show (2), we may write down a Lagrangian for the constrained optimization problem (1): $$\mathcal{L}(v)=df_p(v)+\lambda(\langle v,v\rangle_p-1),$$ where $\lambda$ is some Lagrange multiplier. The first variation of this reads $$\mathcal{L}(v+\delta v)-\mathcal{L}(v)=df_p(\delta v)+2\lambda \langle v,\delta v\rangle_p+\text{higher-order terms} = 0, \quad \forall\delta v,$$ from which we obtain (by setting $v=\delta v=sf_p$) that $-2\lambda=df_p(sf_p)$, and then, by plugging this back into the first variation for $v=sf_p$, $$\langle \operatorname{grad} f_p, \delta v \rangle_p=df_p(sf_p)\langle sf_p, \delta v \rangle_p = df_p(\delta v), \quad \forall\delta v.$$
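As a numerical sanity check of this characterization (the components of $df_p$ and the metric below are arbitrary choices, not from the answer): the $g$-unit vector proportional to $G^{-1}df_p$ beats every other $g$-unit vector in directional derivative, and rescaling it by $df_p(sf_p)$ recovers $G^{-1}df_p$.

```python
import numpy as np

# Arbitrary illustrative data: components of df_p, and the metric g_p (SPD)
df = np.array([1.0, -2.0])
G = np.array([[3.0, 1.0],
              [1.0, 2.0]])

grad = np.linalg.solve(G, df)          # grad f_p = G^{-1} df_p
g_norm = np.sqrt(grad @ G @ grad)      # |grad f_p|_g
sf = grad / g_norm                     # candidate steepest direction, <sf, sf>_g = 1

# df_p(v) for many random g-unit vectors never exceeds df_p(sf)
rng = np.random.default_rng(0)
best = df @ sf
samples = []
for _ in range(1000):
    v = rng.standard_normal(2)
    v = v / np.sqrt(v @ G @ v)         # normalize so that <v, v>_g = 1
    samples.append(df @ v)

print(best, max(samples))              # best >= every sampled value
```

The equality $\operatorname{grad} f_p = df_p(sf_p)\,sf_p$ can be checked directly: `(df @ sf) * sf` coincides with `grad`.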

---

I will try to convince you that there is no difference from what happens in "normal" calculus. Let's then start from $\mathbb{R}^n$.

In $\mathbb{R}^n$ you have a natural inner product (the Euclidean product), given in the standard orthonormal basis (in $\mathbb{R}^3$: $\hat x, \hat y, \hat z$) by $\left<a,b\right> = a^t b$. This product defines an isomorphism $A: \mathbb{R}^n \rightarrow (\mathbb{R}^n)^*$ such that $A: a\mapsto \langle a,\cdot \rangle$.

To every function $f:\mathbb{R}^n\rightarrow \mathbb{R}$ you can associate its differential $df$, which is the map $df: v\mapsto D_v(f)$ (the directional derivative). Using the isomorphism $A$ given by the Euclidean product, you can find a vector (called the gradient of $f$, denoted by $\nabla f$) such that $\left<\nabla f , v\right> = df(v) = D_v(f)$. In an orthonormal basis, this means $(\nabla f)^k = \frac{\partial f}{\partial x^k}$. This last formula is not true, however, in an arbitrary basis.
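A concrete illustration of that last caveat (the basis and function here are invented for the example): take $f(x,y)=x$ and the skewed basis $b_1=(1,0)$, $b_2=(1,1)$. The Euclidean product has Gram matrix $G_{ij}=\langle b_i,b_j\rangle$ in this basis, and the gradient components are $G^{-1}$ applied to the partial derivatives, not the partials themselves.

```python
import numpy as np

# Invented example: f(x, y) = x.  In the standard basis grad f = (1, 0) and
# (grad f)^k = df/dx^k.  In the skewed basis b1 = (1, 0), b2 = (1, 1), the
# Euclidean product has Gram matrix G_ij = <b_i, b_j>, and the gradient
# components are G^{-1} times the partial derivatives.

B = np.array([[1.0, 1.0],      # columns are the basis vectors b1, b2
              [0.0, 1.0]])
G = B.T @ B                    # Gram matrix of the skewed basis

# In skewed coordinates (a, b): (x, y) = a*b1 + b*b2, so x = a + b, i.e. f = a + b
df_skew = np.array([1.0, 1.0])             # (df/da, df/db): NOT the gradient
grad_skew = np.linalg.solve(G, df_skew)    # gradient components in the skewed basis

grad_standard = B @ grad_skew              # convert back to the standard basis
print(grad_skew, grad_standard)            # components (1, 0) in both bases
```

The partial derivatives $(1, 1)$ in the skewed coordinates differ from the gradient components $(1, 0)$; only after applying $G^{-1}$ do we recover the vector that, under the Euclidean product, reproduces all directional derivatives.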

Note that the differential of $f$ is the more general object, while its gradient exists thanks to the Euclidean scalar product.

On a Riemannian manifold, the procedure is exactly the same! The differential is the more general object, defined in exactly the same way: $df(v) = v(f)$ (since $v$ is the generalization of a directional derivative!), and so is the gradient: $g(\nabla f, v) = df(v)$.
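In coordinates this reads $(\nabla f)^i = g^{ij}\,\partial_j f$. As an illustration (the chart and function are my own example, not from the answer), take polar coordinates $(r,\theta)$ on the plane, where $g=\mathrm{diag}(1, r^2)$:

```python
import numpy as np

# Example: polar coordinates (r, theta) on the plane, where g = diag(1, r^2).
# Then (grad f)^r = df/dr and (grad f)^theta = (1/r^2) df/dtheta.
# Take f(r, theta) = r^2 * cos(theta).

r, theta = 2.0, np.pi / 3

df_dr = 2 * r * np.cos(theta)           # df/dr
df_dtheta = -(r ** 2) * np.sin(theta)   # df/dtheta

g = np.diag([1.0, r ** 2])              # metric at (r, theta)
grad = np.linalg.solve(g, np.array([df_dr, df_dtheta]))

# Sanity check: g(grad f, v) = df(v) for an arbitrary tangent vector v
v = np.array([0.4, -1.1])
lhs = grad @ g @ v
rhs = np.array([df_dr, df_dtheta]) @ v
print(grad, lhs, rhs)
```

The $1/r^2$ factor in the $\theta$-component is exactly the inverse metric at work: the differential's components $(\partial_r f, \partial_\theta f)$ are the general object, and the metric converts them into a tangent vector.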