Transformation matrix elements for Gradient


I am reading "A Student's Guide to Vectors and Tensors" by Daniel Fleisch and I am struggling to understand his argument for why the components of the gradient vector transform covariantly. On page 129, using the chain rule with index notation, he shows that the gradient transforms from the unbarred system to the barred system according to

\begin{equation*} \frac{\partial f}{\partial\bar{x}^{i}} = \frac{\partial x^{j}}{\partial\bar{x}^{i}}\frac{\partial f}{\partial x^{j}}. \end{equation*}

Then he says

But in this case the elements of the transformation matrix $\frac{\partial x^{j}}{\partial\bar{x}^{i}}$ are the inverse of those in the transformation of the differential length elements (which are $\frac{\partial\bar{x}^{i}}{\partial x^{j}}$). And just as in that case the $\frac{\partial\bar{x}^{i}}{\partial x^{j}}$ terms represent the components of vectors that point along the original coordinate axes, in this case the $\frac{\partial x^{j}}{\partial\bar{x}^{i}}$ terms represent the components of vectors that are perpendicular to the original coordinate surfaces. Hence in this case the weighting factors are the components of the (contravariant) dual basis vectors, which means that the components of the gradient vector transform as covariant components.

How does he know that the $\frac{\partial x^{j}}{\partial\bar{x}^{i}}$ terms represent the components of vectors that are perpendicular to the original coordinate surfaces?

I am aware that one can regard $\frac{\partial}{\partial x^{i}}$ as the basis vectors of the unbarred system, meaning that the $\frac{\partial\bar{x}^{i}}{\partial x^{j}}$ terms are the components (in the barred basis) of the vectors parallel to the original coordinate axes, such that:

\begin{align*} \mathbf{e}_{1} &= \frac{\partial\bar{x}^{1}}{\partial x^{1}}\bar{\mathbf{e}}_{1} + \frac{\partial\bar{x}^{2}}{\partial x^{1}}\bar{\mathbf{e}}_{2} \\ \mathbf{e}_{2} &= \frac{\partial\bar{x}^{1}}{\partial x^{2}}\bar{\mathbf{e}}_{1} + \frac{\partial\bar{x}^{2}}{\partial x^{2}}\bar{\mathbf{e}}_{2} \end{align*}

in the case of two dimensions. So, following this reasoning, would I then construct a similar pair of equations with the $\frac{\partial x^{j}}{\partial\bar{x}^{i}}$ terms and the dual basis vectors $\mathbf{e}^{1}$ and $\mathbf{e}^{2}$?
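To convince myself of the claimed inverse relationship between the two sets of weighting factors, I tried a quick symbolic check of my own (not from the book), taking the barred system to be 2D polar coordinates:

```python
import sympy as sp

x, y, r, th = sp.symbols('x y r theta', positive=True)

# Forward map x^j(xbar^i) for barred = polar, and its Jacobian dx^j/dxbar^i
fwd = sp.Matrix([r*sp.cos(th), r*sp.sin(th)])
J_x_xbar = fwd.jacobian([r, th])

# Inverse map xbar^i(x^j), its Jacobian dxbar^i/dx^j, rewritten in polar variables
inv = sp.Matrix([sp.sqrt(x**2 + y**2), sp.atan2(y, x)])
J_xbar_x = inv.jacobian([x, y]).subs({x: r*sp.cos(th), y: r*sp.sin(th)})

# The two transformation matrices multiply to the identity
product = sp.simplify(J_xbar_x * J_x_xbar)
print(product)
```

The product coming out as the identity matrix is exactly the statement that the $\frac{\partial x^{j}}{\partial\bar{x}^{i}}$ terms form the inverse of the $\frac{\partial\bar{x}^{i}}{\partial x^{j}}$ matrix.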


On BEST ANSWER

$ \newcommand\PD[2]{\frac{\partial#1}{\partial#2}} $

Vectors are geometric entities independent of any coordinatization. Hence it makes sense to define coordinate functions which map a vector $v$ to its coordinates, $x^1(v), x^2(v), \dotsc$. We also have the inverse, where given coordinate values we can determine a vector $v(x^1,x^2,\dotsc)$, and this allows us to define the basis vectors associated with the coordinate system as $e_i = \partial v(x^1,x^2,\dotsc)/\partial x^i$. Given an inner product, there is also a unique reciprocal basis $\{e^i\}$ defined such that $e^i\cdot e_j = \delta^i_j$. By this definition it is evident that $$ \sum_j e_j(e^j\cdot v) = v = \sum_j e^j(e_j\cdot v). $$
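A small numeric sketch of the reciprocal basis (mine, not part of the argument above): collecting a basis $\{e_j\}$ as the columns of a matrix $E$, the rows of $E^{-1}$ serve as the reciprocal vectors, since $E^{-1}E = I$ is precisely $e^i\cdot e_j = \delta^i_j$.

```python
import numpy as np

E = np.array([[2.0, 1.0],       # e_1 = (2, 0.5) and e_2 = (1, 3) as columns
              [0.5, 3.0]])
E_recip = np.linalg.inv(E)      # row i is the reciprocal vector e^i

# e^i . e_j = delta^i_j
print(np.round(E_recip @ E, 12))

# Completeness: sum_j e_j (e^j . v) reproduces any v
v = np.array([1.7, -0.3])
recon = sum(E[:, j] * (E_recip[j] @ v) for j in range(2))
print(np.allclose(recon, v))    # True
```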

The gradient with respect to $v$ is then essentially defined by $\nabla_v x^i(v) = e^i$. We can always, in any coordinate system, express the gradient as $$ \nabla_v = e^1\PD{}{x^1} + e^2\PD{}{x^2} + \cdots = e_1\PD{}{x_1} + e_2\PD{}{x_2} + \cdots. $$ It's worth noting that when $\{e_i\}$ is an orthonormal basis then it is its own reciprocal $e^i = e_i$.

Here's an example; in particular we see it isn't necessarily true that $x^i = e^i\cdot v$. (Though this is not the most illustrative example, since the resulting basis is orthogonal.) In 2D polar coordinates, $$ v(r,\theta) = (r\cos\theta)e_1 + (r\sin\theta)e_2, $$$$ r(v) = |v|,\quad \theta(v) = \arctan\frac{y}{x}, $$ where $\{e_1,e_2\}$ is the standard basis, we are using the standard inner product, and $x := v\cdot e_1$ and $y := v\cdot e_2$. Then the basis vectors are $$ e_r = \PD{v}{r} = (\cos\theta)e_1 + (\sin\theta)e_2, $$$$ e_\theta = \PD{v}{\theta} = -(r\sin\theta)e_1 + (r\cos\theta)e_2, $$$$ e^r = \nabla_v r(v) = \frac v{|v|} = e_r, $$$$ e^\theta = \nabla_v \theta(v) = \frac{x^2}{x^2 + y^2}\left(-\frac{y}{x^2}e_1 + \frac1{x}e_2\right). $$ We can find that $e^\theta$ simplifies to $$ e^\theta = \frac1r[-(\sin\theta)e_1 + (\cos\theta)e_2] = \frac1{r^2}e_\theta, $$ and consequently $$ e^\theta\cdot v = e^\theta\cdot(re_r) = 0. $$ However, these basis vectors depend on the coordinates. If we instead look at $v(r',\theta')$, we get $$ e^\theta\cdot v(r',\theta') = \frac1{r^2}\,e_\theta\cdot(r'e_{r'}) = \frac{r'}{r}\left[(-\sin\theta)(\cos\theta') + (\cos\theta)(\sin\theta')\right] = \frac{r'}{r}\sin(\theta'-\theta). $$
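The polar computations above can be double-checked symbolically; the following is a sketch of mine (not part of the original example), computing $e_r, e_\theta$ from $v(r,\theta)$ and $e^r, e^\theta$ as gradients of the coordinate functions:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
r, th = sp.symbols('r theta', positive=True)
polar = {x: r*sp.cos(th), y: r*sp.sin(th)}

v = sp.Matrix([r*sp.cos(th), r*sp.sin(th)])
e_r, e_th = v.diff(r), v.diff(th)

# Gradients of the coordinate functions r(v), theta(v), rewritten in polar form
r_fn, th_fn = sp.sqrt(x**2 + y**2), sp.atan2(y, x)
e_up_r  = sp.Matrix([r_fn.diff(x),  r_fn.diff(y) ]).subs(polar)
e_up_th = sp.Matrix([th_fn.diff(x), th_fn.diff(y)]).subs(polar)

print(sp.simplify(e_up_r - e_r))             # zero vector: e^r = e_r
print(sp.simplify(e_up_th - e_th / r**2))    # zero vector: e^theta = e_theta/r^2
print(sp.simplify(e_up_th.dot(v)))           # 0: e^theta . v = 0
```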

Now recall that the gradient of a function is perpendicular to its level sets. Coordinate surfaces are just the level sets of the coordinate functions $x^i(v)$, i.e. the set of all $v$ such that $x^i(v) = c$ for some constant $c$, and as just discussed $\nabla_v x^i(v) = e^i$, so the reciprocal vectors are perpendicular to the corresponding coordinate surfaces. All that is left, then, is to see how the $e^i$ are expressed in terms of a new coordinate system.

Consider a new set of coordinates $\{\bar x^i\}$ with basis $\{\bar e_i\}$. By the chain rule $$ e_i = \PD v{x^i} = \sum_j \PD{\bar x^j}{x^i}\PD v{\bar x^j} = \sum_j \PD{\bar x^j}{x^i}\bar e_j, $$ which establishes $\partial\bar x^j/\partial x^i$ as the transformation matrix between the $e_i$ and the $\bar e_j$. We then consider that $$ e^i = \nabla_v x^i(v) = \sum_j \bar e^j(\bar e_j\cdot\nabla_v)x^i(v) = \sum_j \bar e^j\PD{x^i}{\bar x^j}. $$ This is what Fleisch is talking about. $e^i$ is perpendicular to the $x^i$ coordinate surface, and hence the quantities $\partial{x^i}/\partial{\bar x^j}$ (the inverse matrix of $\partial{\bar x^i}/\partial{x^j}$) are the components in the $\bar e^j$ basis of vectors perpendicular to the original coordinate surfaces. If we want something using the $\{\bar e_i\}$ basis, then instead we would get $$ e^i = \nabla_v x^i(v) = \sum_j \bar e_j(\bar e^j\cdot\nabla_v)x^i(v). $$
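Here is a numeric spot-check (mine) of the expansion $e^i = \sum_j \bar e^j\,\partial x^i/\partial\bar x^j$, taking the unbarred system to be Cartesian and the barred system to be polar; $e^x$ should come out as the standard basis vector $(1,0)$:

```python
import numpy as np

r, th = 1.7, 0.6                              # an arbitrary test point
e_r   = np.array([np.cos(th), np.sin(th)])    # polar coordinate basis vectors
e_th  = np.array([-r*np.sin(th), r*np.cos(th)])
e_up_r, e_up_th = e_r, e_th / r**2            # polar reciprocal basis

# dx/dr = cos(theta) and dx/dtheta = -r sin(theta), so the expansion reads
# e^x = ebar^r (dx/dr) + ebar^theta (dx/dtheta)
e_up_x = e_up_r * np.cos(th) + e_up_th * (-r*np.sin(th))
print(np.allclose(e_up_x, [1.0, 0.0]))        # True
```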

Another answer:

The gradient is covariant because it's defined to be: $\nabla f$ is the unique vector satisfying

$$\langle \nabla f(x), v\rangle = D_v f(x)$$ for every vector $v$, where the RHS is the (coordinate-independent, scalar) directional derivative of $f$ in the $v$ direction. Since the RHS is unchanged by a change of coordinates, so must the LHS.

I don't really understand the quoted paragraph either. Certainly you can write $$\bar{\mathbf{e}}_1 = \frac{\partial x^1}{\partial \bar{x}^1} \mathbf{e}_1 + \frac{\partial x^2}{\partial \bar{x}^1} \mathbf{e}_2,$$ and more generally, if $A_{ij} = \frac{\partial x^i}{\partial \bar{x}^j}$ and $B_{ij} = \frac{\partial \bar{x}^i}{\partial x^j}$ are the coordinate transformation matrices, then $A = B^{-1}$.

But it's not necessarily true that $\langle \mathbf{e}_1, \bar{\mathbf{e}}_1\rangle = 0$...
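To make both points concrete, here is a small numeric sketch of mine, assuming a skew linear change of coordinates $\bar x = Bx$: the inverse relation $A = B^{-1}$ holds exactly, while $\langle \mathbf{e}_1, \bar{\mathbf{e}}_1\rangle$ is visibly nonzero.

```python
import numpy as np

B = np.array([[1.0, 1.0],     # B_ij = d xbar^i / d x^j (constant, since linear)
              [0.0, 1.0]])
A = np.linalg.inv(B)          # A_ij = d x^i / d xbar^j

# Barred basis vectors: ebar_j = sum_i A_ij e_i, i.e. the columns of A
ebar_1 = A[:, 0]
print(np.allclose(A @ B, np.eye(2)))        # True: A = B^{-1}
print(float(np.dot([1.0, 0.0], ebar_1)))    # nonzero: e_1 and ebar_1 not orthogonal
```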

A third answer:

Let $(M, g)$ be a smooth $n$-dimensional Riemannian manifold. As another answer said, the gradient vector transforms covariantly, i.e. as a tangent vector, because of its defining property, which is that for any $p \in M$, the gradient vector $\nabla f(p) \in T_pM$ is defined to be the unique vector such that for all $v \in T_pM$, $\langle v, \nabla f(p) \rangle_{g(p)} = df(p)(v)$, where $df$ is the covector given in coordinates by $df(x) = \sum_{j = 1}^{n}\frac{\partial f}{\partial x^j}\,dx^j$. This says that $\nabla f(p)$ is obtained from $df(p)$ via the Riesz isomorphism induced by the inner product $g(p)$ on $T_pM$ (also known as "raising an index"). So $\nabla f$ is manifestly a vector field. In terms of coordinates, this means that if we change coordinates from $x$ to $y$ by $x = \phi(y)$, then the components of $V = \nabla f$ transform from $V^x(x)$ in the $x$ coordinates to $V^y(y)$ in the $y$-coordinates via the formula $V^x(x) = D\phi(y)V^y(y)$, which means $V^y(y) = D\phi(y)^{-1}V^x(x)$.

You can also verify that $\nabla f$ transforms as a tangent vector directly. First, in $x$-coordinates, the defining property of $\nabla f$ reads $\langle v, G(x)\nabla f(x)\rangle = \langle v, df(x)\rangle$ for all $v \in \mathbb{R}^n$. Hence in coordinates, $\nabla f(x) = G(x)^{-1}df(x)$. When transforming coordinates from $x$ to $y$ by $x = \phi(y)$, we must have $\langle u, G^y(y)v \rangle = \langle D\phi(y)u, G^x(x)D\phi(y)v \rangle$ for all $u, v \in \mathbb{R}^n$, which means $G^y(y) = D\phi(y)^TG^x(x)D\phi(y)$. This leads to the formula $V^y(y) = D\phi(y)^{-1}V^x(x)$ again.
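Both formulas can be checked numerically; the following is a sketch of mine for the Cartesian-to-polar case, where $G^x$ is the identity and $\phi(r,\theta) = (r\cos\theta, r\sin\theta)$:

```python
import numpy as np

r, th = 2.0, 0.8
Dphi = np.array([[np.cos(th), -r*np.sin(th)],
                 [np.sin(th),  r*np.cos(th)]])   # D phi(r, theta)

Gx = np.eye(2)                       # Euclidean metric in Cartesian coordinates
Gy = Dphi.T @ Gx @ Dphi
print(np.allclose(Gy, np.diag([1.0, r**2])))     # True: the familiar polar metric

# Take f = x, so df has Cartesian components (1, 0); by the chain rule its
# polar components are df_y = Dphi^T df_x, and V = G^{-1} df in each chart.
df_x = np.array([1.0, 0.0])
Vx = np.linalg.solve(Gx, df_x)       # gradient components in the Cartesian chart
df_y = Dphi.T @ df_x
Vy = np.linalg.solve(Gy, df_y)       # gradient components in the polar chart
print(np.allclose(Vy, np.linalg.solve(Dphi, Vx)))  # True: V^y = Dphi^{-1} V^x
```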