Why doesn't the chain rule make the second partials of the coordinate transformation functions vanish?


The following is from Bergmann's Introduction to the Theory of Relativity. An image of the original text is included in case I screwed up the transcription.

A definition of parallel displacement is actually possible in a comparatively simple way. Of course the value of the displaced vector depends on the original vector itself and on the direction of displacement. Let us first consider a Euclidean space, where we can introduce a Cartesian coordinate system. With respect to such a coordinate system, the law of parallel displacement takes the form $$ \delta a_{i}\equiv a_{i,k}\delta x_{k}=0\text{, (5.76)} $$ where $\delta x_{k}$ represents the infinitesimal displacement. Let us now introduce an arbitrary coordinate transformation (6.46) $\left[\xi^{i}=f^{i}\left(x_{1},\dots,x_{n}\right),i=1\dots n\right]$. The vector components with respect to that new coordinate system may be denoted by a prime. Then we have \begin{align*} a_{i}= & \frac{\partial\xi^{r}}{\partial x_{i}}a_{r}^{\prime},\\ \frac{\partial a_{i}}{\partial x_{k}}= & \frac{\partial\xi^{s}}{\partial x_{k}}\frac{\partial}{\partial\xi^{s}}\left(\frac{\partial\xi^{r}}{\partial x_{i}}a_{r}^{\prime}\right)\\ = & \frac{\partial\xi^{s}}{\partial x_{k}}\frac{\partial\xi^{r}}{\partial x_{i}}\frac{\partial a_{r}^{\prime}}{\partial\xi^{s}}+\frac{\partial\xi^{s}}{\partial x_{k}}\frac{\partial x_{l}}{\partial\xi^{s}}\frac{\partial^{2}\xi^{r}}{\partial x_{i}\partial x_{l}}a_{r}^{\prime}. \end{align*} The $\delta x_{k}$ transforms according to eq. (5.48) $\left[dx_{k}=\frac{\partial x_{k}}{\partial\xi^{i}}d\xi^{i}\right]$, and we obtain $$ 0=a_{i,k}\delta x_{k}=\left\{ \frac{\partial\xi^{r}}{\partial x_{i}}\frac{\partial a_{r}^{\prime}}{\partial\xi^{s}}+\frac{\partial^{2}\xi^{r}}{\partial x_{i}\partial x_{l}}\frac{\partial x_{l}}{\partial\xi^{s}}a_{r}^{\prime}\right\} \delta\xi^{s}\text{. (5.76a)} $$ $\frac{\partial a_{r}^{\prime}}{\partial\xi^{s}}\delta\xi^{s}$ is the actual increment of $a_{r}^{\prime}$ as a result of the displacement, and shall be denoted $\delta a_{r}^{\prime}.$ Multiplying the right-hand side of eq. (5.76a) by $\frac{\partial x_{i}}{\partial\xi^{t}},$ we get finally $$ \delta a_{t}^{\prime}=\frac{\partial x_{i}}{\partial\xi^{t}}\frac{\partial x_{l}}{\partial\xi^{s}}\frac{\partial^{2}\xi^{r}}{\partial x_{i}\partial x_{l}}a_{r}^{\prime}\delta\xi^{s}\text{. (5.77)} $$

The final expression looks like a simple application of the chain rule which would contract to give

$$0=\frac{\partial x_{i}}{\partial\xi^{t}}\frac{\partial x_{l}}{\partial\xi^{s}}\frac{\partial^{2}\xi^{r}}{\partial x_{i}\partial x_{l}}=\frac{\partial^{2}\xi^{r}}{\partial\xi^{t}\partial\xi^{s}}.$$

I typically explain away this kind of situation by using a parallel transported tangent plane and claim the $\xi^r$ being differentiated is a coordinate in the tangent plane, whereas the differentiating coordinates are in the manifold. But that's pretty hand-wavy, and doesn't have any merit in this circumstance since I can't parallel transport a tangent plane until parallel transportation has been defined. And that is the purpose of the entire derivation.

How does one explain why the above attempt to apply the chain rule is invalid?

[Image: the original text from Bergmann]



7
On BEST ANSWER

Why do you think $\frac{\partial x_{i}}{\partial\xi^{t}}\frac{\partial x_{l}}{\partial\xi^{s}}\frac{\partial^{2}\xi^{r}}{\partial x_{i}\partial x_{l}}=\frac{\partial^{2}\xi^{r}}{\partial\xi^{t}\partial\xi^{s}}$? What you can say is \begin{align} \frac{\partial x_{i}}{\partial\xi^{t}}\frac{\partial x_{l}}{\partial\xi^{s}}\frac{\partial^{2}\xi^{r}}{\partial x_{i}\partial x_{l}}&=\frac{\partial x_i}{\partial \xi^t}\frac{\partial x_l}{\partial\xi^s}\frac{\partial^2\xi^r}{\partial x_l\partial x_i}\tag{Schwarz's rule}\\ &=\frac{\partial x_i}{\partial \xi^t}\frac{\partial}{\partial \xi^s}\left(\frac{\partial\xi^r}{\partial x_i}\right). \end{align} But now what? You definitely cannot commute the $\frac{\partial}{\partial \xi^s}$ with $\frac{\partial}{\partial x_i}$. The expressions in the last two bullet points below are completely different, and there's no reason to expect equality in general. Schwarz's rule for the symmetry of mixed partials only holds when both derivatives are taken in the same coordinate system.
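As a sketch of where the extra term comes from (using only the product rule and the identity $\frac{\partial\xi^{r}}{\partial x_{i}}\frac{\partial x_{i}}{\partial\xi^{t}}=\delta_{t}^{r}$), differentiate that identity with respect to $\xi^{s}$:

$$0=\frac{\partial}{\partial\xi^{s}}\left(\frac{\partial\xi^{r}}{\partial x_{i}}\frac{\partial x_{i}}{\partial\xi^{t}}\right)=\frac{\partial x_{i}}{\partial\xi^{t}}\frac{\partial x_{l}}{\partial\xi^{s}}\frac{\partial^{2}\xi^{r}}{\partial x_{i}\partial x_{l}}+\frac{\partial\xi^{r}}{\partial x_{i}}\frac{\partial^{2}x_{i}}{\partial\xi^{t}\partial\xi^{s}}.$$

So the contraction in question equals minus the second term, not zero. As an illustration (plane polar coordinates are my choice of example here, they are not in the question), take $x_{1}=\rho\cos\varphi$, $x_{2}=\rho\sin\varphi$, $\xi^{r}=\rho$ and $\xi^{t}=\xi^{s}=\varphi$. Then

$$\frac{\partial\rho}{\partial x_{i}}\frac{\partial^{2}x_{i}}{\partial\varphi^{2}}=-\rho,\qquad\text{hence}\qquad\frac{\partial x_{i}}{\partial\varphi}\frac{\partial x_{l}}{\partial\varphi}\frac{\partial^{2}\rho}{\partial x_{i}\partial x_{l}}=\rho\neq0.$$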

Even more precisely, you should think of being given a function $f:M\to\Bbb{R}$ on a smooth manifold, and two charts $(U,\alpha=(x_1,\dots, x_n))$ and $(V,\beta=(\xi^1,\dots, \xi^n))$. The symbol $\frac{\partial f}{\partial x_i}$ stands for the function $[\partial_i(f\circ\alpha^{-1})]\circ\alpha:U\to\Bbb{R}$, i.e., you consider the chart representative $f\circ\alpha^{-1}$, take its $i^{th}$ partial derivative (which is then a function $\alpha[U]\subset\Bbb{R}^n\to\Bbb{R}$) and then pull it back to a function on $U\subset M$ by composing with $\alpha$. The reason Schwarz's rule doesn't apply when you mix charts is that you don't just have one function! There are two different functions involved in the game: $f\circ \alpha^{-1}$ (the local representative with respect to one chart) and $f\circ\beta^{-1}$ (the local representative with respect to the other chart). Even more explicitly:

  • $\frac{\partial f}{\partial x_i}:=[\partial_i(f\circ \alpha^{-1})]\circ\alpha$.
  • Similarly, $\frac{\partial f}{\partial \xi^s}:=[\partial_s(f\circ \beta^{-1})]\circ\beta$.
  • So, iterating the above definitions, $\frac{\partial}{\partial \xi^s}\frac{\partial f}{\partial x_i}:=\partial_s\left[\frac{\partial f}{\partial x_i}\circ \beta^{-1}\right]\circ\beta=\partial_s[[\partial_i(f\circ\alpha^{-1})]\circ (\alpha\circ\beta^{-1})]\circ\beta$.
  • But if you want to switch the roles, then $\frac{\partial}{\partial x_i}\frac{\partial f}{\partial \xi^s}:=\partial_i\left[\frac{\partial f}{\partial \xi^s}\circ \alpha^{-1}\right]\circ\alpha=\partial_i[[\partial_s(f\circ\beta^{-1})]\circ (\beta\circ\alpha^{-1})]\circ\alpha$.

So clearly, you're not just doing $\partial_i\partial_sg=\partial_s\partial_ig$ for a single function $g$. There are different functions involved which is why Schwarz's rule doesn't work in different coordinate systems.
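A small numerical sketch of this, under assumptions of my own choosing: the manifold is the plane, $\alpha$ is the Cartesian chart $(x,y)$, $\beta$ is the polar chart $(r,\theta)$, and $f$ is the function whose Cartesian representative is $x$. The helper names (`d_dx`, `d_dr`, and so on) and the step size are hypothetical, and derivatives are approximated by central differences. Analytically, $\frac{\partial}{\partial r}\frac{\partial f}{\partial x}=0$ while $\frac{\partial}{\partial x}\frac{\partial f}{\partial r}=\frac{\partial}{\partial x}\cos\theta=y^{2}/r^{3}$.

```python
import math

H = 1e-3  # finite-difference step (an arbitrary choice for this sketch)

def to_polar(x, y):
    return math.hypot(x, y), math.atan2(y, x)

def to_cartesian(r, theta):
    return r * math.cos(theta), r * math.sin(theta)

# Every scalar field is stored as a function of the Cartesian point (x, y).
def f(x, y):
    return x

def d_dx(g):
    """Field dg/dx: vary x while holding y fixed (Cartesian chart)."""
    return lambda x, y: (g(x + H, y) - g(x - H, y)) / (2 * H)

def d_dr(g):
    """Field dg/dr: vary r while holding theta fixed (polar chart)."""
    def dg(x, y):
        r, theta = to_polar(x, y)
        plus = g(*to_cartesian(r + H, theta))
        minus = g(*to_cartesian(r - H, theta))
        return (plus - minus) / (2 * H)
    return dg

x0, y0 = 1.0, 1.0
dr_of_dx = d_dr(d_dx(f))(x0, y0)  # d/dr applied to df/dx
dx_of_dr = d_dx(d_dr(f))(x0, y0)  # d/dx applied to df/dr
exact = y0**2 / math.hypot(x0, y0) ** 3  # analytic d/dx of cos(theta)

print(dr_of_dx, dx_of_dr, exact)
```

The two orderings disagree because the inner derivative produces a different field in each case (the constant $1$ versus $\cos\theta$), which is exactly the "two different functions" point made in the bullet list.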

0
On

I've already accepted an answer which is superior to the following ramble. Nonetheless, I feel obliged to share the hand-waving tangent plane argument I alluded to. The following is why we can't ignore the order of differentiation when taking the second derivative of a coordinate transformation, even though I'm really only using one coordinate system throughout.


But first, let me express the lesson which I took from the accepted answer. The notation here is at the other extreme of verbosity from that of Bergmann. I took it from Ciufolini and Wheeler's Gravitation and Inertia. I'm not advocating its adoption.

Informally, the lesson is that we can't simply redecorate the indices appearing in a second (or higher) order partial derivative using the chain rule. We can't permute second derivatives with mixed index decoration, because that would ignore the differentiation of the coordinate transformation implied by the notation. That is:

\begin{align} \partial_{sr}^{\bar{i}}\partial_{\bar{s}}^{s}\partial_{\bar{r}}^{r}= \partial_{s\bar{r}}^{\bar{i}}\partial_{\bar{s}}^{s}=\partial_{\bar{r}}\partial_{s}^{\bar{i}}\partial_{\bar{s}}^{s}=\partial_{\bar{r}}\left[\partial_{s}^{\bar{i}}\right]\partial_{\bar{s}}^{s}\left\{\text{N.B.} \ne\partial_{\bar{r}}\left[\partial_{s}^{\bar{i}}\partial_{\bar{s}}^{s}\right]\right\}\\ =\partial_{rs}^{\bar{i}}\partial_{\bar{r}}^{r}\partial_{\bar{s}}^{s}= \partial_{r\bar{s}}^{\bar{i}}\partial_{\bar{r}}^{r}=\partial_{\bar{s}}\partial_{r}^{\bar{i}}\partial_{\bar{r}}^{r}=\partial_{\bar{s}}\left[\partial_{r}^{\bar{i}}\right]\partial_{\bar{r}}^{r}\left\{\text{N.B.} \ne\partial_{\bar{s}}\left[\partial_{r}^{\bar{i}}\partial_{\bar{r}}^{r}\right]\right\} \end{align} This is the same thing in both notational forms, and with Christmas tree lights. Flipping the index ordering between notations makes my head spin. It has been the source of many errors.

[Image: the identities above in both notational forms]

I believe the non-equalities will be equalities if both coordinate systems are affine. That is, there is no relative curvature. I also believe this to be the reason second covariant derivatives do not generally commute.
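A quick check of the affine case, as a sketch in Bergmann's notation and assuming "affine" means the two systems are related by $\xi^{r}=A^{r}{}_{i}x_{i}+b^{r}$ with constant $A^{r}{}_{i}$ and $b^{r}$ (these symbols are mine, introduced only for illustration):

$$\frac{\partial\xi^{r}}{\partial x_{i}}=A^{r}{}_{i},\qquad\frac{\partial^{2}\xi^{r}}{\partial x_{i}\partial x_{l}}=0\quad\Longrightarrow\quad\frac{\partial x_{i}}{\partial\xi^{t}}\frac{\partial x_{l}}{\partial\xi^{s}}\frac{\partial^{2}\xi^{r}}{\partial x_{i}\partial x_{l}}=0.$$

In that case the Jacobian is constant, so bringing it inside or outside the outer derivative makes no difference and the bracketed expressions marked N.B. agree.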


On an idealized spherical Earth let there be a regional coordinate grid formed of geodesic coordinate lines which form squares near the origin $\mathscr{P}_{o},$ with one coordinate line running due east at that location. Along the grid lines crossing $\mathscr{P}_{o}$ all other grid lines cross at right angles and have the same physical spacing as the first grid lines. These intersecting lines are assigned integer coordinate values, increasing to the east and to the north, beginning with zero at the origin. Grid coordinates are written as $\left\{ x^{1},x^{2}\right\} ,$ with the first giving the easterly coordinate displacement, and the second the northerly coordinate displacement.

Symbolize the directed line segments from $\mathscr{P}_{00}=\mathscr{P}_{o}$ to $\mathscr{P}_{10}$ and $\mathscr{P}_{01}$ respectively as $\hat{\mathfrak{e}}_{\underline{1}}$ and $\hat{\mathfrak{e}}_{\underline{2}},$ where the subscripts on the $\mathscr{P}_{\mu\nu}$ indicate the integer grid coordinates of the physical locations where grid lines meet. The underlines on the reference basis vectors indicate their unique property of serving locally as a basis for grid coordinates. That is, all points near $\mathscr{P}_{o}$ can be located using grid coordinates as scalar coefficients in linear combinations of the reference basis vectors. When grid coordinates are used as vector components expressed on $\left\{ \hat{\mathfrak{e}}_{\underline{i}}\right\} $ their indices will also be underlined. So we write the position vector of the point having grid coordinates $\left\{ x^{i}\right\} $ as $\hat{\mathfrak{e}}_{\underline{i}}x^{\underline{i}},$ again to emphasize that these displacements are small enough to use a Euclidean plane approximation.

We shall call the surface parts bounded by grid lines with unit coordinate separation "grid squares" even though they are not in general ideal squares. The length of $\hat{\mathfrak{e}}_{\underline{1}}$ is designated the unit of physical length for the entire region. This length is by definition equal to that of $\hat{\mathfrak{e}}_{\underline{2}}.$

Each grid square is identified by its southwest corner $\mathscr{P}_{\mu\nu},$ and has a pair of basis vectors $\left\{ \mathfrak{e}_{1},\mathfrak{e}_{2}\right\} _{\mu\nu}$ coinciding with the bounding edges meeting at $\mathscr{P}_{\mu\nu}.$ Their symbols lack both the underlined indices and the hats because they do not serve as a basis for the grid, even locally, and they are not in general of unit physical length, nor do they generally meet at right angles.

The $\left\{ \mathfrak{e}_{1},\mathfrak{e}_{2}\right\} _{\mu\nu}$ do however serve quite well as a local basis using grid displacement components, by which we mean the values $\left\{ \Delta x^{1}=x^{1}-\mu,\Delta x^{2}=x^{2}-\nu\right\} .$ So a location near $\mathscr{P}_{\mu\nu}$ can be located fairly accurately using the linear combination $\mathfrak{e}_{i}\Delta x^{i}.$

Let us now produce a simple paper map of the reference grid square, leaving room on the margins. The map must be drawn physically to scale, preserving angles and relative lengths. All we need to depict are the reference basis vectors and a direction drawn as a unit length arrow, $\hat{\mathfrak{r}}$, beginning at $\mathscr{P}_{o}$. For concreteness, let's choose a direction that is approximately, but not exactly, northeast. Once created, we put this tangent plane in our pocket and head out in the direction we marked along a self-transported geodesic, which can be determined by "leapfrogging" in an obvious way with three posts and a laser pointer. Upon arriving at the grid square $\mathscr{P}_{\mu\nu}$, which is a good distance from $\mathscr{P}_{o}$, we produce a local map on a transparent sheet of plastic, drawn physically to the same scale as the reference map, beginning with a representation of the angle formed by the first edge encountered and our path of travel. And we certainly need to depict our local basis.

Once created, we overlay this map on the reference map, placing the image of $\mathscr{P}_{\mu\nu}$ directly over that of $\mathscr{P}_{o},$ and aligning $\hat{\mathfrak{r}}$ parallel to the image of our geodesic path. Using surveying tools, we can use either the $\left\{ \mathfrak{e}_{i}\right\} _{\mu\nu}$ basis or the parallel transport of $\left\{ \hat{\mathfrak{e}}_{\underline{i}}\right\} _{o}$ to locate nearby points as accurately as we can using ordinary grid coordinates near $\mathscr{P}_{o}.$ Since our local basis is not orthonormal, we can't simply use units of length as unit coordinate scalar multiples of the $\mathfrak{e}_{i}.$ So we invent a pair of tools $\left\{ \mathfrak{e}^{1},\mathfrak{e}^{2}\right\} _{\mu\nu}$ to convert raw vectors into local coordinates. For example, given the physical vector $\Delta\mathfrak{p}=\mathscr{P}-\mathscr{P}_{\mu\nu},$ we obtain the local coordinate representation of $\mathscr{P}$ by $\mathfrak{e}_{i}\mathfrak{e}^{i}\cdot\Delta\mathfrak{p}=\mathfrak{e}_{i}\Delta p^{i}.$ The object $\mathfrak{e}_{i}\mathfrak{e}^{i}$ is the identity dyadic tensor, and the $\mathfrak{e}^{i}$ are the contravariant basis vectors dual to the covariant basis vectors $\mathfrak{e}_{i}.$ Clearly we have $\mathfrak{e}^{i}\cdot\mathfrak{e}_{j}=\delta_{j}^{i}.$ We can even use these to represent our reference basis in terms of the local basis. The inverse transformation is the familiar use of orthonormal basis vectors treating them as self-dual: $\hat{\mathfrak{e}}_{\underline{i}}=\hat{\mathfrak{e}}^{\underline{i}}$.

\begin{align*} \hat{\mathfrak{e}}_{\underline{i}}= & \mathfrak{e}_{i}\mathfrak{e}^{i}\cdot\hat{\mathfrak{e}}_{\underline{i}}=\mathfrak{e}_{i}e^{i}{}_{\underline{i}}\\ \mathfrak{e}_{i}= & \hat{\mathfrak{e}}_{\underline{i}}\hat{\mathfrak{e}}^{\underline{i}}\cdot\mathfrak{e}_{i}=\hat{\mathfrak{e}}_{\underline{i}}e^{\underline{i}}{}_{i} \end{align*}

It is easily shown that the matrices of coefficients in these transformations are mutually inverse, and the rows (indicated by raised indices) consist of the components of the contravariant basis vector of the same index, and similarly for the columns. Using these transformations we can also find the local coordinates of a nearby point in one system if we have them in the other.

\begin{align*} \Delta p^{i}= & \mathfrak{e}^{i}\cdot\hat{\mathfrak{e}}_{\underline{i}}\Delta p^{\underline{i}}=e^{i}{}_{\underline{i}}\Delta p^{\underline{i}}\\ \Delta p^{\underline{i}}= & \hat{\mathfrak{e}}^{\underline{i}}\cdot\mathfrak{e}_{i}\Delta p^{i}=e^{\underline{i}}{}_{i}\Delta p^{i} \end{align*}
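The dual-basis bookkeeping above can be checked numerically. The following is only a sketch: the local basis vectors are made-up numbers standing in for $\left\{ \mathfrak{e}_{1},\mathfrak{e}_{2}\right\} _{\mu\nu}$ as drawn on the overlaid map, the reference basis is taken to be the standard orthonormal one, and all names (`dual_basis`, `to_local`, `to_ref`) are hypothetical.

```python
def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def dual_basis(e1, e2):
    """Return (d1, d2) with d_i . e_j = delta_ij for a 2D basis (e1, e2)."""
    det = e1[0] * e2[1] - e1[1] * e2[0]
    d1 = (e2[1] / det, -e2[0] / det)
    d2 = (-e1[1] / det, e1[0] / det)
    return d1, d2

# Reference basis: orthonormal, hence self-dual.
ref = ((1.0, 0.0), (0.0, 1.0))
ref_dual = ref

# Local basis: slightly stretched and skewed (illustrative values only).
local = ((1.02, 0.05), (-0.08, 0.97))
local_dual = dual_basis(*local)

# to_local[i][k] = (dual local vector i) . (reference vector k)
to_local = [[dot(local_dual[i], ref[k]) for k in range(2)] for i in range(2)]
# to_ref[k][i] = (dual reference vector k) . (local vector i)
to_ref = [[dot(ref_dual[k], local[i]) for i in range(2)] for k in range(2)]

# The two coefficient matrices should be mutually inverse.
product = [[sum(to_local[i][k] * to_ref[k][j] for k in range(2))
            for j in range(2)] for i in range(2)]

# Round trip of displacement components: reference -> local -> reference.
dp_ref = (0.3, -0.2)
dp_local = [sum(to_local[i][k] * dp_ref[k] for k in range(2)) for i in range(2)]
dp_back = [sum(to_ref[k][i] * dp_local[i] for i in range(2)) for k in range(2)]

print(product)
print(dp_ref, dp_local, dp_back)
```

The rows of `to_local` are the components of the dual local vectors, and the columns of `to_ref` are the components of the local basis vectors, matching the description of rows and columns given above.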

Obviously the $\Delta p^{\underline{i}}$ are not the grid coordinates of the point near $\mathscr{P}_{\mu\nu}$ they represent locally. But if we have the coordinates of the point set of any figure on the grid square at $\mathscr{P}_{\mu\nu}$ we can faithfully reproduce it by treating the $\Delta p^{\underline{i}}$ as ordinary components. That is, we can parallel transport the tangent plane at $\mathscr{P}_{\mu\nu}$ back to $\mathscr{P}_{o}.$

Notice that we did not use grid coordinates at all to produce the local coordinates of points near $\mathscr{P}_{\mu\nu}.$ But we certainly can. We could use grid coordinates in the original way to locate nearby points as we would without a local basis. We would then use the method of physical measurement as discussed above. Another option is to convert grid coordinates into local coordinates by subtracting the coordinates of $\mathscr{P}_{\mu\nu}$ from the grid coordinates. That is, $\left\{ \Delta x^{1}=x^{1}-\mu,\Delta x^{2}=x^{2}-\nu\right\} .$

Once we have the $\left\{ \Delta x^{i}\right\} $ coordinates on the local basis, we can transform them to coordinates on the parallel transported reference basis $\Delta x^{\underline{i}}=e^{\underline{i}}{}_{i}\Delta x^{i}.$ If we think of them as parallel transported back to the origin, we can write them as grid coordinates near $\mathscr{P}_{o}$; that is $x^{\underline{i}}=\Delta x^{\underline{i}}.$ We have now produced a differentiable mapping of grid coordinates near $\mathscr{P}_{\mu\nu}$ to grid coordinates near $\mathscr{P}_{o}$. For the sake of uniformity we can use the same indexing for the coordinates of $\mathscr{P}_{\mu\nu}$ as we have for other grid locations. We want to keep the coordinates of the $\mathscr{P}_{\mu\nu}$ distinct from those of other grid locations, so we use $\left\{ \xi^{1}=\mu,\xi^{2}=\nu\right\} .$ Now we write the function and differentiate it:

\begin{align*} \bar{\mathit{x}}\left[x^{i}\right]_{\mu\nu}= & \hat{\mathfrak{e}}_{\underline{i}}e^{\underline{i}}{}_{i}\left(x^{i}-\xi^{i}\right)\\ = & \left\{ \mathit{x}^{\underline{i}}\left[x^{i}\right]=e^{\underline{i}}{}_{i}\left(x^{i}-\xi^{i}\right)\right\} \\ \mathit{x}^{\underline{i}}{}_{,i}= & e^{\underline{i}}{}_{i}. \end{align*}

There's nothing there left to differentiate again, $\dots$ unless, $\dots$ we treat the $\xi^{i}$ as continuous variables. And now I wave my hands and say "See! I told you so!".