Differential on Tangent Spaces and the Chain Rule


From Lee's Intro to Smooth Manifolds

I don't see how this is a straightforward application of the chain rule. The idea of having different coordinates on the domain and codomain is throwing me off. I would have naively written $$\left.\frac{\partial}{\partial x^i} \right|_p (f \circ F)=\frac{\partial f}{\partial F^j}(F(p))\frac{\partial F^j}{\partial x^i}(p),$$ since, writing things more explicitly with $x=(x^1,\dots,x^n) \in \mathbb{R}^n$ and $F(x)=(F^1(x),\dots,F^m(x)) \in \mathbb{R}^m,$ we can write $f\circ F$ as $$f \circ F(x)=f(F^1(x^1,\dots,x^n),\dots,F^m(x^1,\dots,x^n)).$$ Then, disregarding evaluation of the original partial at the point $p$, we have by the multivariable chain rule that $$\frac{\partial}{\partial x^i} (f \circ F(x))=\frac{\partial}{\partial x^i} (f(F^1(x^1,\dots,x^n),\dots,F^m(x^1,\dots,x^n)))=\frac{\partial f}{\partial F^j}\frac{\partial F^j}{\partial x^i}(x).$$

So what did I do wrong in my application of the chain rule, or, if nothing, where do the coordinates $(y^i)$ come into play?


There are 2 best solutions below

0
On BEST ANSWER

To sum it up: Both are right, but Lee is talking about something else.

For a start, you should not be thrown off by the fact that the coordinates in $V$ have names $y^j$. The function $f$ is defined on $V$. It is in the first place a function ${\bf y}\mapsto f({\bf y})\in{\mathbb R}$, hence a function of the $y^j$. It does not make sense to write ${\partial f\over\partial F^j}$, since you can only partially differentiate with respect to coordinate variables, not with respect to functions. Therefore I'd write your chain rule as $$\frac{\partial}{\partial x^i} (f \circ F(x))=\frac{\partial}{\partial x^i} (f(F^1(x^1,\dots,x^n),\dots,F^m(x^1,\dots,x^n)))=\frac{\partial f}{\partial y^j}(F(x))\,\frac{\partial F^j}{\partial x^i}(x),$$ where the partial derivatives of $f$ are evaluated at the point $F(x)\in V$.

The linked passage in Lee's book is not about partial derivatives and the chain rule per se. Instead it talks about the effect of the differential $dF_p$ on the tangent vectors at $p$. The "special" tangent vectors ${\partial\over\partial x^i}\biggr|_p$ (introduced before, I hope) form a basis of $T_p{\mathbb R}^n$. Therefore Lee is out to compute their images $$dF_p\left({\partial\over\partial x^i}\biggr|_p\right)\in T_q{\mathbb R}^m\qquad\qquad(q:=F(p))$$ in terms of the "special" tangent vectors ${\partial\over\partial y^j}\biggr|_q$ forming a basis of $T_q{\mathbb R}^m$. In order to find the resulting matrix he looks at the effect $$dF_p\left({\partial\over\partial x^i}\biggr|_p\right).f$$ of $dF_p\left({\partial\over\partial x^i}\biggr|_p\right)$ on an arbitrary smooth $f$ defined in a neighborhood of $q$. In the resulting computation he then needs the chain rule as quoted by you.
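Spelled out, the computation presumably runs as follows. This is a sketch, assuming the definition $dF_p(v)f=v(f\circ F)$ of the differential and the summation convention over the repeated index $j$; it is not a quotation of Lee's text: $$dF_p\left({\partial\over\partial x^i}\biggr|_p\right)f={\partial\over\partial x^i}\biggr|_p(f\circ F)={\partial f\over\partial y^j}(F(p))\,{\partial F^j\over\partial x^i}(p)=\left({\partial F^j\over\partial x^i}(p)\,{\partial\over\partial y^j}\biggr|_{F(p)}\right)f.$$ Since $f$ was arbitrary, this gives $$dF_p\left({\partial\over\partial x^i}\biggr|_p\right)={\partial F^j\over\partial x^i}(p)\,{\partial\over\partial y^j}\biggr|_{F(p)},$$ so the matrix of $dF_p$ with respect to the two coordinate bases is the Jacobian matrix of $F$ at $p$. The middle equality is exactly where the chain rule, with $f$ differentiated in the $y^j$, is used.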

0
On

For an arbitrary smooth real-valued function $g$ on an open set $U\subseteq\Bbb R^n$, to compute $$\left.\frac{\partial}{\partial x^i}\right\vert_pg,$$ you can compute the whole linear transformation $Dg(p)$ (the total derivative) as a $1$ by $n$ matrix $$Dg(p)=\begin{bmatrix}\partial_jg(p)\end{bmatrix}$$ and then apply this linear transformation to the vector $e_i\in\Bbb R^n$ (the $i$-th canonical basis vector). We would have $$\left.\frac{\partial}{\partial x^i}\right\vert_pg=Dg(p)e_i.$$

With $g=f\circ F$, the chain rule tells us that the matrix $D(f\circ F)(p)$ can be computed by first computing the total derivatives of $f$ and $F$ separately, and then composing them. You calculate $$Df(F(p))=\begin{bmatrix}\partial_jf(F(p))\end{bmatrix}, \ \ \ DF(p)=\begin{bmatrix}\partial_jF^i(p)\end{bmatrix}$$ (a $1$ by $m$ matrix and an $m$ by $n$ matrix, respectively), and the matrix $D(f\circ F)(p)$ is the product $Df(F(p))\,DF(p)$. Then apply this linear transformation to the vector $e_i$ to obtain the formula $$\left.\frac{\partial}{\partial x^i} \right|_p (f \circ F)=\frac{\partial f}{\partial y^j}(F(p))\frac{\partial F^j}{\partial x^i}(p),$$ where $\frac{\partial f}{\partial y^j}(F(p))$ are the partial derivatives of $f$, evaluated at $F(p)$.

As long as you interpret the symbol $$\frac{\partial f}{\partial F^j}(F(p))$$ as the partial derivative of $f$ with respect to its $j$-th variable, evaluated at $F(p)$, you should be fine.
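If you want to convince yourself numerically, here is a minimal sketch in plain Python. The maps `F` and `f` below are hypothetical examples chosen only for illustration (they do not come from Lee or from the question), and the helper names are my own. It compares a finite-difference estimate of $\left.\frac{\partial}{\partial x^i}\right\vert_p(f\circ F)$ with the entry of the matrix product $Df(F(p))\,DF(p)$ in column $i$.

```python
import math

def F(x):
    """Hypothetical smooth map F: R^2 -> R^2 with components F^1, F^2."""
    x1, x2 = x
    return (x1 * x2, x1 + x2)

def f(y):
    """Hypothetical smooth function f: R^2 -> R in the coordinates y^1, y^2."""
    y1, y2 = y
    return y1 * math.sin(y2)

def grad_f(y):
    """Row matrix Df(y): partials of f with respect to its own slots y^1, y^2."""
    y1, y2 = y
    return (math.sin(y2), y1 * math.cos(y2))

def jacobian_F(x):
    """Matrix DF(x): entry [j][i] is the partial of F^j with respect to x^i."""
    x1, x2 = x
    return ((x2, x1), (1.0, 1.0))

def partial_numeric(g, p, i, h=1e-6):
    """Central-difference estimate of the i-th partial derivative of g at p."""
    plus, minus = list(p), list(p)
    plus[i] += h
    minus[i] -= h
    return (g(tuple(plus)) - g(tuple(minus))) / (2 * h)

def chain_rule(p, i):
    """Column i of the product Df(F(p)) DF(p), i.e. Df(F(p)) DF(p) e_i."""
    df = grad_f(F(p))      # partials of f, evaluated at F(p), not at p
    jac = jacobian_F(p)    # partials of the components F^j, evaluated at p
    return sum(df[j] * jac[j][i] for j in range(len(df)))

p = (0.7, -1.3)
for i in range(2):
    lhs = partial_numeric(lambda x: f(F(x)), p, i)
    rhs = chain_rule(p, i)
    print(i, lhs, rhs)
```

The two printed values in each row should agree up to the finite-difference error. Note that `grad_f` never mentions `F`: the partials of $f$ are taken with respect to its own variables and only afterwards evaluated at $F(p)$.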

Now I include some extra content to explain the likely source of your confusion.

You may have learned that the chain rule goes like this:

Given a real-valued function $f$ that depends on the variables $y^1,y^2,\dots,y^m$, which in turn depend on the variables $x^1,x^2,\dots,x^n$, the partial derivative of $f$ with respect to $x^i$ is given by $$\frac{\partial f}{\partial x^i}=\sum_{j=1}^m\frac{\partial f}{\partial y^j}\frac{\partial y^j}{\partial x^i}.$$

Then you may have naively changed each $y^j$ into $F^j$ and arrived at the formula you wrote.

Whoever taught you that the chain rule goes like this, I blame them: they abuse notation before the theorem has been properly stated, and without making you aware of the abuse.

The proper way of understanding the chain rule is as follows:

There is a real-valued differentiable function $f:V\to\Bbb R$ on an open set $V\subseteq\Bbb R^m$, with its partial derivative with respect to the $j$-th coordinate denoted $\frac{\partial f}{\partial y^j}$. There is also a differentiable map $F:U\to V$ on an open set $U\subseteq\Bbb R^n$, with component functions $F^1,\dots,F^m:U\to\Bbb R$; the partial derivative of the $j$-th component function with respect to the $i$-th coordinate is denoted $\frac{\partial F^j}{\partial x^i}$.

We can form the differentiable function $f\circ F:U\to\Bbb R$ and take its partial derivatives. The chain rule states that, at every point $p\in U$, $$\frac{\partial (f \circ F)}{\partial x^i}(p) =\sum_{j=1}^m\frac{\partial f}{\partial y^j}(F(p))\,\frac{\partial F^j}{\partial x^i}(p).$$

This form is always correct. You can change the notation for the partial derivatives, for example to $$\frac{\partial f}{\partial y^j}=\partial_jf, \ \ \ \frac{\partial F^j}{\partial x^i}=\partial_iF^j,$$ but rewriting $F^j$ as $y^j$, or $y^j$ as $F^j$, is not correct. The symbol $\frac{\partial f}{\partial F^j}$ is wrong: $f$ is defined on $V\subseteq\Bbb R^m$ independently of $F$, and you do not differentiate $f$ with respect to a function $F^j$.
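To see the separate roles of $y^j$ and $F^j$ concretely, here is a small worked example (the particular $f$ and $F$ are my own illustrative choices, not taken from Lee or from the question). Take $n=m=2$, $$F(x^1,x^2)=(x^1x^2,\ x^1+x^2),\qquad f(y^1,y^2)=y^1\sin y^2.$$ The partial derivatives of $f$ are computed without any reference to $F$: $$\frac{\partial f}{\partial y^1}=\sin y^2,\qquad \frac{\partial f}{\partial y^2}=y^1\cos y^2,$$ and those of the components of $F$ with respect to $x^1$ are $$\frac{\partial F^1}{\partial x^1}=x^2,\qquad \frac{\partial F^2}{\partial x^1}=1.$$ Evaluating the former at $F(x)$ and the latter at $x$, the chain rule gives $$\frac{\partial (f\circ F)}{\partial x^1}(x)=\sin(x^1+x^2)\cdot x^2+x^1x^2\cos(x^1+x^2)\cdot 1,$$ which agrees with differentiating $(f\circ F)(x)=x^1x^2\sin(x^1+x^2)$ directly by the product rule.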

Some teachers or authors abuse notation by writing $f \circ F=f$ and $F^j=y^j$ to make the formula look like the one quoted above, in the hope that it is "intuitive" and "simple". The formula with the notational abuse is also often presented to students before the correct one. In practice, this confuses many students new to multivariable calculus, and they may not notice that they learned it wrong until much later.