Why is the chain rule stated as $\partial [f\circ g]=\partial[f(g(x))]\circ \partial[g(x)] $


Suppose the chain rule is $\partial [f\circ g]=\partial[f(g(x))]\circ \partial[g(x)]$. Take the example $f=x^3$ and $g=\sin(x)$, so that $$ f\circ g=\sin^3x. $$

Using the stated chain rule, $$ \partial [f\circ g]=[3x^2\circ \sin(x)]\circ \cos(x), $$ which gives the wrong answer $$ 3\sin^2(x)\circ\cos(x) =3\sin^2(\cos(x)). $$ I have attached where the theorem is stated (Chain Rule). In fact, there is also an example where three functions are used (Example).

4

There are 4 best solutions below

1
On BEST ANSWER

The problem here stems from the fact that the total derivative in multivariable calculus usually has a slightly different definition than the derivative in ordinary one variable calculus. In particular, if we have a function $f:\mathbb{R}\supset\kern-0.7mm\to\mathbb{R}$, then the derivative of $f$ at the point $x\in\mathbb{R}$ is the real number $f'(x)\in\mathbb{R}$. In multivariable calculus, however, a single real number no longer suffices, and so instead we take the derivative to be either a matrix or a linear map (both formulations are equivalent). In the treatment you have, it seems the linear map approach is taken, and so in particular if we have a function $g:\mathbb{R}^n\supset\kern-0.7mm\to\mathbb{R}^m$, then the derivative of $g$ at the point $x\in\mathbb{R}^n$ is the linear map $Dg(x):\mathbb{R}^n\to\mathbb{R}^m$.

The key thing to consider is that multiplication of real numbers corresponds to composition of linear maps, and so when we deal with the total derivative as a linear map, we get compositions where we otherwise would get multiplications. However, this also means we have to be careful when we go between these two notions. In particular, if we wanted to consider the total derivative of our function $f:\mathbb{R}\supset\kern-0.7mm\to\mathbb{R}$ from before, then at $x\in\mathbb{R}$ it would be the linear map $Df(x):\mathbb{R}\to\mathbb{R}$ given by

$$[Df(x)](h)=f'(x)h.$$

This might seem like an insignificant difference, but it is key when we actually try to use the chain rule, as it does, in fact, turn composition into multiplication (this is indeed an example of an isomorphism). In particular, if we have two functions $f_1:\mathbb{R}\supset\kern-0.7mm\to\mathbb{R}$ and $f_2:\mathbb{R}\supset\kern-0.7mm\to\mathbb{R}$, then we can find $Df_1(x)\circ Df_2(x):\mathbb{R}\to\mathbb{R}$ by

$$[Df_1(x)\circ Df_2(x)](h)=[Df_1(x)](f_2'(x)h)=f_1'(x)f_2'(x)h,$$

i.e. by multiplying the derivatives.
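The computation above can be sketched in a few lines of Python. This is only an illustration: `linear_map` and `compose` are hypothetical helper names, and the numbers `a` and `b` are arbitrary stand-ins for $f_1'(x)$ and $f_2'(x)$ at some fixed point.

```python
# Illustration only: linear_map and compose are hypothetical helpers,
# and a, b are arbitrary stand-ins for f_1'(x) and f_2'(x).

def linear_map(c):
    """Return the linear map h -> c * h on the real line."""
    return lambda h: c * h

def compose(outer, inner):
    """Return the composition outer after inner."""
    return lambda h: outer(inner(h))

a, b = 3.0, -2.0

composed = compose(linear_map(a), linear_map(b))  # Df_1(x) composed with Df_2(x)
product = linear_map(a * b)                       # the map h -> f_1'(x) f_2'(x) h

# The two maps agree on every test input.
for h in (0.0, 1.0, 2.5):
    assert composed(h) == product(h)
```

Composing the two maps and multiplying the two scalars describe the same linear map, which is why the composition in the multivariable statement reduces to a product in one variable.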

Let us now look at how this can be used in your example. So the chain rule can be written as

$$D(f\circ g)(x)=Df(g(x))\circ Dg(x),$$

with $f:\mathbb{R}^k\supset\kern-0.7mm\to\mathbb{R}^m$ and $g:\mathbb{R}^n\supset\kern-0.7mm\to\mathbb{R}^k$ sufficiently nice. In particular,

$$[D(f\circ g)(x)](h)=[Df(g(x))\circ Dg(x)](h).$$

In your case, we consider the functions $f:\mathbb{R}\to\mathbb{R}$ and $g:\mathbb{R}\to\mathbb{R}$ given by

$$f(x)=x^3,\quad g(x)=\sin(x).$$

We know that their derivatives are given by

$$f'(x)=3x^2,\quad g'(x)=\cos(x),$$

and so their total derivatives will be the linear maps given by

$$[Df(x)](h)=3x^2h,\quad [Dg(x)](h)=\cos(x)h.$$

Applying the chain rule we get that

$$[D(f\circ g)(x)](h)=[Df(g(x))\circ Dg(x)](h)=[Df(\sin(x))](\cos(x)h)=3\sin(x)^2\cos(x)h.$$

But we know from the discussion above that if the total derivative is given by

$$[D(f\circ g)(x)](h)=3\sin(x)^2\cos(x)h,$$

then this corresponds to the ordinary derivative being given by

$$(f\circ g)'(x)=3\sin(x)^2\cos(x),$$

which is exactly what we should get from the one variable chain rule! And so in conclusion, the multivariable chain rule does indeed work; you just have to know the exact definitions you are working with to apply it, and you have to be a bit more careful when translating between derivatives in the context of one variable calculus and derivatives in the context of multivariable calculus!
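As a rough numerical sanity check of this conclusion, one can compare both candidate formulas against a finite-difference estimate of the derivative of $\sin^3 x$. This is a sketch rather than a proof: the function names, the sample point `x0` and the step size are all arbitrary choices made for illustration.

```python
import math

def composite(x):
    """(f after g)(x) = sin(x)^3."""
    return math.sin(x) ** 3

def chain_rule_derivative(x):
    """3 sin(x)^2 cos(x), the result of reading the composition as acting on h."""
    return 3 * math.sin(x) ** 2 * math.cos(x)

def misread_derivative(x):
    """3 sin^2(cos(x)), the result of composing the functions of x themselves."""
    return 3 * math.sin(math.cos(x)) ** 2

def central_difference(func, x, step=1e-6):
    """Symmetric finite-difference estimate of func'(x)."""
    return (func(x + step) - func(x - step)) / (2 * step)

x0 = 0.7  # arbitrary sample point
numeric = central_difference(composite, x0)

print(abs(numeric - chain_rule_derivative(x0)))  # tiny
print(abs(numeric - misread_derivative(x0)))     # clearly nonzero
```

Only the first formula matches the numerical estimate, consistent with the derivation above.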

4
On

That's not the correct chain rule: the second operation is a product of functions, not a composition. The correct chain rule is $$(f\circ g)'(x)=f'(g(x))\cdot g'(x).$$ In your example $$(x^3\circ \sin x)'=3(x^2\circ\sin x)\cdot (\sin x)'=3\sin^2x\cdot \cos x.$$

0
On

The chain rule you provided is indeed correct, you just misunderstood the role of the composition.

The idea is that you can see the derivative $Df(x)\in\mathbb{R}^{m\times n}$ as a linear map $Df(x)(\cdot):\mathbb{R}^n\rightarrow\mathbb{R}^m$. In this context, the composition of linear maps corresponds to the product of the corresponding Jacobians. In the scalar case in particular, the "matrix" $f'(x)$ is just a scalar, so you can use product and composition interchangeably.
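To make the correspondence between composition and Jacobian product concrete, here is a small pure-Python sketch. The matrices `A` and `B` are made-up stand-ins for $Dg(x)$ and $Df(g(x))$, not Jacobians of any particular functions, and `mat_vec` and `mat_mul` are hypothetical helpers written for this example.

```python
def mat_vec(matrix, vector):
    """Apply a matrix (list of rows) to a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

def mat_mul(left, right):
    """Multiply two matrices given as lists of rows."""
    columns = list(zip(*right))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in left]

# Made-up stand-ins: A plays the role of Dg(x), a map R^2 -> R^3,
# and B plays the role of Df(g(x)), a map R^3 -> R^2.
A = [[1, 2], [0, 1], [3, -1]]
B = [[2, 0, 1], [1, 1, 0]]
h = [1, -2]

composed = mat_vec(B, mat_vec(A, h))      # apply A first, then B
via_product = mat_vec(mat_mul(B, A), h)   # apply the single matrix BA

assert composed == via_product
```

When both matrices are $1\times 1$, the matrix product is ordinary multiplication of two numbers, which is the scalar case described above.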

This point of view is very common in some areas of math. For instance, here's the definition provided in Calculus on Manifolds by Michael Spivak:

0
On

Suppose $g:\mathbb R^n\to\mathbb R^m$ is differentiable at $x$, and $f:\mathbb R^m\to\mathbb R^p$ is differentiable at $g(x)$. Then, the multivariable chain rule states that $f\circ g$ is differentiable at $x$, and $$ d(f\circ g)(x)=df\bigl(g(x)\bigr)\circ dg(x) $$ If $n=m=p=1$, and $g=x\mapsto \sin x$, and $f=x\mapsto x^3$, then for all $x\in\mathbb R$, the linear map $df(x):\mathbb R\to\mathbb R$ equals $u\mapsto 3x^2u$. Hence, for all $x\in\mathbb R$, we have $df\bigl(\sin x\bigr)=u\mapsto 3(\sin^2 x)u$. Similarly, for all $x\in\mathbb R$, we have $dg(x)=u\mapsto (\cos x)u$. Therefore, their composition $df\bigl(g(x)\bigr)\circ dg(x)$ is given by $u\mapsto 3(\sin^2 x)(\cos x)u$.

By identifying the linear map $x\mapsto cx$ with the scalar $c$, we see that $d(f\circ g)(x)$ agrees with $(f\circ g)'(x)$ as it is defined in single-variable calculus.