How can the chain rule be explained more rigorously?


The 'proof' for the chain rule that is often used at school is unsatisfying for me because it treats derivatives as fractions:

$$ \frac{dy}{dx}=\frac{dy}{du}\times\frac{du}{dx} $$

However, the more rigorous proofs used at university are unfathomable to me because they assume much more background knowledge than I have. Is there a way I can think of the chain rule (perhaps not a rigorous proof) that acknowledges that derivatives are shorthand for limit expressions, but does not use esoteric notation or complicated methods?

For reference, here is a list of things that I do and don't know:

  • I know how to differentiate from first principles using $$f'(x)=\lim_{h\to0}\frac{f(x+h)-f(x)}{h}$$
  • Apart from the chain rule, I know the product rule and the quotient rule (but again, I don't know the proofs for these rules)
  • I know some limit laws (e.g. the quotient law for limits)
  • I don't have a rigorous understanding of limits, but I think I have a good intuitive grasp of them
  • Similarly, I have an intuitive understanding of continuous vs. discontinuous functions (continuous = not lifting your pen off the page), but I have not been taught the formal definition for continuity

Thank you for reading.

Best answer:

Nobody should use the "fraction" approach.

To provide intuition I tend to fall back on linear approximation.

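If we write $$f(x+\epsilon)\approx f(x)+ f'(x)\epsilon$$ for small $\epsilon$,

then $$(f\circ g)(x+\epsilon)\approx (f\circ g)(x)+(f\circ g)'(x)\epsilon.$$

But we could also first approximate $g$, and then approximate $f$ at $g(x)$ with the small increment $g'(x)\epsilon$: $$(f\circ g)(x+\epsilon)\approx f\big(g(x)+ g'(x)\epsilon\big)\approx (f\circ g)(x)+f'(g(x))\,g'(x)\,\epsilon.$$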

And comparing the two shows that $$(f\circ g)'(x)=f'(g(x))g'(x)$$ as desired.
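A quick numerical sanity check of this comparison, as a sketch only: the concrete choices $f=\sin$, $g(x)=x^2$ and the point $x=1.5$ are my own assumptions, not part of the answer. If $f'(g(x))\,g'(x)$ is the correct slope, the gap between $(f\circ g)(x+\epsilon)$ and the linear approximation should shrink faster than $\epsilon$ itself, so the ratio error$/\epsilon$ should tend to $0$.

```python
import math

# Illustrative choice (assumption, not from the answer): f = sin, g(x) = x**2.
def f(u):
    return math.sin(u)

def fprime(u):
    return math.cos(u)

def g(x):
    return x * x

def gprime(x):
    return 2 * x

def linearization_error(x, eps):
    """Gap between (f∘g)(x+eps) and the chain-rule linear approximation."""
    exact = f(g(x + eps))
    approx = f(g(x)) + fprime(g(x)) * gprime(x) * eps
    return abs(exact - approx)

x = 1.5
for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    err = linearization_error(x, eps)
    # With the right slope, err/eps shrinks roughly in proportion to eps.
    print(f"eps={eps:g}  error={err:.3e}  error/eps={err / eps:.3e}")
```

Replacing the slope $f'(g(x))\,g'(x)$ with any other number would make error$/\epsilon$ level off at a non-zero value instead of shrinking.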

Of course, to make this rigorous one has to argue that the coefficient in the linear approximation is uniquely defined, that the neglected error terms really are negligible, and so on. Still, students ought to be aware that this interpretation of derivatives is an important tool in numerical analysis, and that the chain rule drops out of it.
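One place where "the chain rule drops out of linear approximation" is used directly is forward-mode automatic differentiation with dual numbers. This is a technique I am naming here, not one the answer mentions. Each value carries a pair (value, derivative), i.e. the two coefficients of $u+u'\epsilon$ with $\epsilon^2$ treated as $0$, and each elementary function applies its own linear approximation to that pair. Composing functions then composes linear approximations, which is exactly the chain rule. The class `Dual` and the helpers `sin`, `derivative` and `composite` are hypothetical names for this sketch.

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    """A pair representing val + der*eps, where eps**2 is treated as 0."""
    val: float
    der: float

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        # (a + a'eps)(b + b'eps) = ab + (ab' + a'b)eps, since eps**2 = 0
        return Dual(self.val * other.val, self.val * other.der + self.der * other.val)

    __rmul__ = __mul__

def sin(x: Dual) -> Dual:
    # Linear approximation of sin at x.val: sin(u + u'eps) ≈ sin(u) + cos(u)u'eps
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)

def derivative(func, x: float) -> float:
    """Evaluate func at x + 1*eps and read off the eps coefficient."""
    return func(Dual(x, 1.0)).der

def composite(x):
    # (f∘g)(x) with f = sin and g(x) = x*x
    return sin(x * x)

print(derivative(composite, 1.5), math.cos(1.5 ** 2) * 2 * 1.5)
```

Both printed numbers should agree: the dual-number evaluation never computes a limit, yet it produces $f'(g(x))\,g'(x)$ because each step only passes a linear approximation along.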

Another answer:

The definition of derivative as a limit is sufficient to understand a full rigorous proof of chain rule.

The idea that rigorous proofs are too difficult for beginners and should be avoided in a first course is a really bad one.

I believe the key issue with understanding the chain rule is that people don't really know what it says beyond the usual equation $$\frac{dy}{dx}=\frac{dy}{du}\cdot\frac{du}{dx}$$ Let's state the rule clearly and simply:

Chain Rule: Let $g$ be a real-valued function defined in a neighborhood of $a$, and let $f$ be another real-valued function defined in some neighborhood of $g(a)$. Further, let $g$ be differentiable at $a$ with derivative $g'(a)$, and let $f$ be differentiable at $g(a)$ with derivative $f'(g(a))$. Then the composite function $f\circ g$, defined by $(f\circ g)(x)=f(g(x))$, is differentiable at $a$ with derivative $$(f\circ g)'(a)=f'(g(a))\,g'(a).$$

First of all we need to observe that if $g$ is defined in some neighborhood of $a$ and $f$ is defined in some neighborhood of $g(a) $ then it does not necessarily mean that the composite function $f\circ g$ is defined in some neighborhood of $a$. All we can infer in this case is that $f(g(a)) $ is defined.
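For a concrete instance (an illustrative choice of my own, not part of the answer), take $a=0$ and
$$g(x)=\begin{cases}0, & x=0,\\ 2, & x\neq 0,\end{cases}\qquad f(u)=\sqrt{1-u^2}\quad(-1\le u\le 1).$$
Here $g$ is defined on all of $\mathbb{R}$, and $f$ is defined on $[-1,1]$, which contains a neighborhood of $g(0)=0$. Yet for every $x\neq 0$ we get $f(g(x))=\sqrt{1-4}$, which is undefined, so $f\circ g$ is defined only at $x=0$. Note that this $g$ is discontinuous at $0$.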

However, if $g$ is continuous at $a$ and $f$ is continuous at $g(a)$, then $f\circ g$ is necessarily defined in some neighborhood of $a$ and is continuous at $a$ (prove this!). Since the given functions are assumed differentiable, they are also continuous; hence $f\circ g$ is defined in some neighborhood of $a$, and it makes sense to ask whether it is differentiable at $a$. The chain rule ensures that it is, and also gives a formula for its derivative at $a$.

To find the derivative $(f\circ g)'(a)$ we apply the definition of the derivative, $$(f\circ g)'(a)=\lim_{h\to 0}\frac{f(g(a+h))-f(g(a))}{h},\tag{1}$$ and we need to evaluate this limit. In general, evaluating a limit by means of the limit laws also proves that the limit exists, because each law concludes that the combined limit exists once the limits it is built from exist.
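To see what the limit in $(1)$ is doing, here is a small numerical sketch. The choices $f=\exp$, $g=\sin$ and $a=0.7$ are assumptions for illustration only. The script evaluates the expression under the limit for shrinking $h$ (from both sides) and compares it with the value $f'(g(a))\,g'(a)$ that the chain rule predicts.

```python
import math

# Hypothetical example (not from the answer): f = exp, g = sin, a = 0.7.
f, fprime = math.exp, math.exp
g, gprime = math.sin, math.cos
a = 0.7

def quotient(h):
    """The expression under the limit in (1)."""
    return (f(g(a + h)) - f(g(a))) / h

target = fprime(g(a)) * gprime(a)  # chain-rule value f'(g(a)) g'(a)
for h in (1e-1, 1e-3, 1e-5, -1e-5):
    print(f"h={h:g}  quotient={quotient(h):.8f}  target={target:.8f}")
```

A numerical table like this only suggests the value of the limit; it is the argument that follows which proves it.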

Let us write $$k=g(a+h)-g(a)$$ so that $$f(g(a+h))-f(g(a))=f(g(a)+k)-f(g(a)),$$ and then there are two cases to consider:

  • Case 1: There is a neighborhood of $0$ such that $k$ is non-zero for all non-zero values of $h$ in that neighborhood. We express this by saying that $k\neq 0$ as $h\to 0$.
  • Case 2: There is no such neighborhood of $0$. In other words every neighborhood of $0$ contains a non-zero value of $h$ for which $k=0$.
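A standard function that falls under case 2 (an illustration I am adding, not taken from the answer) is, with $a=0$,
$$g(x)=\begin{cases}x^2\sin(1/x), & x\neq 0,\\ 0, & x=0.\end{cases}$$
Then $k=g(h)-g(0)=h^2\sin(1/h)$, which vanishes at $h=\frac{1}{n\pi}$ for every positive integer $n$. These values of $h$ get arbitrarily close to $0$, so every neighborhood of $0$ contains a non-zero $h$ with $k=0$. This $g$ is still differentiable at $0$, with
$$g'(0)=\lim_{h\to 0}\frac{h^2\sin(1/h)}{h}=\lim_{h\to 0}h\sin(1/h)=0,$$
since $|h\sin(1/h)|\le|h|$.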

Most proofs that lack rigor simply ignore the slightly more complicated case 2. In case 1 we can rewrite the desired limit as $$\lim_{h\to 0}\frac{f(g(a)+k)-f(g(a))}{k}\cdot\frac{g(a+h)-g(a)}{h}.$$ As $h\to 0$ we have $k\to 0$ (this is the same as saying that $g$ is continuous at $a$) with $k\neq 0$, so the first fraction tends to $f'(g(a))$, while the second fraction tends to $g'(a)$. Hence the desired limit is $f'(g(a))\,g'(a)$.
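The case 1 factorization can be watched numerically. This is a sketch under assumed choices that are not from the answer: $a=1$, $g(x)=x^3$ and $f=\log$, so that $k=(1+h)^3-1$ is non-zero for every small non-zero $h$. The function `factors` (a hypothetical helper) returns the two fractions separately; the first should approach $f'(g(1))=1$, the second $g'(1)=3$, and their product $(f\circ g)'(1)=3$.

```python
import math

# Hypothetical case-1 example (not from the answer): a = 1, g(x) = x**3, f = log.
a = 1.0

def g(x):
    return x ** 3

f = math.log

def factors(h):
    """Split the quotient in (1) into the two fractions used for case 1."""
    k = g(a + h) - g(a)                   # non-zero for small non-zero h here
    first = (f(g(a) + k) - f(g(a))) / k   # should approach f'(g(a)) = 1/g(a) = 1
    second = (g(a + h) - g(a)) / h        # should approach g'(a) = 3*a**2 = 3
    return first, second

for h in (1e-1, 1e-3, 1e-5):
    first, second = factors(h)
    print(f"h={h:g}  first={first:.6f}  second={second:.6f}  product={first * second:.6f}")
```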

In case 2 we necessarily have $g'(a)=0$ (but case 1 does not imply $g'(a)\neq 0$: case 1 also includes functions $g$ with $g'(a)=0$). You should try to prove the contrapositive: $g'(a)\neq 0$ implies case 1.

As $h\to 0$ we have $k\to 0$. For those $h$ with $k=0$, the expression under the limit in $(1)$ is exactly $0$. For those $h$ with $k\neq 0$, we can rewrite it as a product of two fractions as in case 1: the first fraction is bounded (because it tends to $f'(g(a))$ as $k\to 0$), and the second is small (because its limit is $g'(a)=0$). Either way, the expression under the limit in $(1)$ can be made arbitrarily small by taking $h$ close enough to $0$, so the desired limit is $0$. Thus the chain rule formula $$(f\circ g)'(a)=f'(g(a))\,g'(a)$$ holds because both sides equal $0$. Those who insist can write out case 2 with the Greek symbols $\epsilon$ and $\delta$, but doing so adds no extra rigor.
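A numerical look at case 2, as a sketch only. The example is my own assumption: $a=0$, $f=\exp$ and $g(x)=x^2\sin(1/x)$ with $g(0)=0$, so $g'(0)=0$. The script samples $h=\frac{1}{n\pi}$, where $k$ is zero up to floating-point rounding, alongside nearby values of $h$ where $k\neq 0$. In both kinds of points the quotient in $(1)$ should be no bigger than about $|h|$, consistent with the limit being $0$.

```python
import math

# Hypothetical case-2 example (not from the answer): a = 0, f = exp,
# g(x) = x**2 * sin(1/x) with g(0) = 0, so g'(0) = 0.
def g(x):
    return x * x * math.sin(1 / x) if x != 0 else 0.0

f = math.exp
a = 0.0

def quotient(h):
    """Expression under the limit in (1): (f(g(a+h)) - f(g(a))) / h."""
    return (f(g(a + h)) - f(g(a))) / h

# Mix h = 1/(n*pi), where k = g(h) - g(0) is zero up to rounding,
# with h = 1/(n*pi + 0.5), where k is clearly non-zero.
for n in (10, 1000, 100000):
    for h in (1 / (n * math.pi), 1 / (n * math.pi + 0.5)):
        k = g(a + h) - g(a)
        print(f"h={h:.3e}  k={k:.3e}  quotient={quotient(h):.3e}")
```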