Why is the "correct" proof of the chain rule correct? What is actually happening here?


There is a correct and an incorrect proof of the chain rule going around (see below). The problem with the incorrect proof is that $g(x)-g(a)$ might be $0$ for $x$ arbitrarily close to $a$, which creates a division by zero.

Question

I can't get my head around why the correct proof solves the problem of the incorrect proof. Why can we just define a function $E$ and suddenly all our problems disappear?

I just don't really get what actually happens in the correct proof. It hasn't "clicked" for me yet. Any help would be much appreciated.

By the way, is my "correct proof" below actually correct?

Incorrect proof:

$$\lim \limits_{x \to a}\frac{f(g(x))-f(g(a))}{x-a}=\lim \limits_{x \to a}\frac{f(g(x))-f(g(a))}{g(x)-g(a)}\times\frac{g(x)-g(a)}{x-a}=f'(g(x))g'(x)$$
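To make the division-by-zero problem concrete, here is a minimal Python sketch. The functions are hypothetical choices, not from the post: $f(y)=y^2$ and a constant inner function $g(x)=3$ at $a=1$, so $g(x)-g(a)=0$ for every $x$. The unsplit difference quotient is perfectly well defined, but the split used in the incorrect proof fails at every $x$.

```python
def f(y):
    return y ** 2          # hypothetical outer function

def g(x):
    return 3.0             # hypothetical constant inner function: g(x) == g(a) for every x

a = 1.0

def direct_quotient(x):
    """Difference quotient of f∘g at a; well defined for every x != a."""
    return (f(g(x)) - f(g(a))) / (x - a)

def naive_factorization(x):
    """The split from the incorrect proof; returns None when it divides by zero."""
    try:
        first = (f(g(x)) - f(g(a))) / (g(x) - g(a))
    except ZeroDivisionError:
        return None
    return first * (g(x) - g(a)) / (x - a)

for x in (1.1, 1.01, 1.001):
    print(x, direct_quotient(x), naive_factorization(x))
```

Here $(f\circ g)'(1)=0$, and the direct quotient shows it, while the factorization never produces a number.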

Correct proof:

We first define a function $E$

$$E(0)=0$$ $$E(g(x)-g(a))=\frac{f(g(x))-f(g(a))}{g(x)-g(a)}-f'(g(x))$$

In any case: $$f(g(x))-f(g(a))=(E(g(x)-g(a))+f'(g(x)))\times(g(x)-g(a))$$

Dividing by $x-a$ and taking the limit we get:

$$\begin{align} \frac{d}{dx}f(g(x))&=\lim \limits_{x \to a}\frac{f(g(x))-f(g(a))}{x-a}\\ &=\lim \limits_{x \to a}(E(g(x)-g(a))+f'(g(x)))\times\frac{g(x)-g(a)}{x-a}\\&=f'(g(x))g'(x) \end{align}$$


EDIT: In other words, we basically state that when $g(x)=g(a)$:

$$\frac{f(g(x))-f(g(a))}{g(x)-g(a)}-f'(g(x))=0$$

But why can we state that? As I understand it, this is true for the limit, but why are we allowed to also state it for the actual value?


Best answer:

There are two things wrong with your original proof, and the "EDIT" section of the original post is also wrong.

First problem: To define a function $E$, you have to say how to apply $E$ to an arbitrary number $h$. You haven't done that. Here is a better definition of $E$: $E(0) = 0$, and if $h \ne 0$ then \begin{equation} E(h) = \frac{f(g(a)+h) - f(g(a))}{h} - f'(g(a)). \end{equation} For $h \ne 0$, the formula defining $E(h)$ can be rearranged to read: \begin{equation} (E(h) + f'(g(a))) \times h = f(g(a)+h) - f(g(a)). \end{equation} But notice that this last equation is also true if $h=0$, since both sides are $0$, so the equation is true for all values of $h$. Plugging in $g(x)-g(a)$ for $h$, we get \begin{equation} (E(g(x)-g(a))+f'(g(a))) \times (g(x)-g(a)) = f(g(x))-f(g(a)). \end{equation} This is (almost) the same as your "in any case" equation.
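The definition of $E$ and the rearranged identity can be checked with a small Python sketch. It assumes a hypothetical $f(y)=y^3$ and uses `b` for the value $g(a)$ (here $b=2$, also an assumption). Exact `Fraction` arithmetic avoids rounding noise, so the identity can be tested with `==` for every $h$, including $h=0$.

```python
from fractions import Fraction

b = Fraction(2)                 # stands in for g(a); hypothetical value

def f(y):
    return y ** 3               # hypothetical outer function

def f_prime(y):
    return 3 * y ** 2           # its derivative, computed by hand

def E(h):
    """E(0) = 0; for h != 0, the difference quotient of f at b minus f'(b)."""
    if h == 0:
        return Fraction(0)
    return (f(b + h) - f(b)) / h - f_prime(b)

# The rearranged identity (E(h) + f'(b)) * h == f(b + h) - f(b) holds for every h,
# including h == 0, where both sides are 0.
for h in (Fraction(1), Fraction(1, 10), Fraction(-1, 1000), Fraction(0)):
    assert (E(h) + f_prime(b)) * h == f(b + h) - f(b)

# E is continuous at 0: for this f, E(h) = 3*b*h + h**2 exactly.
for n in (10, 100, 1000):
    print(n, E(Fraction(1, n)))
```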

Second problem: In your final calculation, you are mixing up the derivative with the value of the derivative at a particular point. The limit \begin{equation} \lim_{x \to a} \frac{f(g(x))-f(g(a))}{x-a} \end{equation} doesn't give you the derivative, it gives you the value of the derivative at $a$. So the proof should end like this: \begin{align} \left.\frac{d}{dx}f(g(x))\right|_{x=a} &= \lim_{x \to a} \frac{f(g(x))-f(g(a))}{x-a}\\ &= \lim_{x \to a} (E(g(x)-g(a))+f'(g(a))) \times \frac{g(x) - g(a)}{x-a}\\ &= f'(g(a))g'(a). \end{align}

There is a subtle point in the last step that you may be missing. Since $g$ is differentiable at $a$, it is continuous at $a$, so $\lim_{x \to a} (g(x) - g(a)) = g(a)-g(a) = 0$. But why does it follow that $\lim_{x \to a}E(g(x)-g(a)) = E(0) = 0$? The answer is: because $E$ is continuous at $0$. (Look in your calculus book in the section on continuous functions. You will find a theorem that says that if $\lim_{x \to a} f(x) = L$ and $g$ is continuous at $L$, then $\lim_{x \to a} g(f(x)) = g(L)$. That theorem is being used in this step.) So to have a complete proof, you need to verify that $E$ is continuous at $0$. To verify that, check that $\lim_{h \to 0} E(h) = 0 = E(0)$. In this limit, $h$ is approaching $0$ but it is not equal to $0$, so we can use the formula for $E(h)$ when $h \ne 0$: \begin{equation} \lim_{h \to 0} E(h) = \lim_{h \to 0} \left(\frac{f(g(a)+h)-f(g(a))}{h} - f'(g(a))\right) = f'(g(a))-f'(g(a)) = 0. \end{equation}
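Here is a sketch of the whole argument on an inner function with $g(x)=g(a)$ for $x$ arbitrarily close to $a$, which is exactly the situation that breaks the incorrect proof. The choices are hypothetical: $a=0$, $f(y)=y^3+2y$, and $g(x)=x^2\cdot d(1/x)$, where $d(u)$ is the distance from $u$ to the nearest integer. Then $g(1/n)=0=g(0)$ for every integer $n$, and $|g(x)|\le x^2/2$, hence $g'(0)=0$. The rewritten quotient built from $E$ agrees exactly with the original difference quotient at every sample $x$, including those where $g(x)=g(a)$, and both shrink toward $f'(g(0))g'(0)=0$.

```python
from fractions import Fraction

def g(x):
    """Hypothetical inner function: x^2 times the distance from 1/x to the nearest integer.
    g(0) = 0, g(1/n) = 0 for every integer n, and |g(x)| <= x^2 / 2, so g'(0) = 0."""
    if x == 0:
        return Fraction(0)
    u = 1 / x
    return x * x * abs(u - round(u))

def f(y):
    return y ** 3 + 2 * y          # hypothetical outer function

def f_prime(y):
    return 3 * y ** 2 + 2          # its derivative, computed by hand

a = Fraction(0)
b = g(a)                            # b = g(a)

def E(h):
    """E(0) = 0; for h != 0, the difference quotient of f at b minus f'(b)."""
    if h == 0:
        return Fraction(0)
    return (f(b + h) - f(b)) / h - f_prime(b)

def quotient_via_E(x):
    """(E(g(x) - g(a)) + f'(g(a))) * (g(x) - g(a)) / (x - a), defined for every x != a."""
    dg = g(x) - g(a)
    return (E(dg) + f_prime(b)) * dg / (x - a)

def direct_quotient(x):
    """The original difference quotient of f∘g at a."""
    return (f(g(x)) - f(g(a))) / (x - a)

for x in (Fraction(1, 10), Fraction(2, 21), Fraction(1, 100), Fraction(2, 201)):
    assert quotient_via_E(x) == direct_quotient(x)   # exact agreement
    print(x, "g(x) == g(a):", g(x) == b, float(direct_quotient(x)))
```

At $x=1/10$ and $x=1/100$ the incorrect proof's first factor would be $0/0$, yet `quotient_via_E` is still defined because $E(0)$ was assigned a value by fiat.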

Finally, the problem with the "EDIT" section of the original post: You seem to think that by defining $E$, we are somehow changing the meaning of the expression \begin{equation} \frac{f(g(x))-f(g(a))}{g(x)-g(a)}. \end{equation} We are not. That expression still means what it meant before, so it is undefined when $g(x) = g(a)$. All we're doing is defining a new function $E$, and it is only formulas involving the letter $E$ whose meaning is affected by that definition. No justification is needed for this: you can define a new function however you want.

Another answer:

Note that we get into trouble with $\frac{f(g(x))-f(g(a))}{g(x)-g(a)}$ when $g(x) = g(a)$. However, writing $y=g(x)$, the quotient $\frac{f(y)-f(g(a))}{y-g(a)}$ has a well-defined limit as $y\to g(a)$, namely $f'(g(a))$. So what they do when introducing $E$ is simply "filling in" those holes with that limit, so that we get an expression that is valid for all $x$. We could just as well have said

Consider the expression which is $$ \frac{f(g(x))-f(g(a))}{g(x)-g(a)}\times\frac{g(x)-g(a)}{x-a} $$ when $g(x) \neq g(a)$, and $$ f'(g(a))\times \frac{g(x)-g(a)}{x-a} $$ when $g(x) = g(a)$, and take its limit when $x\to a$.

and this would've been more or less the exact same thing.
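The piecewise expression described above can be written directly as a function. This sketch assumes hypothetical choices $f=\exp$, $a=0$, and $g(x)=\max(x,0)^2$, which equals $g(0)=0$ for every $x\le 0$, so every negative $x$ is one of the "holes". At those points the filled-in value $f'(g(a))\cdot\frac{g(x)-g(a)}{x-a}$ is $0$, matching the original difference quotient.

```python
import math

def f(y):
    return math.exp(y)                 # hypothetical outer function

def f_prime(y):
    return math.exp(y)                 # derivative of exp

def g(x):
    return max(x, 0.0) ** 2            # hypothetical inner function: g(x) == g(0) for all x <= 0

a = 0.0

def filled_in(x):
    """The split product where g(x) != g(a); f'(g(a)) * (g(x) - g(a)) / (x - a) at the holes."""
    dg = g(x) - g(a)
    if dg != 0:
        return (f(g(x)) - f(g(a))) / dg * (dg / (x - a))
    return f_prime(g(a)) * dg / (x - a)

def direct_quotient(x):
    """The original difference quotient of f∘g at a."""
    return (f(g(x)) - f(g(a))) / (x - a)

for x in (-0.1, -0.001, 0.1, 0.001):
    print(x, filled_in(x), direct_quotient(x))
```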

Another answer:

Here is a "correct" proof:

From the usual definition of the derivative one immediately deduces the following

Lemma. A function $f$ is differentiable at the point $a$ with $f'(a)=A$ iff there is a function $m_{f,a}=:m$, continuous at $a$ with $m(a)=A$, such that for all $x$ one has $$f(x)-f(a)=m(x)(x-a)\ .$$

Under the hypotheses of the chain rule one therefore has $$f\bigl(g(x)\bigr)-f\bigl(g(a)\bigr)=m_{f,g(a)}\bigl(g(x)\bigr)\bigl(g(x)-g(a)\bigr)=m_{f,g(a)}\bigl(g(x)\bigr)m_{g,a}(x)(x-a)\ .$$ Since $g$ is continuous at $a$ and $m_{f,g(a)}$ is continuous at $g(a)$, the composition $x\mapsto m_{f,g(a)}\bigl(g(x)\bigr)$ is continuous at $a$; since $m_{g,a}$ is also continuous at $a$, the product $x\mapsto m_{f,g(a)}\bigl(g(x)\bigr)m_{g,a}(x)$ is continuous at $a$ as well, and takes the value $f'\bigl(g(a)\bigr)g'(a)$ there. By the reverse direction of the Lemma, the chain rule follows.
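The Lemma's slope functions can be built explicitly. In this sketch, `slope_function` is a hypothetical helper (not from the answer), and the concrete pair $f(y)=y^2$, $g(x)=x^3$, $a=1$ is an assumption chosen so that $(f\circ g)(x)=x^6$ and $(f\circ g)'(1)=6$. The factorization $f(g(x))-f(g(a))=m_{f,g(a)}(g(x))\,m_{g,a}(x)\,(x-a)$ is checked exactly with `Fraction`, including at $x=a$, and the product function takes the value $f'(g(1))g'(1)=2\cdot 3=6$ at $a$.

```python
from fractions import Fraction

def slope_function(func, p, derivative_at_p):
    """Hypothetical helper: m(x) = (func(x) - func(p)) / (x - p) for x != p, and m(p) = func'(p)."""
    def m(x):
        if x == p:
            return derivative_at_p
        return (func(x) - func(p)) / (x - p)
    return m

def f(y):
    return y ** 2          # hypothetical outer function, f'(y) = 2y

def g(x):
    return x ** 3          # hypothetical inner function, g'(x) = 3x^2

a = Fraction(1)
m_g = slope_function(g, a, 3 * a ** 2)            # m_{g,a}
m_f = slope_function(f, g(a), 2 * g(a))           # m_{f,g(a)}

def product(x):
    """x -> m_{f,g(a)}(g(x)) * m_{g,a}(x), the slope function of f∘g at a."""
    return m_f(g(x)) * m_g(x)

for x in (Fraction(0), Fraction(1, 2), Fraction(9, 10), a, Fraction(-2)):
    # The factorization from the Lemma holds for every x, including x == a.
    assert f(g(x)) - f(g(a)) == product(x) * (x - a)
```

The only divisions happen in the branch where $x\ne p$, so nothing ever divides by zero, whether or not $g(x)=g(a)$.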

Another answer:

You can avoid the "correct" proof this way:

Case 1: $g'(a) \ne 0.$ Here the "fake proof" works! That's simply because $(g(x) - g(a))/(x-a)$ tends to $g'(a)\ne 0,$ so it is nonzero for $x$ close to, but not equal to, $a.$ For such $x,$ we have $g(x)\ne g(a),$ and now the fake news is actually news.

Case 2: $g'(a) = 0:$ Because $f'(g(a))$ exists, there exist a constant $c>0$ and a $\delta > 0$ such that (for instance $c=|f'(g(a))|+1$, with $\delta$ small enough)

$$\tag 1 |f(y)-f(g(a))|\le c|y-g(a)|\, \text { for } y\in (g(a)-\delta, g(a)+\delta).$$

Now $g$ is continuous at $a,$ so there exists $\gamma > 0$ such that $x\in (a-\gamma, a + \gamma)$ implies $g(x) \in (g(a)-\delta, g(a)+\delta).$ For such $x$ we can use $(1)$ to see

$$|f(g(x))-f(g(a))| \le c |g(x)-g(a)|.$$

Now divide by $|x-a|$ and let $x\to a.$ On the right we get the limit $0$ because $g'(a)=0.$ Therefore the limit on the left is $0,$ which is exactly the same as saying $(f\circ g)'(a) = 0.$ That is the desired conclusion in this case.
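Inequality $(1)$ and the final squeeze can be spot-checked numerically. The choices here are hypothetical: $f=\exp$ with $g(a)=0$, where $|e^y-1|\le 2|y|$ for $|y|<1$ (the ratio $(e^y-1)/y$ is increasing and equals $e-1<2$ at $y=1$), so $c=2$ and $\delta=1$ work; and $g(x)=x^2\sin(1/x)$ with $g(0)=0$, which satisfies $|g(x)|\le x^2$ and hence $g'(0)=0$. The right-hand side $c|g(x)-g(a)|/|x-a|$ is then at most $2|x|$, so it tends to $0$.

```python
import math

C, DELTA = 2.0, 1.0   # |exp(y) - exp(0)| <= 2|y| whenever |y| < 1: inequality (1) with g(a) = 0

def f(y):
    return math.exp(y)                    # hypothetical outer function

def g(x):
    """Hypothetical inner function: g(0) = 0 and |g(x)| <= x^2, so g'(0) = 0."""
    return x * x * math.sin(1 / x) if x != 0 else 0.0

a = 0.0
for x in (0.5, 0.1, 0.01, -0.001):
    assert abs(g(x) - g(a)) < DELTA       # continuity of g keeps g(x) inside the window
    lhs = abs(f(g(x)) - f(g(a))) / abs(x - a)
    rhs = C * abs(g(x) - g(a)) / abs(x - a)   # mathematically at most 2|x|, so it tends to 0
    assert lhs <= rhs
    print(x, lhs, rhs)
```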

Another answer:

Here is one proof which does not require you to have any special definition for the difference quotient. Consider the ratio $$\frac{f(g(x)) - f(g(a))} {x-a} \tag{1}$$ It can be written as $$\frac{f(g(x))-f(g(a))}{g(x)-g(a)}\cdot \frac{g(x) - g(a)} {x-a} \tag{2}$$ provided $g(x) - g(a) \neq 0$ for all $x$ in some deleted neighborhood of $a$. Under this assumption the usual proof works and we get the result $(f\circ g) '(a) =f' (g(a)) g'(a) $ by taking limit as $x\to a$ in equation $(2)$.

Let's see what happens when this assumption does not hold. It means that in every deleted neighborhood of $a$ there is some $x$ with $g(x) =g(a)$. It is easy to prove that in this case $g'(a) =0$ (try proving it, and let me know if you need help: start by assuming $g'(a) >0$ and derive a contradiction, then handle $g'(a) <0$ similarly). Now if $g(x) =g(a)$, the difference quotient in $(1)$ is $0$. If $g(x) \neq g(a)$, the difference quotient can be written as in $(2)$; the first factor is bounded for $x$ near $a$ (because $f'(g(a))$ exists), and the second factor tends to $0$, so the product also tends to $0$ as $x\to a$. Thus $(f\circ g) '(a) =0$. This reasoning can be formalized with the definition of limit as shown below.


Let $\epsilon >0$ be arbitrary. There exists an $\epsilon' >0$ such that $$\left|\frac{f(y) - f(g(a))} {y-g(a)} - f'(g(a)) \right|<1$$ for all $y$ with $0<|y-g(a)|<\epsilon '$. Therefore $$\left|\frac{f(y) - f(g(a))} {y-g(a)} \right|<|f' (g(a)) |+1=K\text{ (say)}\tag{3}$$ whenever $0<|y-g(a)|<\epsilon '$. Next note that $g$ is continuous at $a$ (because it is differentiable at $a$), so there is a $\delta_{1}>0$ such that $$|g(x) - g(a) |<\epsilon' \tag{4}$$ whenever $|x-a|<\delta_{1}$. Further, since $g'(a) =0$, there is a $\delta_{2}>0$ such that $$\left|\frac{g(x) - g(a)} {x-a} \right|<\frac{\epsilon} {K}\tag{5} $$ whenever $0<|x-a|<\delta_{2}$. Let $\delta=\min(\delta_{1},\delta_{2})$. If $0<|x-a|<\delta$, then both inequalities $(4)$ and $(5)$ hold. If moreover $g(x) =g(a)$, then the difference quotient in $(1)$ is $0$; and if $g(x)\neq g(a)$, then $(4)$ lets us apply $(3)$ with $y=g(x)$, and combining $(3)$ and $(5)$ in the factorization $(2)$ shows that the difference quotient in $(1)$ is less than $K\cdot\frac{\epsilon}{K}=\epsilon$ in absolute value. In other words, $$\left|\frac{f(g(x)) - f(g(a))} {x-a} \right|<\epsilon$$ whenever $0<|x-a|<\delta$. Thus $(f\circ g) '(a) =0$.
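The $\epsilon$-$\delta$ recipe can be run on a concrete example in which $g(x)=g(a)$ in every deleted neighborhood of $a$. Everything here is a hypothetical choice: $a=0$, $f(y)=y^3+2y$ (so $f'(0)=2$), and $g(x)=x^2\cdot d(1/x)$, with $d(u)$ the distance from $u$ to the nearest integer. For this pair one can take $\epsilon'=1$ (since $\left|\frac{f(y)}{y}-2\right|=y^2$), $K=3$, $\delta_1=1$ (since $|g(x)|\le x^2/2$), and $\delta_2=2\epsilon/3$ (since $|g(x)/x|\le|x|/2$). The sketch computes $\delta$ for $\epsilon=1/100$ and spot-checks the final inequality with exact `Fraction` arithmetic.

```python
from fractions import Fraction

def g(x):
    """Hypothetical g: x^2 * (distance from 1/x to the nearest integer), g(0) = 0.
    g(1/n) = 0 = g(0) for every integer n, and |g(x)| <= x^2 / 2, so g'(0) = 0."""
    if x == 0:
        return Fraction(0)
    u = 1 / x
    return x * x * abs(u - round(u))

def f(y):
    return y ** 3 + 2 * y      # hypothetical f with f'(0) = 2

a = Fraction(0)
EPS_PRIME = Fraction(1)        # |f(y)/y - 2| = y^2 < 1 whenever 0 < |y| < 1
K = Fraction(2) + 1            # K = |f'(g(a))| + 1, as in (3)

def delta_for(eps):
    delta1 = Fraction(1)       # |x| < 1 gives |g(x)| <= x^2/2 < EPS_PRIME, i.e. (4)
    delta2 = 2 * eps / K       # |x| < 2*eps/K gives |g(x)/x| <= |x|/2 < eps/K, i.e. (5)
    return min(delta1, delta2)

def quotient(x):
    """The difference quotient (1) of f∘g at a."""
    return (f(g(x)) - f(g(a))) / (x - a)

eps = Fraction(1, 100)
delta = delta_for(eps)
samples = [delta * Fraction(k, 97) for k in range(1, 97)] + [Fraction(1, n) for n in range(151, 160)]
for x in samples + [-s for s in samples]:
    assert 0 < abs(x) < delta
    assert abs(quotient(x)) < eps
```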


The above proof is taken from Hardy's A Course of Pure Mathematics, and it avoids the trick used by Spivak (defining the first factor in $(2)$ in a continuous manner when $g(x) =g(a)$). The essential idea of the proof is easy to understand, and the last part of the proof dealing with $\epsilon, \delta$ is necessary only to satisfy those who insist.