Is my proof for the single variable chain rule correct?

I have studied the proof for the Chain Rule given by Spivak in Calculus but I found the argumentation convoluted, so I tried proving the Chain Rule on my own. I feel like what I've written is watertight but I figured I'd ask here to see if some people stronger than me in math could find any issues with it.

Begin with the expression $\lim_{h \to 0} \frac{f(g(a+h))-f(g(a))}{h}$, the limit defining the derivative of $f \circ g$ at $a$.

Add and subtract $g(a)$ so that we get:

$=\lim_{h \to 0} \frac{f(g(a)+g(a+h)-g(a))-f(g(a))}{h}$

Assume that the function $g$ is not locally constant at $a$. (If it is locally constant, then $\frac{d}{dx}f(g(x)) = 0$ at $a$, and since $g'(a) = 0$ as well, the rule holds trivially.) Then, we multiply by $\frac{h}{h}$ inside the function to get:

$=\lim_{h \to 0} \frac{f(g(a)+h\frac{g(a+h)-g(a)}{h})-f(g(a))}{h}$

Then multiply the whole limit expression by $1$ again with $\frac{\frac{g(a+h)-g(a)}{h}}{\frac{g(a+h)-g(a)}{h}}$:

$=\lim_{h \to 0} \frac{f(g(a)+h\frac{g(a+h)-g(a)}{h})-f(g(a))}{h\frac{g(a+h)-g(a)}{h}}\frac{g(a+h)-g(a)}{h}$

Since the limit of a product of functions is the product of the limits of each function (provided both limits exist),

$=\lim_{h \to 0} \frac{f(g(a)+h\frac{g(a+h)-g(a)}{h})-f(g(a))}{h\frac{g(a+h)-g(a)}{h}}\lim_{h \to 0}\frac{g(a+h)-g(a)}{h}$.

The right hand expression is just $g'(a)$. The left hand expression remains to be evaluated. Here I introduce the secant slope function as well as the derivative:

$\phi(a, h) = \begin{cases} \frac{g(a+h)-g(a)}{h} & h \neq 0 \\ g'(a) & h = 0\end{cases}$
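For a concrete sanity check of this patched definition, here is a minimal numeric sketch (the sample choice $g(x) = x^3$, with $g'(x) = 3x^2$, is my own illustration, not from the proof):

```python
# Numeric sketch of the secant-slope function phi(a, h), using the
# sample choice g(x) = x**3 (so g'(x) = 3x^2) purely for illustration.
def g(x):
    return x**3

def g_prime(x):
    return 3 * x**2

def phi(a, h):
    """Secant slope of g at a over step h, patched to g'(a) at h = 0."""
    if h == 0:
        return g_prime(a)
    return (g(a + h) - g(a)) / h

# phi is continuous in h at 0: the secant slopes approach g'(a) = 12.
a = 2.0
for h in [0.1, 0.01, 0.001, 0.0]:
    print(h, phi(a, h))
```

The printed slopes approach $\phi(a, 0) = g'(a) = 12$ as $h \to 0$, which is the continuity-in-$h$ property the patched definition supplies.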

Substituting,

$\frac{d}{dx}f(g(x)) = [\lim_{h \to 0} \frac{f(g(a)+h\phi(a, h)) - f(g(a))}{h\phi(a,h)}]g'(a)$

Proceed by evaluating the remaining limit expression. Call it $L$, and write a $\delta$-$\epsilon$ statement of the existence of the limit:

$\forall\epsilon>0, \exists\delta>0: 0<|h|<\delta \implies |\frac{f(g(a)+h\phi(a, h)) - f(g(a))}{h\phi(a,h)} - L|<\epsilon$

We massage the $\delta$ expression by multiplying it by $|\phi(a, h)|$:

$\forall\epsilon>0, \exists\delta>0: 0<|h\phi(a, h)|<\delta|\phi(a, h)| \implies |\frac{f(g(a)+h\phi(a, h)) - f(g(a))}{h\phi(a,h)} - L|<\epsilon$

The point of this was to construct a new $\delta^*$ and allow a substitution to do some cleanup work: letting $k = h\phi(a, h)$, we obtain a derivative expression:

$\forall\epsilon>0, \exists\delta^*>0: 0<|k|<\delta^* \implies |\frac{f(g(a)+k) - f(g(a))}{k} - L|<\epsilon$

Clearly, $L = f'(g(a))$. So now I can write the chain rule:

$\lim_{h \to 0} \frac{f(g(a+h))-f(g(a))}{h} = f'(g(a))g'(a)$. Or,

$\frac{d}{dx}f(g(x)) = f'(g(x))g'(x)$

I will greatly appreciate any feedback on this!

BEST ANSWER

First things first: when doing a proof you should be clear on exactly what it is you are proving. The chain rule states that if $g$ is differentiable at $a$ and $f$ is differentiable at $g(a)$, then $f\circ g$ is differentiable at $a$ and $(f\circ g)'(a) = f'(g(a))g'(a)$.

Note that what you know at the start is the differentiability of $f$ and $g$. The differentiability of $f\circ g$ is what you have to prove. In a proof, you start with what you know and derive from that the thing you want to prove.

You simply assume that $f\circ g$ is differentiable at this point:

$\forall\epsilon>0, \exists\delta>0: 0<|h|<\delta \implies |\frac{f(g(a)+h\phi(a, h)) - f(g(a))}{h\phi(a,h)} - L|<\epsilon$

The prior limit expression could be considered conditional on convergence, but at this point you use this expression to choose the $\delta$ that you need to define $\delta^*$. And that $\delta$ exists only if the limit converges.

As has also been noted, $g(x)$ can equal $g(a)$ infinitely many times near $a$ without $g$ being locally constant. For example, take $a = 0$ and $g(x) = x^2\sin\left(\frac 1x\right)$ for $x \ne 0$, with $g(0) = 0$. This $g$ is differentiable with $g'(0) = 0$ and is not locally constant. The problem with such cases is that your expression for $\delta^*$ becomes $0$ at those locations, violating the requirement that it be $> 0$.
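A quick numeric look at this counterexample (just a sketch; the points $x_n = \frac{1}{n\pi}$ are the standard zeros of this $g$):

```python
import math

# g(x) = x^2 sin(1/x) with g(0) = 0: differentiable at 0 with g'(0) = 0,
# yet g(x) = g(0) at every x_n = 1/(n*pi), and these accumulate at 0.
def g(x):
    return x**2 * math.sin(1.0 / x) if x != 0 else 0.0

for n in (10, 100, 1000):
    x_n = 1.0 / (n * math.pi)
    print(x_n, g(x_n))        # essentially 0 (up to float rounding)

# But g is not locally constant: between consecutive zeros it is nonzero.
y = 2.0 / (401 * math.pi)     # here sin(1/y) = sin(401*pi/2) = 1
print(y, g(y))                # strictly positive
```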

And again as has been noted, it is $\exists \delta > 0, \forall h$, not $\forall h, \exists \delta^*$. There is no such thing as $h$ when $\delta^*$ is picked.

But on top of all those issues, this whole thing is quixotic, because your conclusion is

$\forall\epsilon>0, \exists\delta^*>0: 0<|k|<\delta^* \implies |\frac{f(g(a)+k) - f(g(a))}{k} - L|<\epsilon$

i.e., that $f$ is differentiable at $g(a)$. So you assumed the thing you were supposed to prove, in order to prove something you already knew.

You need to turn it around. Start with the existence of $f'(g(a))$ and $g'(a)$, and deduce the existence of $(f\circ g)'(a)$.

Ok, following Paul Sinclair's advice, this is how I'd answer my own question (edited 5/3/2023, though I'm still shaky about how to use the given that $f$ is continuous at $x=b$):

Lemma: If $\lim_{x \to b} f(x) = L$ and $\lim_{x \to a} g(x) = b$, given that either (1) $f$ is continuous at $b$ or (2) there is a neighborhood of $a$ on which $g \neq b$ except at $x=a$, then $\lim_{x \to a} f(g(x)) = L$.
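Before the proof, it may help to see numerically why at least one of the two side conditions is needed. This sketch uses the standard counterexample of a discontinuous $f$ composed with a constant $g$ (my own choice of example, not from the post):

```python
# Hypotheses matter: take f discontinuous at b = 0 and g identically b.
def f(y):
    return 0.0 if y != 0 else 1.0   # lim_{y->0} f(y) = 0, but f(0) = 1

def g(x):
    return 0.0                       # lim_{x->0} g(x) = 0, and g == b everywhere

# Here L = lim_{y->0} f(y) = 0, yet f(g(x)) = f(0) = 1 for every x,
# so lim_{x->0} f(g(x)) = 1 != L: neither condition (1) nor (2) holds.
print(f(g(0.001)))   # 1.0
```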

We have two givens:

(1) $\forall \epsilon > 0, \exists \delta'>0: 0 < |y-b| < \delta' \implies |f(y) - L| < \epsilon$, and

(2) $\forall \delta' > 0, \exists \delta>0: 0 < |x-a| < \delta \implies |g(x) - b| < \delta'$.

If $f$ is continuous at $b$, then since $\lim_{x \to b}f(x) = L$ we can conclude that $L = f(b)$, and the restriction $0 < |y-b|$ in (1) can be dropped: when $y = b$, $|f(y) - L| = |f(b) - f(b)| = 0 < \epsilon$. So, letting $y = g(x)$, we can merge the two statements:

$0 < |x-a| < \delta \implies |y - b| < \delta' \implies |f(y)-f(b)| < \epsilon \implies |f(g(x))-f(b)| < \epsilon$, hence $\lim_{x \to a} f(g(x)) = f(\lim_{x \to a} g(x)) = f(b) = L$.

If instead there is a neighborhood of $a$ on which $g(x) \neq b$ except possibly at $x=a$, shrink $\delta$ if necessary so that $0 < |x-a| < \delta$ keeps $x$ inside that neighborhood; then $g(x) \neq b$ there. Let $y = g(x)$.

$0 < |x-a| < \delta \implies 0 < |y - b| < \delta' \implies |f(y)-L| < \epsilon \implies |f(g(x))-L| < \epsilon$, hence $\lim_{x \to a} f(g(x)) = L$

Proving the chain rule:

Given that $f'(g(a))$ and $g'(a)$ exist, prove that the derivative of $f(g(x))$ at $x=a$ is $f'(g(a))g'(a)$.

Case 1: $g$ is locally constant in a neighborhood of $x=a$. Then $f(g(x))$ is constant on that neighborhood, so its derivative at $a$ is $0$; and since $g'(a)=0$, we also have $f'(g(a))g'(a)=0$, so the two sides agree.

Case 2: $g$ is not locally constant in a neighborhood of $x=a$. Here, proceed by rewriting the limit as a limit of a product:

$\lim_{x \to a}\frac{f(g(x)) - f(g(a))}{x-a}=\lim_{x \to a}\frac{f(g(x)) - f(g(a))}{g(x)-g(a)} \lim_{x \to a}\frac{g(x) - g(a)}{x-a}$

The second limit evaluates to $g'(a)$. Existence of $g'(a)$ means that $g$ is continuous at $x=a$, so by the lemma (substituting $y = g(x)$), $\lim_{x \to a}\frac{f(g(x)) - f(g(a))}{g(x)-g(a)} = \lim_{y \to g(a)}\frac{f(y) - f(g(a))}{y-g(a)} = f'(g(a))$.

Hence, the derivative of $f(g(x))$ at $x=a$ is $f'(g(a))g'(a)$.
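As a closing sanity check, the final identity can be tested numerically with a symmetric difference quotient (the sample choices $f = \sin$, $g(x) = x^2$, and $a = 1.3$ are mine, just for illustration):

```python
import math

# Sample functions with known derivatives: f = sin, g(x) = x^2.
f, f_prime = math.sin, math.cos
g = lambda x: x**2
g_prime = lambda x: 2.0 * x

def numeric_derivative(func, a, h=1e-6):
    # symmetric difference quotient approximates func'(a)
    return (func(a + h) - func(a - h)) / (2.0 * h)

a = 1.3
lhs = numeric_derivative(lambda x: f(g(x)), a)   # (f o g)'(a), numerically
rhs = f_prime(g(a)) * g_prime(a)                 # chain-rule formula
print(lhs, rhs)
```

The two printed values agree to many decimal places, consistent with $(f\circ g)'(a) = f'(g(a))g'(a)$.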