Spivak's Chain Rule Proof (Image of proof provided)


If $g$ is differentiable at $a$, and $f$ is differentiable at $g(a)$, then $f \circ g$ is differentiable at $a$, and $$ (f \circ g)^{\prime}(a)=f^{\prime}(g(a)) \cdot g^{\prime}(a). $$ Define a function $\phi$ as follows: $$ \phi(h)= \begin{cases}\frac{f(g(a+h))-f(g(a))}{g(a+h)-g(a)}, & \text { if } g(a+h)-g(a) \neq 0 \\ f^{\prime}(g(a)), & \text { if } g(a+h)-g(a)=0 .\end{cases} $$ It should be intuitively clear that $\phi$ is continuous at $0:$ When $h$ is small, $g(a+h)-g(a)$ is also small, so if $g(a+h)-g(a)$ is not zero, then $\phi(h)$ will be close to $f^{\prime}(g(a)) ;$ and if it is zero, then $\phi(h)$ actually equals $f^{\prime}(g(a))$, which is even better. Since the continuity of $\phi$ is the crux of the whole proof we will provide a careful translation of this intuitive argument.

We know that $f$ is differentiable at $g(a) .$ This means that $$ \lim _{k \rightarrow 0} \frac{f(g(a)+k)-f(g(a))}{k}=f^{\prime}(g(a)). $$ Thus, if $\varepsilon>0$ there is some number $\delta^{\prime}>0$ such that, for all $k$, $$ \text{if $0<|k|<\delta^{\prime}$, then $\left|\frac{f(g(a)+k)-f(g(a))}{k}-f^{\prime}(g(a))\right|<\varepsilon$}. \tag{1} $$ Now $g$ is differentiable at $a$, hence continuous at $a$, so there is a $\delta>0$ such that, for all $h$, $$\text{ if $|h|<\delta$, then $|g(a+h)-g(a)|<\delta^{\prime} .$}\tag{2}$$ Consider now any $h$ with $|h|<\delta .$ If $k=g(a+h)-g(a) \neq 0$, then $$ \phi(h)=\frac{f(g(a+h))-f(g(a))}{g(a+h)-g(a)}=\frac{f(g(a)+k)-f(g(a))}{k} ; $$ it follows from $(2)$ that $|k|<\delta^{\prime}$, and hence from (1) that $$ \left|\phi(h)-f^{\prime}(g(a))\right|<\varepsilon. $$

(transcribed from this screenshot)

Here is a proof of the chain rule in Spivak's Calculus. Note there is a second page, but I understand it, and this is the meat of the proof. I have a few questions.

$\textbf{1.}$ "It should be intuitively clear that $\phi$ is continuous at $0$." Do we care that $\phi$ is continuous at zero so that we avoid division by zero, since $g(a+h)-g(a)$ is in the denominator and could equal zero? I am not sure I understand why it is continuous at zero. I follow what he is saying, but I was always under the impression that continuity meant no visual breaks in the graph. Here, I am imagining $\phi(h)$ being continuous up to zero and then jumping to another point at zero.

$\textbf{2.}$ At $(2)$, I do not understand what we are trying to do. We seemingly switch to $h$ and, I think, are applying the definition of continuity. The switching back and forth between $k$ and $h$ confuses me.


There are 7 answers below.

Best answer:

The "intuitively clear" fact is that there is no visual break in the graph of $\phi(h)$. Sure, the graph of $$ \phi_1(h) = \frac{f(g(a + h)) - f(g(a))}{g(a + h) - g(a)} $$ has a "hole" where $h = 0$, and depending on the other values of $g(a+h)$, there may be additional holes or even entire intervals of the $x$-axis that have no value of $\phi_1(h)$. (Basically, whenever $g(a+h) = g(a)$, there is no value of $\phi_1(h)$.) But the only way to approach one of those "holes" is for the graph of the function to come right up to (or down to) the horizontal line that graphs the constant function $\phi_2(h) = f'(g(a))$. Every "hole" in $\phi_1(h)$ begins and ends on that line, and the second half of the definition of $\phi$ fills in each of those holes with exactly the function value that will connect all the pieces of the graph, namely the value $f'(g(a))$.
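This filling-in can be seen numerically. Below is a sketch with a hypothetical $f$, $g$, and $a$ of my own choosing (not from the book), picked so that $g(a+h)-g(a)$ vanishes for $h$ arbitrarily close to $0$ and both branches of $\phi$ actually occur:

```python
import math

# A concrete illustration (my own choice of f and g, not from the book):
# g(x) = x^2 sin(1/x) with g(0) = 0, a = 0, and f = sin.
# Then g(a+h) - g(a) vanishes at h = 1/(n*pi), so BOTH branches of
# Spivak's phi come into play near 0.
def g(x):
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

def f(x):
    return math.sin(x)

a = 0.0
fprime_ga = math.cos(g(a))  # f'(g(a)) = cos(0) = 1

def phi(h):
    k = g(a + h) - g(a)
    if k != 0:
        return (f(g(a) + k) - f(g(a))) / k  # difference-quotient branch
    return fprime_ga                        # the "filled hole" branch

# phi(0) is the filled-in value, and phi(h) stays close to it nearby:
assert phi(0.0) == fprime_ga
for h in [0.1, 0.01, 1 / (10 * math.pi), 1 / (100 * math.pi)]:
    assert abs(phi(h) - fprime_ga) < 1e-3
```

Every hole of the difference-quotient branch gets plugged with $f'(g(a))$, which is exactly the value the surrounding graph approaches.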

For the second part of your question, yes, all the business with statements $(1)$ and $(2)$ is directly using the epsilon-delta definition of continuity. But it requires two applications of the definition, logically connected to each other, so we can't just use the symbols $\varepsilon$ and $\delta$ both times: the "epsilon" from one application of the definition is the "delta" for the other.

In order to keep the symbols unambiguous, the proof uses $\varepsilon$ and $\delta'$ for the "epsilon" and "delta" in statement $(1)$, and it uses $\delta'$ and $\delta$ for the "epsilon" and "delta" in statement $(2)$.
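Schematically (my own summary, not Spivak's notation), the two applications chain like this:

```latex
% Given \varepsilon, statement (1) produces \delta'; feeding \delta' in as
% the "epsilon" of statement (2) produces \delta.  Then for 0 < |h| < \delta,
% with k = g(a+h) - g(a):
\varepsilon \xrightarrow{(1)} \delta' \xrightarrow{(2)} \delta, \qquad
|h| < \delta \;\Longrightarrow\; |k| < \delta' \;\Longrightarrow\;
\bigl|\phi(h) - f'(g(a))\bigr| < \varepsilon
\quad (k \neq 0;\ k = 0 \text{ gives } \phi(h) = f'(g(a)) \text{ exactly}).
```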

You do have to keep track of what $h$ is versus what $k$ is. I think the trickiest part is near the end, in the sentence that starts, "If $k = g(a + h) - g(a) \neq 0$". By that time we have the condition $0 < \lvert h \rvert < \delta$, which guarantees that we don't produce any $k$ that violates $0 < \lvert k \rvert < \delta'$ this way, though we don't necessarily produce every value of $k$ satisfying that condition (which is fine; we don't need to). Also, we don't necessarily use every value of $h$ with $0 < \lvert h \rvert < \delta$: any $h$ for which $g(a + h) - g(a) = 0$ has no corresponding value of $k$; instead, it produces one of the values of $\phi(h)$ that already equals the limit we're trying to show. Yes, this is complicated, and maybe that contributes to the opinions expressed in some other answers and comments that you might prefer to look at someone else's proof.

Answer:

The even more intuitive reason why the result should be true is that:

$$\begin{split} (f \circ g)'(a) & = \lim_{h\to0} \frac{f(g(a+h)) - f(g(a))}{h} \\ & = \lim_{h\to0} \frac{f(g(a+h)) - f(g(a))}{g(a+h) - g(a)} \cdot \frac{g(a+h) - g(a)}{h} \\ & = \left(\lim_{g(a+h)\to g(a)} \frac{f(g(a+h)) - f(g(a))}{g(a+h) - g(a)}\right) \left(\lim_{h\to0}\frac{g(a+h) - g(a)}{h}\right) \\ & = f'(g(a))\, g'(a) \end{split}$$

Unfortunately this intuitive proof can fall apart when $g'(a) = 0$, because you may be multiplying and dividing by $0$. So he creates an auxiliary function that is guaranteed to be well defined and has the limit he wants for the first factor of that expression. Then he uses that function to get around the potential division-by-zero problem.

Editorial here.

I have always hated this proof. I much prefer thinking in terms of the tangent line approximation in which $f(a+h) \approx f(a) + f'(a) h$. If you define $\approx$ right, this leads to a definition which is fully equivalent to the usual limit definition. And now the intuitive reason why this is true is that:

$$\begin{split} f(g(a+h)) & \approx f(g(a)) + f'(g(a)) ( g(a+h) - g(a) ) \\ & \approx f(g(a)) + f'(g(a))(g(a) + g'(a)h - g(a)) \\ & = f(g(a)) + f'(g(a))g'(a)h\end{split}$$

Proving the first approximation does require an admittedly more technical argument. However, it leads to a correct proof of the chain rule that works without change for multivariable calculus. I also think it is important for people to really understand the tangent line approximation.
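The resulting chain rule can be sanity-checked numerically; this sketch uses a hypothetical $f$ and $g$ of my own choosing ($f = \exp$, $g(x) = x^3$, $a = 0.5$):

```python
import math

# (f∘g)'(a) should equal f'(g(a)) * g'(a); compare against a central
# difference quotient.  f, g, and a here are my own example choices.
f, fp = math.exp, math.exp        # f = exp is its own derivative
g = lambda x: x ** 3
gp = lambda x: 3 * x ** 2
a, h = 0.5, 1e-6

chain = fp(g(a)) * gp(a)                          # chain-rule value
numeric = (f(g(a + h)) - f(g(a - h))) / (2 * h)   # central difference
assert abs(numeric - chain) < 1e-6
```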

Answer:

Firstly, I will address your conception of continuous functions. Globally continuous functions defined on a connected domain do tend to have no "breaks in the graph visually." But we don't know whether $\phi(h)$ is globally continuous; we can only deduce continuity at $0$. Many ugly functions can be shown to be continuous at a single point. Consider, for example,

$$f(x) = \begin{cases} 0 & x \in \mathbb{R} \setminus \mathbb{Q}\\ x & x \in \mathbb{Q} \end{cases}$$

This is continuous at $0$. Surely, this cannot be visualized without jumps.
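The $\epsilon$-$\delta$ check is one line, since $f(x)$ is either $0$ or $x$ and so $|f(x)| \le |x|$ always:

```latex
% Continuity at 0: given \varepsilon > 0, take \delta = \varepsilon.
|x - 0| < \delta \;\Longrightarrow\; |f(x) - f(0)| = |f(x)| \le |x| < \delta = \varepsilon.
```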

The reason it is intuitively clear is the piecewise definition. We don't have to worry about $g(a+h) - g(a) = 0$, because that case is already taken care of for us: there, $\phi(h)$ is simply $f'(g(a))$. Otherwise, $\phi(h)$ can be made arbitrarily close to $f'(g(a))$, because the difference between $g(a+h)$ and $g(a)$ can be made arbitrarily small: since $f$ is differentiable at $g(a)$, the difference quotient at $x$ values "near" $g(a)$ is roughly the derivative $f'(g(a))$, and $g(a+h)$ can be made as "near" to $g(a)$ as we wish by taking $h$ small enough.

What does this all mean? It means that when $h$ is "near" $0$, $\phi(h)$ is "near" $f'(g(a))$. Moreover, $\phi(0) = f'(g(a))$. This implies continuity.

To answer your second question: in $(2)$, Spivak is simply formalizing with $\epsilon$-$\delta$ the fact that $g(a+h) \to g(a)$ as $h \to 0$. This fact is necessary because it justifies the substitution $k = g(a+h) - g(a)$.

Answer:

Spivak's is an introductory book that, in theory, should be usable by people who have never seen any calculus before, so he makes a long story out of something simple (but profound). His exposition is idiosyncratic in places and is not necessarily worth understanding in every detail when clearer approaches to the same things are available. I'd suggest ignoring this forced attempt to make $f'(g(x))$ into a ratio and reading the very short and easily understood proof by linear approximation (see btilly's answer) given in analysis books such as Rudin.

Answer:

There is a simple way to avoid this whole "what if $g(a+h) = g(a)$" mess: see the question "Chain rule for composition of $\mathbb C$ differentiable functions". That is a proof for complex differentiability, but it works word for word in the real case.

Answer:

I think you need to understand the reason behind the introduction of the function $\phi(h)$. Note that the number $(f\circ g)'(a)$ is defined by $$(f\circ g)'(a) = \lim_{h \to 0}\frac{f(g(a + h)) - f(g(a))}{h}$$ and this can be written as $$\lim_{h \to 0}\frac{f(g(a + h)) - f(g(a))}{g(a + h) - g(a)}\cdot\frac{g(a + h) - g(a)}{h}$$ provided that $g(a + h) - g(a) \neq 0$ as $h \to 0$. When $g(a + h) - g(a) = 0$ then we have a problem and the function $\phi(h)$ is invented to help solve this particular problem.

The function $\phi(h)$ ensures that $$f(g(a + h)) - f(g(a)) = \phi(h)\{g(a + h) - g(a)\}$$ for all values of $h$ near $0$ (this is checked easily). Next note that the above equation implies that $$\lim_{h \to 0}\frac{f(g(a + h)) - f(g(a))}{h} = \lim_{h \to 0}\phi(h)\lim_{h \to 0}\frac{g(a + h) - g(a)}{h}$$ provided that $\lim_{h \to 0}\phi(h)$ exists. Clearly the second limit on the RHS is $g'(a)$, and thus in order to prove the chain rule we must ensure that $$\lim_{h \to 0}\phi(h) = f'(g(a)).$$ Note that $\phi(0) = f'(g(a))$ by the definition of $\phi(h)$, and hence we need to ensure that $\phi(h) \to \phi(0)$ as $h \to 0$; that is, that $\phi(h)$ is continuous at $h = 0$. This answers your first query.
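The "checked easily" identity can itself be sanity-checked numerically; here is a sketch with a hypothetical $f$, $g$, and $a$ of my own choosing, exercising both branches of $\phi$:

```python
import math

# Checking the algebraic identity behind phi (example f, g are my own):
# f(g(a+h)) - f(g(a)) == phi(h) * (g(a+h) - g(a)), in both branches.
def make_phi(f, fprime, g, a):
    def phi(h):
        k = g(a + h) - g(a)
        if k != 0:
            return (f(g(a) + k) - f(g(a))) / k
        return fprime(g(a))
    return phi

# Branch 1: g(a+h) != g(a).
f, fp, g, a = math.sin, math.cos, (lambda x: x * x), 1.0
phi = make_phi(f, fp, g, a)
for h in [0.5, 0.01, -0.3]:
    lhs = f(g(a + h)) - f(g(a))
    rhs = phi(h) * (g(a + h) - g(a))
    assert abs(lhs - rhs) < 1e-12

# Branch 2: constant g, so g(a+h) - g(a) = 0 and both sides are 0.
g2 = lambda x: 5.0
phi2 = make_phi(f, fp, g2, a)
assert f(g2(a + 0.3)) - f(g2(a)) == phi2(0.3) * (g2(a + 0.3) - g2(a)) == 0.0
```

The point is that the identity holds for every $h$, with no case analysis needed by the reader: the piecewise definition of $\phi$ absorbs the case analysis.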

Coming to the second query regarding the use of $h$ and $k$: note that the variable $k$ is not strictly necessary here. We need to show that $\phi(h) \to \phi(0) = f'(g(a))$, and for this Spivak takes a number $\epsilon > 0$ and tries to find a $\delta > 0$ such that $|\phi(h) - f'(g(a))| < \epsilon$ whenever $0 < |h| < \delta$. Note that the definition of $\phi(h)$ is complicated and based on the value of the difference $g(a + h) - g(a)$, which he denotes by $k$. Thus $$\phi(h) = \frac{f(g(a) + k) - f(g(a))}{k}$$ if $k \neq 0$, and $\phi(h) = f'(g(a))$ if $k = 0$. When $k = 0$, $|\phi(h) - f'(g(a))| = 0$, and hence it is automatically less than $\epsilon$ whatever the value of $h$. The problem is to ensure $|\phi(h) - f'(g(a))| < \epsilon$ when $k = g(a + h) - g(a) \neq 0$. To ensure this, Spivak uses the differentiability of $f$ at $g(a)$ (which gives rise to $\delta' > 0$) and the continuity of $g$ at $a$ (which gives rise to $\delta$, based on $\delta'$). Thus the variable $k$ is used for notational convenience and to establish the desired inequality in two steps (find $\delta'$ based on $\epsilon$, then $\delta$ based on $\delta'$).

Answer:

Just to provide an additional point, one can understand Spivak's $\phi$ function as actually being a composite function. In fact, in Chapter 6 Exercise 12a, we proved a useful lemma that reads as follows:

If $f$ is continuous at $l$ and $\displaystyle \lim_{x \to a} g(x) = l$, then $\displaystyle \lim_{x\to a}f(g(x))=f(l)$

Although Spivak's $\phi$ function looks a little exotic, consider the slightly more digestible function $\psi$ which we define as follows:

$$\psi(k)= \begin{cases} \frac{f\left(g(a)+k\right)-f\left(g(a)\right)}{k} & k \neq 0 \\ f'\left(g(a)\right)& k = 0 \end{cases}$$

For the $k \neq 0$ case, we recognize this as the expression that appears to the right of $\displaystyle \lim_{k \to 0}$ in the definition of the derivative of $f$ at $g(a)$. By assumption, we know that the derivative of $f$ at $g(a)$ exists. With this information, we can actually prove that $\psi$ is continuous at $0$, which means that $\displaystyle \lim_{k \to 0}\psi(k)=\psi(0)=f'\left(g(a)\right). \quad \dagger$

Next, consider the function $\omega$ which we will define as follows:

$$\omega(h)=g(h+a)-g(a) \text{ for any }h\in \mathbb R$$

By assumption, $g$ is differentiable at $a$, which means $g$ is also continuous at $a$. With this information, we can prove that $\displaystyle \lim_{h \to 0} \omega(h)=0. \quad \dagger \dagger$

Using $\dagger$ and $\dagger \dagger$, we can invoke our lemma, which gives:

$$\displaystyle \lim_{h \to 0}\psi\left( \omega(h) \right)=\psi(0)=f'\left(g(a)\right)$$

We can unpack $\psi\left ( \omega(h)\right)$ in the following way:

$$\psi\left ( \omega(h)\right)=\psi \left( g(h+a)-g(a)\right)$$

By definition of $\psi$, if $g(h+a) - g(a) = 0$, then $\psi \left (g(h+a) - g(a) \right)=f'\left(g(a)\right) $

If $g(h+a) - g(a) \neq 0$, then $\psi \left (g(h+a) - g(a) \right)=\frac{f\left(g(a)+g(h+a) - g(a)\right)-f\left(g(a)\right)}{g(h+a) - g(a)} = \frac{f\left(g(h+a)\right)-f\left(g(a)\right)}{g(h+a) - g(a)}$.

This is precisely how Spivak defined his $\phi$ function.
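As a sanity check on this unpacking, here is a short numerical confirmation that $\psi(\omega(h))$ and Spivak's $\phi(h)$ agree, for a hypothetical $f$, $g$, and $a$ of my own choosing:

```python
import math

# Verifying numerically that psi(omega(h)) reproduces Spivak's phi(h).
# The choices f = sin, g(x) = x^2, a = 1 are my own example.
f, fp, g, a = math.sin, math.cos, (lambda x: x ** 2), 1.0

def psi(k):
    return (f(g(a) + k) - f(g(a))) / k if k != 0 else fp(g(a))

def omega(h):
    return g(h + a) - g(a)

def phi(h):
    k = g(a + h) - g(a)
    return (f(g(a + h)) - f(g(a))) / k if k != 0 else fp(g(a))

# h = 0 exercises the k = 0 branch; the others exercise k != 0.
assert psi(omega(0.0)) == phi(0.0)
for h in [0.25, -0.1, 1e-4]:
    assert abs(psi(omega(h)) - phi(h)) < 1e-12
```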