Apostol's Calculus, Volume I, pages 174-175, gives a proof of the chain rule.
The theorem states: Let $f$ be the composition of two functions $u$ and $v$, say $f=u \circ v$. Suppose that both derivatives $v'(x)$ and $u'(y)$ exist, where $y=v(x)$. Then the derivative $f'(x)$ also exists and is given by the formula $f'(x)=u'(y)\cdot v'(x)$.
Proof: The difference quotient for $f$ is (4.12): $\frac{f(x+h)-f(x)}{h}=\frac{u[v(x+h)]-u[v(x)]}{h}$. Let $y=v(x)$ and let $k=v(x+h)-v(x)$. Then we have $v(x+h)=y+k$ and (4.12) becomes (4.13): $\frac{f(x+h)-f(x)}{h}=\frac{u(y+k)-u(y)}{h}$.
If $k\neq0$, then we multiply and divide by $k$ and obtain (4.14): $\frac{u(y+k)-u(y)}{h}\frac{k}{k}=\frac{u(y+k)-u(y)}{k}\frac{v(x+h)-v(x)}{h}$. When $h$ goes to $0$, the last quotient on the right becomes $v'(x)$. Also, as $h$ goes to $0$, $k$ also goes to $0$ because $k=v(x+h)-v(x)$ and $v$ is continuous at $x$. Therefore the first quotient on the right approaches $u'(y)$ as $h$ tends to zero and this proves the result. $\square$
Although the foregoing argument seems to be the most natural way to proceed, it is not completely general. Since $k=v(x+h)-v(x)$, it may happen that $k=0$ for infinitely many values of $h$ as $h$ tends to zero in which case the passage from (4.13) to (4.14) is not valid.
My question: I have trouble understanding the line "it may happen that $k=0$ for infinitely many values of $h$ as $h$ tends to zero". What is this line trying to convey, and why does it make the proof incomplete?
Thanks in advance.
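Apostol has in mind functions like the one whose graph is the topologist's sine curve, $$t(x) = \sin \left( \frac1x \right)$$ This function cannot be made differentiable (or even continuous) at zero, so it poses no problem for the chain rule proof. Its weird cousin $$f(x) = e^{-\frac1{x^2}} \sin \left( \frac1x \right), \qquad f(0) = 0,$$ however, is differentiable everywhere (though it is not analytic at $x=0$), and it vanishes at every $x = \frac{1}{n\pi}$ for nonzero integers $n$. Taking $v = f$ and $x = 0$ in the proof gives $k = f(h) - f(0) = f(h)$, which is zero for infinitely many values of $h$ as $h \to 0$; for those $h$, multiplying and dividing by $k$ is not allowed.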
So the question becomes, "does the chain rule apply if one of the functions is a weird function such as $f(x)$?"
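To make the "infinitely many zeros of $k$" concrete, here is a minimal numerical sketch in Python. It assumes $f$ is extended by $f(0) = 0$ and takes $v = f$, $x = 0$, so that $k = f(h)$; the sample points $h = \frac{1}{n\pi}$ and the three values of $n$ are illustrative choices only. In floating point the zeros show up as values at rounding-error level rather than exact zeros.

```python
import math


def f(x: float) -> float:
    """f(x) = exp(-1/x^2) * sin(1/x), extended by f(0) = 0."""
    if x == 0.0:
        return 0.0
    return math.exp(-1.0 / x**2) * math.sin(1.0 / x)


# With v = f and x = 0, the increment in the proof is k = v(0 + h) - v(0) = f(h).
# At h = 1/(n*pi) the sine factor is sin(n*pi) = 0, so k vanishes there
# (numerically we only see something at rounding-error level).
for n in (1, 2, 3):
    h = 1.0 / (n * math.pi)
    print(f"h = {h:.6f}, k = {f(h):.3e}")

# The difference quotient f(h)/h still shrinks as h -> 0,
# which is consistent with f'(0) = 0.
quotients = [f(h) / h for h in (0.5, 0.2, 0.1)]
print(quotients)
```

The points $h = \frac{1}{n\pi}$ accumulate at zero, so no punctured neighbourhood of $h = 0$ avoids them; that is exactly the situation in which the step from (4.13) to (4.14) breaks down.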
Not a very practical worry, but if you present a "proof" it is always best that the proof be airtight.
Added afterward
Let $g(x) = \frac1{x+1}$. Then $$(f\circ g)(x) = e^{-(1+x)^2}\sin(1+x) \\ \frac{d(f\circ g)(x)}{dx} = e^{-(1+x)^2}\cos(1+x)-2e^{-(1+x)^2}(1+x)\sin(1+x)\\ \left.\frac{d(f\circ g)(x)}{dx}\right|_{x=0} = \frac{\cos(1)-2\sin(1)}{e} \approx -0.42 \neq 0 $$ But naively applying the chain rule with both derivatives evaluated at zero, and noting that the derivative of $f(x)$ at zero is zero, seems to give
$$ \left.\frac{df(x)}{dx}\right|_{x=0} = 0 \\ \left.\frac{dg(x)}{dx}\right|_{x=0} = -1 \\ \left.\frac{d(f\circ g)(x)}{dx}\right|_{x=0} \stackrel{?}{=} 0\cdot (-1) = 0 $$
But the combination of $f$ and $g$ is not a counterexample to the chain rule: the chain rule requires the derivative of $f$ at $g(0)$, and $g(0) = 1$, not zero.
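For completeness, here is the chain rule applied at the correct point, as a worked sketch. The formula for $f'(y)$ comes from the product rule and is valid for $y \neq 0$:

$$ f'(y) = e^{-\frac1{y^2}}\left(\frac{2}{y^3}\sin\left(\frac1y\right) - \frac{1}{y^2}\cos\left(\frac1y\right)\right) \\ f'(g(0)) = f'(1) = \frac{2\sin(1)-\cos(1)}{e} \\ f'(g(0))\cdot g'(0) = \frac{2\sin(1)-\cos(1)}{e}\cdot(-1) = \frac{\cos(1)-2\sin(1)}{e} \approx -0.42 $$

This agrees with the value obtained by differentiating $f\circ g$ directly.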
It turns out that the conditions stated in Apostol are in fact sufficient: as long as $g$ is differentiable at $x$ and $f$ is differentiable at $g(x)$, the chain rule holds.
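As a sanity check rather than a proof, the following Python sketch compares a central-difference estimate of the derivative of $f\circ g$ at $x=0$ with the chain rule product $f'(g(0))\cdot g'(0)$. The helper names (`f_prime`, `g_prime`, `central_difference`) and the step size `1e-6` are assumptions made for this illustration.

```python
import math


def f(y: float) -> float:
    """f(y) = exp(-1/y^2) * sin(1/y), extended by f(0) = 0."""
    if y == 0.0:
        return 0.0
    return math.exp(-1.0 / y**2) * math.sin(1.0 / y)


def f_prime(y: float) -> float:
    """Derivative of f: product rule for y != 0, and f'(0) = 0."""
    if y == 0.0:
        return 0.0
    return math.exp(-1.0 / y**2) * (
        2.0 * math.sin(1.0 / y) / y**3 - math.cos(1.0 / y) / y**2
    )


def g(x: float) -> float:
    return 1.0 / (x + 1.0)


def g_prime(x: float) -> float:
    return -1.0 / (x + 1.0) ** 2


def central_difference(func, x: float, h: float = 1e-6) -> float:
    """Symmetric difference quotient (func(x+h) - func(x-h)) / (2h)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


# Numerical derivative of the composition at x = 0.
numeric = central_difference(lambda x: f(g(x)), 0.0)

# Chain rule, with f' evaluated at g(0) = 1 (not at 0).
chain = f_prime(g(0.0)) * g_prime(0.0)

# Closed form obtained by differentiating f(g(x)) directly.
closed_form = (math.cos(1.0) - 2.0 * math.sin(1.0)) / math.e

print(numeric, chain, closed_form)
```

All three numbers should agree to several decimal places, while the naive product $f'(0)\cdot g'(0)$ is $0$.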