My understanding of derivatives is that:
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$
where the limit is defined by the usual $\epsilon$-$\delta$ statement in first-order logic.
So $\frac{df(x)}{dx} = f'(x)$, as usual.
This doesn't work so well when people start talking about $\frac{df'(x)}{df(x)}$.
In the special case where $y = f(x)$ is invertible, we can make sense of this via the chain rule:
Let $g(y) = f^{-1}(y) = x$, then $\frac{df'(x)}{df(x)}$ is:
$$\begin{align*} \frac{d}{dy}\left( f'(g(y)) \right) &= f''(g(y)) \cdot \frac{d}{dy}( g(y)) \\ &= f''(x) \cdot g'(y) \\ &= f''(x) \cdot g'(f(x)) \\ &= f''(x) \cdot \left( \frac{d}{dx}\left(g(f(x))\right) \cdot \frac{1}{f'(x)} \right) \text{ by chain rule } g'(h(x)) = \frac{d}{dx}( g(h(x)) ) \cdot \frac{1}{h'(x)} \\ &= f''(x) \cdot \left( \frac{d}{dx}\left( x\right) \cdot \frac{1}{f'(x)} \right) \text{ since $g(f(x)) = f^{-1}(f(x))$}\\ &= \frac{f''(x)}{f'(x)} \end{align*}$$
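As a numerical sanity check on the invertible case, here is a quick sketch (the function names `f`, `fp`, `fpp` and the choice $f(x) = x^3$ at $x = 2$ are my own illustrations, not from any linked answer) that approximates $\frac{df'(x)}{df(x)}$ by the quotient $\frac{f'(x+h) - f'(x)}{f(x+h) - f(x)}$ for small $h$, where the claimed answer is $\frac{f''(x)}{f'(x)} = \frac{6x}{3x^2} = \frac{2}{x} = 1$:

```python
# Approximate d f'(x) / d f(x) as a difference quotient for the
# invertible function f(x) = x**3 on x > 0.

def f(x):
    return x**3

def fp(x):          # f'(x) = 3x^2
    return 3 * x**2

def fpp(x):         # f''(x) = 6x
    return 6 * x

x, h = 2.0, 1e-6
quotient = (fp(x + h) - fp(x)) / (f(x + h) - f(x))
expected = fpp(x) / fp(x)   # = 12/12 = 1

print(quotient, expected)   # quotient should be close to 1
```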
It is asserted by many on MSE that $\frac{df'(x)}{df(x)} = \frac{f''(x)}{f'(x)}$ holds in general.
However, I can't seem to make sense of this in terms of the usual $\epsilon$-$\delta$ definition.
This leads me to think that there are multiple notions of derivatives:
- The ordinary derivative, defined with $\epsilon$-$\delta$.
- The notion of a differential, which builds on top of the ordinary derivative.
Here $df_x(t) = f'(x) \cdot t$, where $d$ operates on a function. With this notion the identity above is immediate, at least for $f'(x) \ne 0$ and $t \ne 0$: $$\frac{df'_x(t)}{df_x(t)} = \frac{f''(x) \cdot t}{f'(x) \cdot t} = \frac{f''(x)}{f'(x)}$$
I suspect this is what many of the answers are using, and that this is what people mean when they say "using Leibniz notation"?
My question is the following:
- Is it possible to prove the general case without using the notion of differentials?
- Am I wrong in thinking that there are multiple notions of derivatives, and that "differentiating with respect to a function" is not the same thing as the ordinary $\epsilon$-$\delta$-based derivative?
Edit: Here are some MSE answers which claim this is true in general:
- Why is $\frac{dy'}{dy}$ zero, since y' depends on y?
- Simplifying $\frac{dy'}{dy}$ where $y=f(x)$
- Derivative of a function with respect to another function.
- differentiate with respect to a function
- What is $\frac{d}{dx}\left(\frac{dx}{dt}\right)$?
- Circular Motion
- Showing $\ddot{x} = \frac{\mathrm{d}}{\mathrm{d}x}(\frac{1}{2} \dot{x}^2)$
- Is there a way to rigorously define "taking the derivative with respect to a function"
- Derivative with respect to another function
- Taking a derivative of a function with respect to another function
In the first place, what does $\frac{\mathrm{d} f' (x)}{\mathrm{d} f (x)}$ mean? Clearly, it suffices to define what $\frac{\mathrm{d} g (x)}{\mathrm{d} f (x)}$ means: once we know that, we can simply substitute $f'$ for $g$. We should also define it in such a way that when $f$ is the identity function – i.e. $f (x) = x$ – then $\frac{\mathrm{d} g (x)}{\mathrm{d} f (x)}$ has the same meaning as $\frac{\mathrm{d} g (x)}{\mathrm{d} x}$.

There is an obvious choice: $$\frac{\mathrm{d} g (x)}{\mathrm{d} f (x)} = \lim_{h \to 0} \frac{g (x + h) - g (x)}{f (x + h) - f (x)}$$

Since $f (x + h) - f (x)$ could be $0$ for $h \ne 0$, we should be a little more careful, so let us say that the value of $\frac{\mathrm{d} g (x)}{\mathrm{d} f (x)}$ at $x = x_0$ is $M$ if, for all $\epsilon > 0$, there exists $\delta > 0$ such that for all $h$ such that $0 < \left| h \right| < \delta$, $$\left| g (x_0 + h) - g (x_0) - M \cdot (f (x_0 + h) - f (x_0)) \right| < \epsilon \cdot \left| f (x_0 + h) - f (x_0) \right|$$
(This definition generalises straightforwardly to the vector-valued multivariable case, provided we understand $M$ needs to be a matrix of the appropriate dimensions.) Since both the left and right hand side are non-negative, if $\frac{\mathrm{d} g (x)}{\mathrm{d} f (x)}$ has a value at $x = x_0$, then there exists $\delta > 0$ such that for all $h$ such that $0 < \left| h \right| < \delta$, $\left| f (x_0 + h) - f (x_0) \right| > 0$, i.e. $f$ is not constant on any neighbourhood of $x_0$.
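To make the definition concrete, here is a small numerical sketch. The choices $f(x) = x^3$, $g(x) = \sin x$, $x_0 = 1$ and the candidate value $M = g'(x_0)/f'(x_0) = \cos(1)/3$ are my own illustrations: for a fixed $\epsilon$, the defining inequality holds for all sufficiently small $h$ on both sides of $x_0$.

```python
import math

# Check the epsilon-delta-style definition above with the illustrative
# choices f(x) = x^3, g(x) = sin(x), x0 = 1, M = g'(x0)/f'(x0).

def f(x):
    return x**3

def g(x):
    return math.sin(x)

x0 = 1.0
M = math.cos(x0) / (3 * x0**2)   # candidate value of dg(x)/df(x) at x0

eps = 1e-2
# These h values play the role of 0 < |h| < delta for this epsilon.
ok = all(
    abs(g(x0 + h) - g(x0) - M * (f(x0 + h) - f(x0)))
    < eps * abs(f(x0 + h) - f(x0))
    for h in (1e-4, -1e-4, 1e-5, -1e-5)
)
print(ok)  # True for these h
```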
Now, with all that preamble out of the way, let me state:
Theorem. If $f (x)$ and $g (x)$ are differentiable at $x = x_0$, with $f' (x_0)$ and $g' (x_0)$ as the values of $\frac{\mathrm{d} f (x)}{\mathrm{d} x}$ and $\frac{\mathrm{d} g (x)}{\mathrm{d} x}$ at $x = x_0$ respectively, and $f' (x_0) \ne 0$, then $\frac{\mathrm{d} g (x)}{\mathrm{d} f (x)}$ has value $\frac{g' (x_0)}{f' (x_0)}$ at $x = x_0$.
Proof. Let $0 < \epsilon < 1$. By hypothesis, there exists $\delta_1 > 0$ such that for all $h$ such that $0 < \left| h \right| < \delta_1$, $$\left| f (x_0 + h) - f (x_0) - f' (x_0) \cdot h \right| < \frac{1}{3} \epsilon \cdot \left| h \right| \cdot \frac{\min \left\{ \left| f' (x_0) \right|, \left| f' (x_0) \right|^2 \right\}}{\max \left\{ 1, \left| g' (x_0) \right| \right\}}$$ (Replace $\epsilon$ with $\frac{1}{3} \epsilon \cdot \frac{\min \left\{ \left| f' (x_0) \right|, \left| f' (x_0) \right|^2 \right\}}{\max \left\{ 1, \left| g' (x_0) \right| \right\}}$ in the definition.) We then have: $$\left| f (x_0 + h) - f (x_0) - f' (x_0) \cdot h \right| < \frac{1}{3} \left| f' (x_0) \cdot h \right|$$ $$\left| \frac{g' (x_0)}{f' (x_0)} \right| \cdot \left| f (x_0 + h) - f (x_0) - f' (x_0) \cdot h \right| < \frac{1}{3} \epsilon \cdot \left| f' (x_0) \cdot h \right|$$
Similarly, by hypothesis, there exists $\delta_2 > 0$ such that for all $h$ such that $0 < \left| h \right| < \delta_2$, $$\left| g (x_0 + h) - g (x_0) - g' (x_0) \cdot h \right| < \frac{1}{3} \epsilon \cdot \left| f' (x_0) \cdot h \right|$$ (Replace $\epsilon$ with $\frac{1}{3} \epsilon \cdot \left| f' (x_0) \right|$ in the definition.)
Let $\delta = \min \{ \delta_1, \delta_2, 1 \}$. Then, for all $h$ such that $0 < \left| h \right| < \delta$: $$\begin{multline} \left| g (x_0 + h) - g (x_0) - \frac{g' (x_0)}{f' (x_0)} \cdot ( f (x_0 + h) - f (x_0) ) \right| \\ \le \left| g (x_0 + h) - g (x_0) - \frac{g' (x_0)}{f' (x_0)} \cdot f' (x_0) \cdot h \right| + \left| \frac{g' (x_0)}{f' (x_0)} \cdot ( f (x_0 + h) - f (x_0) - f' (x_0) \cdot h ) \right| \end{multline}$$ The first term is $< \frac{1}{3} \epsilon \cdot \left| f' (x_0) \cdot h \right|$. The second term is also $< \frac{1}{3} \epsilon \cdot \left| f' (x_0) \cdot h \right|$. Thus the LHS is $< \frac{2}{3} \epsilon \cdot \left| f' (x_0) \cdot h \right|$. But, $$\begin{multline} \left| f' (x_0) \cdot h \right| \le \left| f (x_0 + h) - f (x_0) \right| + \left| f (x_0 + h) - f (x_0) - f' (x_0) \cdot h \right| \\ < \left| f (x_0 + h) - f (x_0) \right| + \frac{1}{3} \left| f' (x_0) \cdot h \right| \end{multline}$$ so $\left| f' (x_0) \cdot h \right| < \frac{3}{2} \left| f (x_0 + h) - f (x_0) \right|$. Therefore, $$\left| g (x_0 + h) - g (x_0) - \frac{g' (x_0)}{f' (x_0)} \cdot ( f (x_0 + h) - f (x_0) ) \right| < \epsilon \cdot \left| f (x_0 + h) - f (x_0) \right|$$ as required. ◼
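To see why it matters that the theorem asks only for differentiability at $x_0$ with $f'(x_0) \ne 0$, and not for invertibility near $x_0$, here is a numerical sketch with the classic example $f(x) = x + x^2 \sin(1/x^2)$, $f(0) = 0$: it has $f'(0) = 1$, yet $f'$ changes sign arbitrarily close to $0$, so $f$ is not monotone on any neighbourhood of $0$. The choice $g(x) = \sin x$ is my own illustration; the theorem predicts the value $g'(0)/f'(0) = 1$.

```python
import math

# The theorem needs only differentiability at x0 and f'(x0) != 0, not
# local invertibility.  f(x) = x + x^2 sin(1/x^2), f(0) = 0 has
# f'(0) = 1 but is not monotone on any neighbourhood of 0.

def f(x):
    return x + x**2 * math.sin(1.0 / x**2) if x != 0 else 0.0

def g(x):
    return math.sin(x)

x0 = 0.0
M = math.cos(x0)  # predicted value g'(0)/f'(0) = 1

quotients = [
    (g(x0 + h) - g(x0)) / (f(x0 + h) - f(x0))
    for h in (1e-6, -1e-6)
]
print(quotients)  # both close to M = 1
```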
I also beg to differ with those who say that things like "independent variable" are meaningless conversational filler. Pure mathematicians – some of us anyway – know how to make this rigorous. The trick is to recognise the concept of context and make it a concrete thing. In probability theory, this is the purpose of the sample space. We can do the same for basic analysis... but this kind of formalisation is usually not helpful for early students, so we do not teach it.