In my physics class my professor was abusing the derivative, as per so many physics classes I've been in. This time, he took the quantity $(x+dx)(y+dy)$ and argued that the $dxdy$ term should disappear, because it's so much smaller than the rest, (despite $dx, dy$ both being infinitesimal...). In any case, I know this is related to non-standard analysis, or something of the sort, and I was wondering if someone could explain in whatever light is proper, why the product of two infinitesimals can be said to be zero. With whatever wonderfully terrible mathematical rigor that is required.
Products of Infinitesimals
2.6k views. Asked by user82004 (https://math.techqa.club/user/user82004/detail). There are 7 answers below.
---
In nonstandard analysis one can define derivatives without using limits: if $dx$ is an infinitesimal, that is, a number greater than zero but less than every positive real number, then $f'(x)$ can almost be computed as $[f(x+dx)-f(x)]/dx$. To get the same result as in standard analysis, one then takes the "standard part" of this, the closest real number, which amounts to the throwing away of higher-order infinitesimals that your physics professor did.
Here are two explicit examples. Let's compute the derivative of $f(x)=x^2$. Let $dx$ be infinitesimal. Then $f(x+dx)-f(x)=x^2+2xdx+(dx)^2-x^2=2xdx+(dx)^2$. Dividing by $dx$ we get $2x+dx$. For $x$ a real number it's hopefully intuitive that the standard part of $2x+dx$ is $2x$, and so we get our familiar identity $f'(x)=2x$.
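This computation can be mirrored symbolically; a minimal sketch using SymPy, where keeping $dx$ as an ordinary symbol and substituting $0$ at the very end plays the role of taking the standard part:

```python
import sympy as sp

x, dx = sp.symbols('x dx')

# Difference quotient for f(x) = x^2 with a symbolic increment dx
quotient = sp.expand(((x + dx)**2 - x**2) / dx)  # 2*x + dx

# "Taking the standard part" corresponds to discarding the leftover dx term
derivative = quotient.subs(dx, 0)  # 2*x
```

Of course, in SymPy $dx$ is just a real-valued symbol, not a genuine infinitesimal; the point is only that the algebra of "expand, divide, then drop $dx$" is mechanical.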
Now let's look at the product rule, which is the sort of situation in which your professor's argument might come up. We have $$(fg)'(x)dx\approx fg(x+dx)-fg(x)=[f(x)+f'(x)dx+c_1(dx)^2][g(x)+g'(x)dx+c_2(dx)^2]-fg(x)=(f'g+g'f)dx+c_3(dx)^2.$$ Here we're using Taylor's theorem to expand $f$ and $g$; in the familiar context we say the $c_i$ don't go to infinity as $dx\to 0$, which in the nonstandard context is just to say the $c_i$ are not infinite for infinitesimal $dx$.
So here the $(dx)^2$ term will disappear, as your professor suggested, when we take the standard part of the derivative. But this only makes sense after we've subtracted $fg(x)$! Then we're justified in cutting off at the standard, or real, part of our expression; saying $(x+dx)(y+dy)=xy+y\,dx+x\,dy$ is rather arbitrary, in comparison.
Anyway, this discussion requires justifying the existence of infinitesimals, and our ability to compute with them as we do with reals, even applying Taylor's theorem to them. The full justification of this theory involves understanding a couple of logical topics: first-order predicate logic and ultraproducts. These aren't overwhelmingly technical, but have little to do with how the theory is used. For that, it's enough to know the
Transfer Principle: All the same things are true of the extended reals with infinitesimals as of the standard reals, provided they can be stated without saying "For every subset of $\mathbb{R}$..." or something equivalent.
(With apologies for the lack of precision in this statement; I hope it gets the point across.) Being careful with the transfer principle is probably where nonstandard analysis wins out over informal physical reasoning: it lets us decide exactly when this sort of argument is reasonable. Specific examples are that the nonstandard reals and differentiable functions on them do satisfy the intermediate value theorem and Taylor's theorem, but do not satisfy the least upper bound property.
---
Let me give, as an example, the derivation of the product rule.
Informal version:
\begin{align}d(uv)&=\left((u+du)(v+dv)\right)-\left(uv\right)\\ d(uv)&=u\,dv+v\,du+du\,dv\end{align} We let the $du\,dv$ term disappear: \begin{align}d(uv)&=u\,dv+v\,du\\ \frac{d(uv)}{dx}&=u\frac{dv}{dx}+v\frac{du}{dx} \end{align}
Formal version:
For functions, $\Delta f$ means $f(x+\Delta x)-f(x)$ where $\Delta x$ is a small, but not infinitesimal, quantity. Thus, $\lim_{\Delta x\to0}\frac{\Delta f}{\Delta x}=\frac{df}{dx}$.
\begin{align}\Delta(uv)&=\left((u+\Delta u)(v+\Delta v)\right)-\left(uv\right)\\
\Delta(uv)&=u\Delta v+v\Delta u+\Delta u\Delta v\\
\frac{\Delta(uv)}{\Delta x}&=u\frac{\Delta v}{\Delta x}+v\frac{\Delta u}{\Delta x}+\frac{\Delta u}{\Delta x}\Delta v\end{align}
Take the limit as $\Delta x$ goes to zero. Notice how the last term vanishes: $\frac{\Delta u}{\Delta x}$ approaches the finite number $\frac{du}{dx}$, while $\Delta v$ approaches zero.
\begin{align}\frac{d(uv)}{dx}&=u\frac{dv}{dx}+v\frac{du}{dx}\end{align}
So, the reason we could treat the $du\,dv$ as zero is that later we only divided by $dx$, so it became zero when we took the limit. If we had instead divided by $(dx)^2$, it would not become zero; however, we usually only divide by $dx$ once.
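A small sketch with SymPy makes the same point concretely, using the illustrative choices $u = x^2$ and $v = \sin x$: the cross term $\frac{\Delta u}{\Delta x}\,\Delta v$ vanishes in the limit, and the remaining terms recover the product rule.

```python
import sympy as sp

x, h = sp.symbols('x h')   # h plays the role of Delta x
u, v = x**2, sp.sin(x)     # illustrative choices of u and v

du = u.subs(x, x + h) - u  # Delta u
dv = v.subs(x, x + h) - v  # Delta v

# The cross term (Delta u / Delta x) * Delta v vanishes as h -> 0,
# since du/h stays finite while dv goes to zero
cross = sp.limit((du / h) * dv, h, 0)

# The full difference quotient recovers the product rule
lhs = sp.limit((u.subs(x, x + h) * v.subs(x, x + h) - u * v) / h, h, 0)
rhs = sp.diff(u * v, x)
```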
---
One way of thinking about this is using a parameter $\epsilon$ as $\epsilon \to 0$. If $dx = O(\epsilon)$ and $dy = O(\epsilon)$ while $x$ and $y$ do not depend on $\epsilon$, then $dx\; dy = O(\epsilon^2)$, so it's correct to say
$$ (x + dx)(y + dy) = xy + x\; dy + y\; dx + O(\epsilon^2)$$
And this can be manipulated further, perfectly rigorously, using the standard rules of big-O notation.
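Numerically, this is the statement that shrinking $\epsilon$ by a factor of ten shrinks the neglected term by a factor of a hundred; a small sketch with arbitrary sample values, taking $dx = a\epsilon$ and $dy = b\epsilon$:

```python
# Numerical illustration that the neglected term is O(eps^2):
# with dx = a*eps and dy = b*eps, the error of the linear
# approximation shrinks like eps^2.
x, y = 3.0, 5.0   # arbitrary sample point
a, b = 1.0, 2.0   # dx = a*eps, dy = b*eps

def error(eps):
    dx, dy = a * eps, b * eps
    exact = (x + dx) * (y + dy)
    linear = x * y + x * dy + y * dx  # drop the dx*dy term
    return exact - linear             # equals dx*dy exactly

# error(eps) / eps^2 stays constant (= a*b), confirming the eps^2 scaling
ratios = [error(10.0**-k) / (10.0**-k)**2 for k in range(1, 5)]
```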
---
There are several perfectly rigorous ways to formalize this kind of reasoning, none of which require any nonstandard analysis (which you should be quite suspicious of as it relies on a weak choice principle to even get off the ground).
One of them is, as Robert Israel says, interpreting statements about infinitesimals as statements about limiting behavior as some parameter tends to zero. For example, you can define what it means for a function $f(x)$ to be differentiable at a point: it means there is some real number $f'(x)$ such that (in little-o notation)
$$f(x + \epsilon) = f(x) + f'(x) \epsilon + o(|\epsilon|)$$
as $\epsilon \to 0$. After you prove some basic lemmas about how little-o notation works, you get some very clean and intuitive proofs of basic facts in calculus this way. For example, here's the product rule:
$$\begin{eqnarray*} f(x + \epsilon) g(x + \epsilon) &=& \left( f(x) + f'(x) \epsilon + o(|\epsilon|) \right) \left( g(x) + g'(x) \epsilon + o(|\epsilon|) \right) \\ &=& f(x) g(x) + (f'(x) g(x) + f(x) g'(x)) \epsilon + o(|\epsilon|). \end{eqnarray*}$$
After writing down a bunch of arguments like this, if you're familiar with elementary ring theory it becomes very tempting to think of expressions that are $o(|\epsilon|)$ (meaning they grow more slowly than $|\epsilon|$ as $\epsilon \to 0$) as an ideal that you can quotient out by, and this intuition can also be formalized.
More precisely, in the ring $R = C^{\infty}(\mathbb{R})$ of smooth functions on $\mathbb{R}$, for any $r \in \mathbb{R}$ there's an ideal $(x - r)$ generated by the function $x - r$, consisting of all functions vanishing at $r$. Working in the quotient ring $R/(x - r)$ amounts to only working with the value at $r$ of a function. Working in the quotient ring $R/(x - r)^2$, though, amounts to working with both the value at $r$ and the first derivative at $r$, with multiplication given by the product rule. Similarly, working in $R/(x - r)^{n+1}$ amounts to working with the value at $r$ and the first $n$ derivatives at $r$.
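The quotient $R/(x - r)^2$ is essentially the algebra of dual numbers $a + b\varepsilon$ with $\varepsilon^2 = 0$, which is also how forward-mode automatic differentiation works. A minimal sketch (the `Dual` class and `derivative` helper are illustrative names, not from any particular library):

```python
class Dual:
    """A dual number a + b*eps with eps**2 = 0, i.e. an element of R[eps]/(eps^2).

    The 'a' part carries a function value, the 'b' part its derivative.
    """
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    def __add__(self, other):
        return Dual(self.a + other.a, self.b + other.b)

    def __sub__(self, other):
        return Dual(self.a - other.a, self.b - other.b)

    def __mul__(self, other):
        # (a1 + b1*eps)(a2 + b2*eps) = a1*a2 + (a1*b2 + b1*a2)*eps;
        # the eps^2 term is literally quotiented away -- the product rule
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

def derivative(f, r):
    """First derivative of f at r, read off from the eps coefficient."""
    return f(Dual(r, 1.0)).b
```

For example, `derivative(lambda t: t * t * t, 2.0)` computes the derivative of $t^3$ at $t=2$ by pure dual-number arithmetic, with no limits taken anywhere.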
Taking ideas like this seriously leads to things like formal power series, germs of functions, stalks of sheaves, jet bundles, etc. etc. It is all perfectly rigorous mathematics, and nonstandard analysis is a huge distraction from the real issues.
---
$$ \frac{\Big(xy + x\,\Delta y + y\,\Delta x + \Delta y\,\Delta x\Big) - xy }{\Delta t} = \underbrace{x\frac{\Delta y}{\Delta t} + y \frac{\Delta x}{\Delta t}}_A + \underbrace{\frac{\Delta y\,\Delta x}{\Delta t}}_B $$ $$ \overbrace{\frac{\Delta y\,\Delta x}{\Delta t} = \frac{\Delta y}{\Delta t}\Delta x = \frac{\Delta x}{\Delta t}\Delta y}^B $$ The expression labeled $B$ approaches $0$ since $\dfrac{\Delta y}{\Delta t}$ approaches a finite number and it is then multiplied by $\Delta x$, which approaches $0$; the same argument applies to the other form of $B$.
---
The best explanation of the step converting $x\,dy+y\,dx+dx\,dy$ to the expression $x\,dy+y\,dx$ is still Leibniz's, in terms of a generalized relation of equality he sometimes denoted by a symbol similar to "$\,{}_{\ulcorner\!\urcorner}\,$". Here $a\,{}_{\ulcorner\!\urcorner}\,b$ means that $\frac{a}{b}$ is infinitely close to $1$. Thus we have simply $$x\,dy+y\,dx+dx\,dy\;\;{}_{\ulcorner\!\urcorner}\;\;x\,dy+y\,dx$$ A similar formula holds when calculating the derivative of $y=x^2$, giving $\frac{dy}{dx}\;{}_{\ulcorner\!\urcorner}\;2x$.
No need here for either little-o or big-O notation. Leibniz was explicit in describing the relation he was using as more general than equality, as a relation "up to" a negligible term. He wrote this, in particular, in a published article in 1695.
The point is to choose what Leibniz called an "assignable" value as the value of the derivative: $2x$ is assignable, while $dx$ (as well as $2x+dx$) is what Leibniz refers to as inassignable. Leibniz explicitly described his infinitesimals as "inassignable".
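As a sanity check, Leibniz's relation can be illustrated numerically: the ratio of $x\,dy+y\,dx+dx\,dy$ to $x\,dy+y\,dx$ tends to $1$ as the increments shrink. A small sketch in Python with arbitrary sample values:

```python
# Numerical illustration of Leibniz's relation: the ratio of the full
# expression to the truncated one approaches 1 as the increments shrink.
x, y = 2.0, 7.0  # arbitrary sample point

def ratio(eps):
    dx = dy = eps
    full = x * dy + y * dx + dx * dy  # expression with the product term
    truncated = x * dy + y * dx       # Leibniz's "assignable" part
    return full / truncated

# Deviation from 1 shrinks linearly in eps (here it equals eps/(x+y))
deviations = [abs(ratio(10.0**-k) - 1.0) for k in (2, 4, 6)]
```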
---

This answer is left here as an example, to stop others from using the same incorrect reasoning. The incorrect reasoning is as follows:
I think he is arguing something along these lines:
If $dx$ and $dy$ are infinitesimal, then $(dx)^2$, $(dy)^2$ and $dxdy$ are an order of magnitude even smaller and are hence negligible compared to $dx$ and $dy$.
For example, $0.001$ and $0.002$ are very small numbers, but their product ($0.001\times0.002=0.000002$) is negligible compared to either of them.
The problem with this reasoning: if $dy\,dx$ is negligible compared with $dx$ and can be disregarded, then $dy$ is negligible compared with $y$ and could equally be disregarded. I suspect the idea is that although $dy$ isn't negligible compared with $x$ or $y$, the product $dx\,dy$ is negligible compared with $x$ and $y$, and can be disregarded in computations involving those.