Challenge: Demonstrate a Contradiction in Leibniz's Differential Notation


I want to know if the Leibniz differential notation actually leads to contradictions - I am starting to think it does not.

And just to eliminate the most commonly showcased 'difficulty':

For the level curve $f(x,y)=0$ in the plane we have $$\frac{dy}{dx}=-\frac{\dfrac{\partial f}{\partial x}}{\dfrac{\partial f}{\partial y}}$$ If we were to "cancel" the differentials we would incorrectly derive $\frac{dy}{dx}=-\frac{dy}{dx}$. Why does this not work? Simple: The "$\partial f$" in the numerator is a response to the change in $x$, whereas the "$\partial f$" in the denominator is a response to the change in $y$. They are different numbers, and so cannot be cancelled.

Related: consult the answer to this previous question.


The other part has been moved to a new post here.

10 Answers

Best Answer (16 votes)

As you suggest in your own question, there is in fact no contradiction in Leibniz's notation, contrary to persistent popular belief. Of course, one needs to distinguish carefully between partial derivatives and derivatives in the notation, as you did. On an even more basic level, the famous "inconsistency" of passing from $y=x^2$ to $dy=2x\,dx$ was handled successfully by Leibniz, who was aware that he was working with a generalized notion of "equality up to" rather than equality "on the nose". These issues were studied in detail in this recent study.

The formula $\frac{dy}{dx}=\frac{dy}{du}\frac{du}{dx}$ holds so long as we assign to the independent variable $du$ in the denominator of $\frac{dy}{du}$ the same value as that given by the dependent variable $du$ in the numerator of $\frac{du}{dx}$. On the other hand, if as is usual one uses constant differentials $du$ in computing $\frac{dy}{du}$ the formula will be incorrect. In each instance one has to be careful about the meaning one assigns to the variables, as elsewhere in mathematics. For details see Keisler.

The OP reformulated his question in subsequent comments as wishing to understand how Leibniz himself viewed his theory and why he believed it works. This seems like a tall task but it so happens that there is a satisfactory answer to it in the literature. Namely, while Leibniz was obviously unfamiliar with the ontological set-theoretic material we take for granted today, he had a rather clear vision of the procedural aspects of his calculus, and moreover clearly articulated them unbeknownst to many historians writing today. The particular paradox of the differential ratio $\frac{dy}{dx}$ being apparently not equal on the nose to what we expect, e.g., $2x$ (which in particular undermines the "tautological" proof of the chain rule in one variable) was explained by Leibniz in terms of his transcendental law of homogeneity. On Leibniz see article1 and article2.

The consistency of Leibniz's law is demonstrated in the context of modern set-theoretic assumptions in terms of the standard part principle.

Answer (13 votes)

Leibniz notation for the second derivative suggests a version of the chain rule: $$ \frac{d^2y}{du^2}\left(\frac{du}{dv}\right)^2=\frac{d^2y}{dv^2}. $$ This does not hold in general: take for example $y=u=v^2$, for which the left side is $0$ (since $y$ is the identity function of $u$) while the right side is $2$.
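This failure is easy to confirm with a computer algebra system. Here is a small sympy sketch (the variable names are my own) comparing the two sides for $y=u=v^2$:

```python
import sympy as sp

u, v = sp.symbols('u v')

y_of_u = u          # y as a function of u is the identity, so d^2y/du^2 = 0
y_of_v = v**2       # y = u = v^2 as a function of v
u_of_v = v**2

lhs = sp.diff(y_of_u, u, 2).subs(u, u_of_v) * sp.diff(u_of_v, v)**2
rhs = sp.diff(y_of_v, v, 2)

print(lhs, rhs)  # 0 versus 2: the naive second-order "chain rule" fails
```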

Answer (8 votes)

The gist of the OP's explanation of why the "cancellation" of $\partial f$'s should not be allowed (and does not work) is correct, but something more can be said.

The partial derivative $\partial f/\partial x$ is the rate at which $f$ changes with respect to change in $x$, but while holding $y$ constant. Similarly the definition of $\partial f/\partial y$ entails a rate of change while holding $x$ constant.

Manipulation of the $dx$ and $dy$ symbols separately (rather than as an ordinary derivative $dy/dx$) produces sensible results:

$$ \frac{\partial f}{\partial y} \;dy + \frac{\partial f}{\partial x} \;dx = 0 $$

which accords with the underlying premise that $x,y$ are restricted to a level curve:

$$ f(x,y) = \text{constant} $$

This sensible computation, despite appearing to be a superficial manipulation of symbols, is taught in freshman calculus as implicit differentiation, so it bears considering why it should be allowed, while "cancelling" $\partial f$'s should not. The hidden premise is that $x$ is kept constant in the limit defining $\partial f/\partial y$, and similarly $y$ is held constant when taking $\partial f/\partial x$. Combining a change in $x$ with one in $y$ is then properly done by implicit differentiation, subjecting their mutual changes to the constraint that $f$ is kept "level".
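As a concrete sanity check (the unit circle here is my own choice of level curve), sympy's implicit differentiation agrees with the quotient formula:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + y**2 - 1          # hypothetical level curve f(x, y) = 0: the unit circle

# dy/dx from the formula -(∂f/∂x)/(∂f/∂y)
dydx_formula = -sp.diff(f, x) / sp.diff(f, y)

# dy/dx by differentiating f(x, y(x)) = 0 and solving for y'(x)
yx = sp.Function('y')(x)
dydx_implicit = sp.solve(sp.diff(f.subs(y, yx), x), sp.diff(yx, x))[0]

print(sp.simplify(dydx_formula - dydx_implicit.subs(yx, y)))  # 0
```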

Added: A good notation is useful at least as much for what it hides/suppresses from its definition as for what it suggestively expresses. If a notation is soundly defined, any contradiction that arises from proper use has to be blamed on the underlying theory, rather than the notation itself.

Of course in hiding some parts of the definition, a notation lends itself to "abuse". As we see above thinking of derivatives as "fractions" literally is suggested by the notation, and sometimes "allowable", sometimes not.

A related pitfall, having to do with mixed second partial derivatives, is their commutativity. We all "know" that under mild smoothness assumptions:

$$ \partial (\partial f /\partial x)/\partial y = \partial (\partial f /\partial y)/\partial x $$

However this depends on the pair $x,y$ being independent variables (holding one fixed while varying the other). I once tried to commute first partials while mixing Cartesian and polar coordinates in teaching a class, and promptly got a contradiction!

Consider for example the polynomial $f = x^2 + y^2 = r^2$ in both Cartesian and polar coordinates. Now $\partial(\partial f/\partial \theta)/\partial x$ is identically zero, but $\partial(\partial f/\partial x)/\partial \theta$ is not!
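A quick sympy sketch (with the coordinate conversion written out by hand) confirms the asymmetry:

```python
import sympy as sp

x, y, r, th = sp.symbols('x y r theta', positive=True)

f_polar = r**2               # f = x^2 + y^2 written in polar coordinates
f_cart = x**2 + y**2         # the same f in Cartesian coordinates

# ∂f/∂θ holding r fixed is identically 0, so a further ∂/∂x is also 0
d_theta_then_x = sp.diff(sp.diff(f_polar, th), x)

# ∂f/∂x holding y fixed is 2x; rewrite in polar and take ∂/∂θ holding r fixed
d_x_then_theta = sp.diff(sp.diff(f_cart, x).subs(x, r*sp.cos(th)), th)

print(d_theta_then_x)   # 0
print(d_x_then_theta)   # -2*r*sin(theta)
```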

Fortunately I was able to learn from my mistake (not sure how much the students benefitted other than from entertainment value), and later it helped me appreciate why shape function derivatives do not commute in general.

So even when notations suggestively lead us astray, there may be a good lesson to be found.

Answer (2 votes)

Notation can be closely associated with contradictions. A good historical example comes from the work of Nieuwentijt who was a contemporary of Leibniz's. Nieuwentijt criticized Leibniz's approach and proposed his own notation where there are only first-order infinitesimals $\frac{r}{\infty}$ (where $r$ is an ordinary number), whereas the product of two such is postulated to vanish. He wrote a book based on this approach. Nieuwentijt's notation works for some simple calculus problems treated in his book. As we know in retrospect, it cannot be a basis for calculus, because it violates the Leibnizian law of continuity: whatever succeeds for the finite, succeeds also for the infinite, and vice versa. Namely, applying the usual rules of algebra to Nieuwentijt's system one quickly runs into contradictions.

An alleged "contradiction" in Leibniz that has been persistently reported in the literature since at least 1734 (date of publication of George Berkeley's The Analyst) is the contention that $dx$ is assumed to be nonzero at the beginning of the calculation, and zero at the end of the calculation, as for example when one wishes to write $2x+dx=2x$ at the end of the calculation for $y=x^2$. The alleged logical inconsistency can be summarized in modern notation as follows: $(dx\not=0)\wedge(dx=0)$.

This persistent claim of logical inconsistency however has no basis as Leibniz clearly and repeatedly stated in his writings that he is working with a generalized relation of "equality up to" (rather than an equality "on the nose"). In particular, Leibniz never wrote or implied that $dx=0$, contrary to Berkeley's contention. This issue was dealt with in detail in a recent article in Erkenntnis here.

Thus the alleged logical inconsistency occurs only in attacks of the critics who have misunderstood Leibniz rather than in Leibniz's work itself.

Answer (7 votes)

It is easy to see that the first differential does not change under a change of variable: if $y$ is a function of $x$, then $dy = y'(x)\,dx$. Treating both as functions of $t$ does not break this: $dy = y'(t)\,dt = y'(x)\,x'(t)\,dt = y'(x)\,dx$. Identities like $\frac{dy}{dx}=\frac{1}{\frac{dx}{dy}}$ also hold.

But this fails for the second and higher differentials: $d^2y=y''(x)\,dx^2$ only if $x$ is the independent variable. In general, for functions of $t$, $d^2y=y''(x)\,dx^2+y'(x)\,d^2x$, so the problem is created by the assumption $d^2x=0$.

For functions of many variables we have a similar situation, but remember that if, for example, $u=f(x,y,z)$, then $du=\frac{\partial f}{\partial x}dx+\frac{\partial f}{\partial y}dy+\frac{\partial f}{\partial z}dz$, while the partial derivatives themselves cannot be interpreted as ratios of differentials.

To sum up, the differential notation is not contradictory for first derivatives, but it must not be used to prove the differentiation rules, because the chain rule is needed to justify this very invariance.

Answer (22 votes)

The discussion here has been quite interesting! I wrote about Leibniz's notation in my Bachelor's thesis in 2010, reading through major parts of Bos's 1974 PhD thesis on higher-order differentials in the Leibnizian calculus. I believe Bos is wrong on one point: assuming that one variable is in what Bos calls arithmetic progression is never necessary - only convenient! I will address that below.

Leibniz's differentials

Leibniz developed his differentials, at first, from a geometrical intuition - although he reconsidered the actuality of this idea time and again. In my words, this idea can be very briefly summarized as:

A curve can be thought of as a polygon with infinitely many infinitely small sides $ds$. Each $ds$ is an infinitesimally small straight line segment being a part of the curve and (paradoxically) tangent to it at the same time. Gathering the $ds$ to one straight line segment $s=\int ds$ this will constitute the length of the curve. Expressing such a curve by a geometrical relation between coordinate line segments $x$ and $y$ one could consider each $ds$ as hypotenuse of a right triangle with legs $dx$ and $dy$ so that $dx^2+dy^2=ds^2$.

This is only to say that $dx,dy$ and $ds$ were thought of as geometrical and mutually dependent entities - never considered just numbers, as we allow them to be today.

Just to stress how geometrical: the function nowadays expressed by the formula $f(x)=x^2$ would be something like $a\cdot y=x\cdot x$, where $a,y$ and $x$ were all considered line segments, so that both sides of the equation would constitute an area in Leibniz's time.

The level curve example

In the fractions $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ the two $\partial f$'s are unrelated because:

  • We do not have $\partial f,\partial x$ and $\partial y$ as mutually dependent geometrical entities, for the reason you already gave: the first $\partial f$ is the change in $f$ when you move in the $x$-direction by the vector $(dx,0)$, whereas the second $\partial f$ corresponds to moving by the vector $(0,dy)$. So they are unequal, although both infinitesimally small ...
  • Even if we had some $df$ mutually dependent on $dx$ and $dy$, this would naturally have to be the change in $f$ when you travel along the vector $(dx,dy)$, and thus different from the $\partial f$'s described before.

The chain rule example

Since we are considering higher-order differentials, the work of Bos is relevant here: had there been such a thing as a derivative $z=\frac{dy}{dv}$ in Leibniz's time, the differential of that should read $$ dz=d\frac{dy}{dv}=\frac{dy+ddy}{dv+ddv}-\frac{dy}{dv}=\frac{dv\ ddy-dy\ ddv}{dv(dv+ddv)} $$ Now, since $ddv$ is infinitesimally small compared to $dv$, we may skip $ddv$ in the bracket and simply write $dv$ instead of $(dv+ddv)$. Therefore we have $$ \frac{dz}{dv}=\frac{dv\ ddy-dy\ ddv}{dv^3}=\frac{ddy}{dv^2}-\frac{dy\ ddv}{dv^3} $$ Note that $ddy$ can also be written as $d^2 y$. So the second-order derivative of $y$ with respect to $v$ equals $\frac{d^2 y}{dv^2}$ minus some weird fraction $\frac{dy\ d^2 v}{dv^3}$ which can only be disregarded if it is zero. This only happens if either $dy=0$ or $d^2 v=0$. Choosing $d^2 v$ identically zero does the trick and renders $dv$ constant.

Suppose now that $d^2 v\equiv 0$. Then for the example $y=u=v^2$ we see that $du=2v\ dv$ and furthermore $ddu=2v\ ddv+2\ dv^2=2\ dv^2$ where the last equality is due to our choice that $ddv$ is identical zero. Therefore we see that the derivative of $w=\frac{dy}{du}$ will be given as $$ \frac{dw}{du}=\frac{d^2 y}{du^2}-\frac{dy\ ddu}{du^3} $$ where the last fraction is far from being zero as it may be rewritten - noting that $y=u\implies dy=du$ and that $\frac{dv}{du}=\frac{1}{2v}$ - to obtain $$ \require{cancel} \frac{\cancel{dy}\ ddu}{\cancel{du}\cdot du^2}=\frac{2\ dv^2}{du^2}=\frac{1}{2v^2} $$ This shows that assuming $\frac{d^2 y}{dv^2}$ to be the second order derivative of $y=v^2$ with respect to $v$ in the modern sense makes $\frac{d^2 y}{du^2}$ differ by $\frac{1}{2v^2}$ from being the second order derivative of $y=u$ with respect to $u$. Now since we know that $y=u$ we have $w=\frac{dy}{du}=1$ and thus $\frac{dw}{du}=0$. Therefore we must have $$ \frac{d^2 y}{du^2}-\frac{1}{2v^2}=0 $$ in this case showing that $\frac{d^2 y}{du^2}=\frac{1}{2v^2}$. So with the choice $y=u=v^2$ and $ddv\equiv 0$ the equation $$ \frac{d^2 y}{du^2}\cdot\left(\frac{du}{dv}\right)^2=\frac{d^2 y}{dv^2} $$ may be successfully checked applying that $\frac{du}{dv}=2v$ since we then have $$ \frac{1}{2v^2}\cdot(2v)^2=2 $$ which is actually true. This is NOT a coincidence!
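These Leibnizian ratios can be imitated with literal finite differences. In this numeric sketch (grid step and base point are my own choices) $v$ is sampled uniformly, so $ddv$ vanishes by construction, and the ratios come out exactly as computed above:

```python
# y = u = v^2 with v on a uniform grid, so ddv = 0 by construction
h, v0 = 1e-3, 1.5                     # hypothetical grid step and base point
v = [v0 + i*h for i in range(4)]
u = [vi**2 for vi in v]
y = u[:]                              # y = u

du = [u[i+1] - u[i] for i in range(3)]
ddy = y[2] - 2*y[1] + y[0]            # second difference of y (equals ddu here)

d2y_du2 = ddy / du[0]**2              # close to 1/(2 v0^2), not 0
print(d2y_du2, 1 / (2 * v0**2))

# the checked identity: d2y/du2 * (du/dv)^2 is close to 2 = d2y/dv2
print(d2y_du2 * (du[0] / h)**2)
```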

Conclusion

The above calculations show that Julian Rosen's very appealing example of failure in the method of the Leibnizian calculus seems to be a misunderstanding about what is meant by the notions of $d^2 y$ and the hidden, but important, additional variables $ddv$ and $ddu$. This provides specific details regarding the comments given by user72694 below the answer from Julian.

However, proving that Leibniz's notation will never produce false conclusions when handled correctly is a whole different story. This is supposedly what Robinson managed to do, but I must admit that I have not read and understood that theory myself.

My Bachelor's thesis focused mainly on understanding how the method was applied by Leibniz and his contemporaries. I have often thought about the foundations, but mainly from a 17th-century perspective.

Comment on Bos's work

On page 31 in his thesis, Bos argues that the limit $$ \lim_{h_1,h_2\rightarrow 0}\frac{[f(x+h_1+h_2)-f(x+h_1)]-[f(x+h_1)-f(x)]}{h_1 h_2} $$ only exists if $h_1=h_2$, which then makes this limit equal $f''(x)$. But that is in fact not entirely true. The $x$-differences $h_1$ and $h_2$ need not be equal. It suffices for them to converge to being equal, which is a subtle, but important, variation of the setup. We must demand that $h_1$ and $h_2$ converge to zero in a mutually dependent fashion so that $$ \lim_{h_1,h_2\rightarrow 0}\frac{h_2}{h_1}=1 $$ With this setup the limit of the large fraction from before may still exist, but need not equal $f''(x)$. Since $h_1,h_2$ play the role of $dx$'s this is equivalent to allowing $dx_1\neq dx_2$ so that $ddx=dx_2-dx_1\neq 0$ although being infinitely smaller than the $dx$'s.

This means that it is in fact possible to imitate the historical notion of $dx$ being constant (and thereby $x$ in arithmetic progression) directly by modern limits.
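The claim that the limit can exist yet differ from $f''(x)$ is easy to see numerically. In this sketch (function, point, and coupling are my own choices) take $f(x)=x^2$ and $h_2=h_1+c\,h_1^2$, so that $h_2/h_1\to 1$ while $ddx=c\,h_1^2\neq 0$; a short calculation shows the quotient then tends to $f''(x)+2xc$ rather than $f''(x)=2$:

```python
# f(x) = x^2 at x0 = 1, with h2 = h1 + c*h1^2 so that h2/h1 -> 1
f = lambda x: x**2
x0, c = 1.0, 3.0

for h1 in (1e-2, 1e-3, 1e-4):
    h2 = h1 + c * h1**2
    q = ((f(x0 + h1 + h2) - f(x0 + h1)) - (f(x0 + h1) - f(x0))) / (h1 * h2)
    print(h1, q)      # tends to f''(x0) + 2*x0*c = 8, not f''(x0) = 2
```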

Extras regarding the OP's answer

You are quite right that the differentials can be successfully manipulated into the equation $$ \frac{d^2}{dv^2}\big(y(u(v))\big)=y''(u(v))\cdot u'(v)^2+y'(u(v))\cdot u''(v) $$ under the assumption that $ddv\equiv 0$.

There is, however, a more obvious and even less restrictive choice to leave the progressions of all three variables $u,v$ and $y$ unspecified, and yet to connect the notation in a meaningful way to modern standards:

Introduce a fourth variable $t$ in arithmetic progression (i.e. $ddt\equiv 0$). One could think of it as a time variable so that $u(t),v(t)$ and $y(t)$ are coordinate functions of some vector-valued function. Then Julian Rosen's equation can be directly transformed to $$ \frac{\left(\frac{d^2 y}{dt^2}\right)}{\left(\frac{du^2}{dt^2}\right)}\cdot\left(\frac{\left(\frac{du}{dt}\right)}{\left(\frac{dv}{dt}\right)}\right)^2=\frac{\left(\frac{d^2 y}{dt^2}\right)}{\left(\frac{dv^2}{dt^2}\right)} $$ and since $dt$ is in arithmetic progression $y''(t)=\frac{d^2 y}{dt^2}$ so that this may be written in modern notation as $$ \frac{y''(t)}{u'(t)^2}\cdot\left(\frac{u'(t)}{v'(t)}\right)^2=\frac{y''(t)}{v'(t)^2} $$ which is easily verified to be correct. This is probably the simplest account, but it merely uses, rather than clearly illustrates, the necessity of choosing the progression of the variables. I think my first account did that better.

Answer (8 votes)

Question Update


EDIT: Concerning the issues in this thread, I am wondering if anyone has further insights into why we are permitted to multiply through by $dt$ in, for example, $$\frac{dy}{dx}=\frac{dy/dt}{dx/dt}$$ We start with $dx$ an independent change, and $dy$ the corresponding perturbation in $y(x)$. We multiply through by (presumably) an independent change in $t$. Insofar as neither of the differentials in $x$ or $y$ results from the change in $t$, why are we justified in taking the two expressions to be derivatives $y'(t)$ and $x'(t)$?


As a few comments of mine (specifically, where @JulianRosen stands on his post, and some comments directed at @Sting regarding his post) have not been responded to with the bounty time running out, I am going to post an update on how I stand right now.

The main issue for me has revolved around dealing with the example $y=u=v^2$ for which the claim $$\frac{d^2y}{dv^2}=\frac{d^2y}{du^2}\left(\frac{du}{dv}\right)^2$$ appears to be false.

I initially thought that the claim was not meaningful in the Leibniz notation, because the $du$ on the far right is a dependent variable and will therefore not return a constant change in the second derivative, explaining why the formula is wrong.

However, the recent post by @Sting has (tentatively) convinced me that in fact the issue is much more subtle. The formula is in fact true - it just means something different from what we take it to mean. Specifically, the term $d^2y$ refers to a second-order difference - that is, a change in the change in $y$ caused by altering some independent variable (in this case $v$) - and that is all it is, nothing more. Hence the fraction $$\frac{d^2y}{du^2}$$ is simply a ratio of two differentials, one a second-order difference and the other the square of a first-order difference. A priori, then, the above expression has NOTHING to do with our notion of a "second derivative" (that is, the derivative of the derivative). We could perhaps show that the two are equal, but this cannot be assumed. And in fact, they are not equal! As @Sting explained, we can show that $$\frac{d\left(\frac{dy}{du}\right)}{du}=\frac{d^2y}{du^2}-\frac{dy\,d^2u}{du^3}$$ where the left side is, by definition, the actual true "second derivative", and the right side is an expression equal to it. Hence we find that the ratio $$\frac{d^2y}{du^2}=\frac{d\left(\frac{dy}{du}\right)}{du}+\frac{dy\,d^2u}{du^3}$$ is actually greater than the second derivative by the term on the right, and the product in the first formula at the very top becomes $$\frac{d^2y}{du^2}\left(\frac{du}{dv}\right)^2=\left(\frac{d\left(\frac{dy}{du}\right)}{du}+\frac{dy\,d^2u}{du^3}\right)\left(\frac{du}{dv}\right)^2=\frac{d\left(\frac{dy}{du}\right)}{du}\left(\frac{du}{dv}\right)^2+\frac{dy\,d^2u}{du\,dv^2}$$ Re-writing, we get the claim $$\frac{d^2y}{dv^2}=\frac{d\left(\frac{dy}{du}\right)}{du}\left(\frac{du}{dv}\right)^2+\left(\frac{dy}{du}\right)\frac{d^2u}{dv^2}$$ In modern notation, this says the following. Let $y=y(u)$ and $u=u(v)$.
Since we can take $d^2v\equiv 0$ the two ratios involving this are true second derivatives (see the post by @Sting), and so the claim is $$\frac{d^2}{dv^2}\big(y(u(v))\big)=y''(u(v))\cdot u'(v)^2+y'(u(v))\cdot u''(v)$$ which, of course, is valid. In other words, the "false" equation which appears in @JulianRosen's post is true, but its meaning is more complicated than I initially thought. The Leibniz differentials need to be interpreted as broadly as possible, simply representing $n$-th order differences, without assuming them to equal $n$-th order derivatives.
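That final identity is the usual second-order chain rule, and it can be checked symbolically. Here is a sympy sketch with concrete functions of my own choosing ($y(u)=\sin u$, $u(v)=v^3$) standing in for the general case:

```python
import sympy as sp

v, s = sp.symbols('v s')
u_of_v = v**3                 # hypothetical inner function u(v)
y_of = lambda t: sp.sin(t)    # hypothetical outer function y(u)

# left side: d^2/dv^2 of the composite y(u(v))
lhs = sp.diff(y_of(u_of_v), v, 2)

# right side: y''(u(v)) * u'(v)^2 + y'(u(v)) * u''(v)
rhs = (sp.diff(y_of(s), s, 2).subs(s, u_of_v) * sp.diff(u_of_v, v)**2
       + sp.diff(y_of(s), s).subs(s, u_of_v) * sp.diff(u_of_v, v, 2))

print(sp.simplify(lhs - rhs))  # 0
```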


That is the best I can do to communicate the current state of my (somewhat tenuous) understanding of these issues. Please feel free to comment on this post to point out any mistakes or misunderstandings that present themselves. And thank you to everyone who has participated for this amazing discussion! I hope it will be more-or-less resolved by the time I must award the bounty - and unfortunately I can't give the bounty to multiple people, although several here deserve it :) Cheers!

Answer (3 votes)

Alright, I kind of don't understand the question, but consider the product rule for derivatives:

$$\frac{d(uv)}{dx} = \frac{du}{dx}v + \frac{dv}{dx}u$$

Applying the inverse operation, the indefinite integral with respect to $x$, to both sides:

$$uv = \int\left(\frac{du}{dx}v + \frac{dv}{dx}u\right)dx$$

Since integration is the inverse of $d/dx$, this cannot lead to a contradiction, because if $dx = 0$ then $d/dx$ is undefined and so has no inverse; the "integral" of $d(uv)/0$ could then be any constant $C$, since the rate of change is undefined.

For simplicity, I assume Leibniz took $d/dx$ to be undefined precisely in the case $dx = 0$.

Answer (17 votes)

I think the question is ill-posed (or, what amounts to the same thing, makes some incorrect assumptions).

Notation does not lead to contradictions. Ever. In any discipline. That is because notation does not assert anything. Notation has no truth value. Notation consists of a set of symbols for recording things, and a set of rules for manipulating those symbols. The power of Leibniz's notations is precisely that when those rules are properly followed we end up with formulas that look like familiar fraction cancellation laws, etc., which makes them easier to remember and conceptually easier to understand. If Leibniz's notation is misused, then, yes, apparent contradictions can arise -- but that is not a flaw in the notation, but rather a flaw in those who misuse the notation.

Looking over all of the answers in this thread, including the example in the OP, you will find the same kind of dialectic over and over: "Some people write < formula >, which looks like a contradiction or inconsistency, but that is only because < formula > really means < other formula >." Yes, precisely. If notation is used wrong you get wrong answers; if you use notation correctly, you get correct answers. If you get a result that you know is false, you can be sure that the notation has been misused.

Now what people generally mean, I think, when they critique Leibniz's notation as "leading to contradictions" is that certain mis-uses of the notation are very tempting, and people are prone to making them. This may be true, although I would counter that other notations (primes, dots, what have you) also have their "attractive nuisances". But that is a psychological problem, having to do with the human tendency to look for shortcuts and to perceive seemingly apparent patterns that are not really there; the fault lies not with our d's but with ourselves.

Answer (0 votes)

Introducing a seemingly independent variable $t$

NotNotLogical asked in his latest edit of his answer why it should be allowed to introduce $t$ and write $$ \frac{dy}{dx}=\frac{dy/dt}{dx/dt} $$ This touches one major difficulty in Leibniz's original calculus: regularity conditions (differentiability, continuity, etc.) were only very vaguely thought of - to some extent simply because the notions we use to formulate such properties (functions, limits, etc.) had not been condensed and conceptualised at that point in history. They knew of singularities very well, though.

That said, maybe other notions could have done an equally proper job. However, the 17th century mathematicians were only concerned with curves of great regularity. I am not even certain whether they would have classified modern "monstrous examples" like the Weierstrass function as being curves at all.

One obvious way to introduce another variable without violating the ideas of the historical account of Leibniz's calculus could be to let $dt$ be the element of the curve $(x,y)$ in the way that $dt^2=dx^2+dy^2$. If $dx$ and $dy$ make sense (ie. the curve $(x,y)$ is sufficiently regular) this relation should make sense too.

What would never work is to introduce $dt$ without any relation to $dx$ and $dy$. Relations other than $dt^2=dx^2+dy^2$ could also work, but the differentials have to be interdependent in order for the fraction-like manipulations with them to be well-founded.
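As a closing sanity check (the curve and its parametrization are invented for illustration), the ratio $(dy/dt)/(dx/dt)$ does reproduce $dy/dx$ when $t$ is interdependent with $x$ and $y$:

```python
import sympy as sp

t, s = sp.symbols('t s')
x_of_t = sp.exp(t)         # hypothetical parametrization x(t)
y_of_t = sp.exp(3*t)       # along this curve, y = x^3

# dy/dx computed directly from y = x^3 ...
dydx = sp.diff(s**3, s).subs(s, x_of_t)

# ... and as the ratio (dy/dt)/(dx/dt)
ratio = sp.diff(y_of_t, t) / sp.diff(x_of_t, t)

print(sp.simplify(dydx - ratio))  # 0
```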