So I am just starting to learn calculus using Thompson's Calculus Made Easy (http://calculusmadeeasy.org/). (I already have some background in limits from pre-calculus, but I thought it would be cool to learn a more informal infinitesimal approach closer to how Leibniz and Newton originally did things.)
Anyway, I was thinking about the differentiation of $y=x^2$ (http://calculusmadeeasy.org/4.html).
Specifically, given $y=x^2$, is it also true that $dy=(dx)^2$?* Why or why not? I feel like it should be true. But the problem is that when I manipulate $dy=(dx)^2$ I get $dy/dx=dx$. That makes no sense, and contradicts what I understand to be the actual expression for $dy/dx$, namely $2x$, the derivative of $y=x^2$. Surely $dx$ is not equal to $2x$!
Please help! :)
*To be clear, I am aware that this is not the equation you'd use to calculate the derivative (that equation would involve $y+dy$ and $x+dx$). The above equation was just something I became conceptually confused about.
In the relation $y = x^2$, suppose we increase the value of $x$ by some incremental amount $\Delta x$. Then how much does $y$ change? That is to say, what is $\Delta y$, if $$y + \Delta y = (x + \Delta x)^2?$$ Well, we expand the RHS to get $$x^2 + 2x \Delta x + (\Delta x)^2.$$ But since $y = x^2$, we can subtract it from both sides to get $$\Delta y = 2x \Delta x + (\Delta x)^2.$$ So when $\Delta x$ is very small, $(\Delta x)^2$ is much smaller than $\Delta x$ (for example, $\Delta x = 0.001$ implies $(\Delta x)^2 = 0.000001$).
What this tells us is that, to a good approximation, $$\Delta y \approx 2x \Delta x.$$ In other words, when $x$ is changed by some small amount $\Delta x$, $y$ changes by roughly $2x \Delta x$. As $\Delta x$ tends to $0$, this approximation becomes better and better, because the neglected term $(\Delta x)^2$ shrinks much faster than $\Delta x$ itself. This is the idea captured by the differential expression $$dy = 2x \, dx.$$ We can formalize this notion using the definition of the derivative, but it is worth noting here that $(dx)^2$ is effectively $0$ compared with $dx$. So it doesn't make sense to say $dy = (dx)^2$: that keeps only the negligible term and throws away the dominant one. The differential $dy$ should depend on the value of $x$ itself, and not just on the differential $dx$ (that's why there's a factor of $2x$).
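A quick numerical check of the argument above, offered as a sketch: it assumes plain Python with the standard-library `fractions` module, and `delta_y` is just a helper name chosen here for illustration. It confirms the expansion $\Delta y = 2x\Delta x + (\Delta x)^2$ exactly and shows the ratio $\Delta y/\Delta x$ at $x = 3$ settling toward $2x = 6$, rather than toward $\Delta x$.

```python
from fractions import Fraction


def delta_y(x, dx):
    """Exact change in y = x**2 when x increases by dx."""
    return (x + dx) ** 2 - x ** 2


# Exact rationals, so no floating-point rounding muddies the comparison.
x = Fraction(3)
for k in range(1, 6):
    dx = Fraction(1, 10 ** k)
    dy = delta_y(x, dx)
    # The expansion from the text holds exactly: dy = 2*x*dx + dx**2.
    assert dy == 2 * x * dx + dx ** 2
    # Dividing by dx leaves 2*x + dx, which tends to 2*x = 6 (not to dx).
    print(f"dx = {dx}: dy/dx = {dy / dx}")
```

Each printed quotient is $6 + \Delta x$ (for instance `61/10` when $\Delta x = 1/10$), so the $(\Delta x)^2$ term contributes only the vanishing $\Delta x$ part of the ratio.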
To frame it another way, using more concrete numbers, consider the case $(x,y) = (0.1, 0.01)$. Now consider a very small $\Delta x = 0.0001$. This would make $(x + \Delta x, y + \Delta y) = (0.1001, 0.01002001)$, so $\Delta y = 0.00002001$, which is remarkably close to $2x \Delta x = 2(0.1)(0.0001) = 0.00002$.
But in another case, say $(x,y) = (4,16)$, with the same $\Delta x$, we have $$(x + \Delta x, y + \Delta y) = (4.0001, 16.00080001),$$ and $\Delta y = 0.00080001$. And we have $2x \Delta x = 2(4)(0.0001) = 0.0008$. Since $\Delta x$ is the same in both cases, so is $(\Delta x)^2 = 0.00000001$; yet $\Delta y$ is clearly not the same. It grows with $x$: the $2x \Delta x$ part is proportional to $x$, and only the tiny leftover $(\Delta x)^2$ stays fixed.
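The two cases above can be reproduced exactly with Python's standard-library `decimal` module. This is a sketch, and `compare` is a hypothetical helper name; decimal arithmetic is used so that the printed values match the hand calculations without binary floating-point noise.

```python
from decimal import Decimal


def compare(x, dx):
    """For y = x**2, return (exact change in y, linear estimate 2*x*dx, leftover)."""
    exact = (x + dx) * (x + dx) - x * x
    linear = 2 * x * dx
    return exact, linear, exact - linear


dx = Decimal("0.0001")  # the same increment in both cases
for x in (Decimal("0.1"), Decimal("4")):
    exact, linear, leftover = compare(x, dx)
    # exact and linear change with x; leftover is (dx)**2 every time
    print(f"x = {x}: dy = {exact}, 2x*dx = {linear}, leftover = {leftover}")
```

In both runs the leftover is the same $(\Delta x)^2 = 10^{-8}$ (printed as `1E-8`), while the exact $\Delta y$ and the estimate $2x\Delta x$ scale with $x$.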