I first posted this over at HSM, without much uptake.
I'm trying to understand the development of the calculus. Does this sound plausible as one of the stages?
Newton knows the binomial theorem, which gives
$$(x+y)^n={n\choose0}x^ny^0+...\;\;\;\;\;\;\;\;\;\;(1)$$
Letting $y=δ_x$ (I realise this is Leibniz's notation but I find it easier to follow than Newton's $o$), we get
$$(x+δ_x)^n={n\choose0}x^nδ_x^0+...\;\;\;\;\;\;\;\;\;\;(2)$$
Considering $δ_x$ as the base of a differential triangle under the curve $y=x^n$, the vertical side of the triangle is given by $(x+δ_x)^n-x^n$, which gives us
$$(x+δ_x)^n-x^n={n\choose0}x^nδ_x^0+...-x^n\;\;\;\;\;\;\;\;\;\;(3)$$
But ${n\choose0}x^nδ_x^0=x^n$, so the first part of the expansion disappears and everything else moves up one place to the left and we get
$$(x+δ_x)^n-x^n={n\choose1}x^{n-1}δ_x^1+...\;\;\;\;\;\;\;\;\;\;(4)$$
Now, the vertical $(x+δ_x)^n-x^n$ can be called $δ_y$, so now we can write
$$\begin{aligned}\frac{δ_y}{δ_x}&=\frac{(x+δ_x)^n-x^n}{δ_x}\\ &=\frac{1}{δ_x}\left[{n\choose1}x^{n-1}δ_x^1+...\right]\;\;\;\;\;\;\;\;\;\;(5)\end{aligned}$$
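As $δ_x$ gets small, the gradient of the triangle's hypotenuse approaches the instantaneous gradient of the curve, which is good, but $\frac{1}{δ_x}$ gets large, which seems problematic. Fortunately, every term of the expansion contains at least one factor of $δ_x$, so $\frac{1}{δ_x}$ cancels against one such factor in each term to give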
$$\frac{δ_y}{δ_x}={n\choose1}x^{n-1}+{n\choose2}x^{n-2}δ_x^1+...\;\;\;\;\;\;\;\;\;\;(6)$$
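As a sanity check on (6), here is a minimal Python sketch that compares the difference quotient with the shifted binomial expansion using exact rational arithmetic. The function names `difference_quotient` and `expansion`, and the sample values $x=2$, $n=5$, are my own choices for illustration, not part of the derivation.

```python
from fractions import Fraction
from math import comb

def difference_quotient(x, dx, n):
    """Left-hand side of (6): ((x + dx)^n - x^n) / dx."""
    return ((x + dx) ** n - x ** n) / dx

def expansion(x, dx, n):
    """Right-hand side of (6): sum over k >= 1 of C(n, k) x^(n-k) dx^(k-1)."""
    return sum(comb(n, k) * x ** (n - k) * dx ** (k - 1) for k in range(1, n + 1))

x = Fraction(2)  # illustrative sample point
n = 5            # illustrative exponent
for dx in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 100000)):
    lhs = difference_quotient(x, dx, n)
    rhs = expansion(x, dx, n)
    assert lhs == rhs  # exact agreement, no rounding involved
    # The gap to n * x^(n-1) shrinks as dx shrinks.
    print(float(dx), float(lhs), float(lhs - n * x ** (n - 1)))
```

Because `Fraction` arithmetic is exact, the two sides agree identically rather than approximately, and the printed gap to $nx^{n-1}$ shrinks roughly in proportion to $δ_x$.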
And now we can use the binomial theorem to make a differential triangle under any power $x^n$ and hence, term by term, under any polynomial.
Sound right?
This is more or less how the proof of the power rule goes. The binomial theorem is handy for generalising the idea to arbitrary positive integer exponents, but the proof of a specific case is much easier to follow at first:
\begin{align*}
y + \delta y &= (x+\delta x)^3\\
&= x^3 + 3x^2\,\delta x+3x(\delta x)^2 + (\delta x)^3\\
\implies \delta y &= \delta x\left(3x^2+3x\,\delta x + (\delta x)^2\right)\\
\implies \frac{\delta y}{\delta x}&=3x^2+3x\,\delta x + (\delta x)^2 \approx 3x^2 \quad\text{when } \delta x \text{ is small.}
\end{align*}
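To see the "$\approx 3x^2$" step numerically, here is a small Python sketch, assuming the sample point $x=2$. The name `cubic_quotient` is hypothetical and chosen only for this illustration. It checks that what is left over after subtracting $3x^2$ is exactly $3x\,\delta x+(\delta x)^2$, and that this leftover shrinks with $\delta x$.

```python
from fractions import Fraction

def cubic_quotient(x, dx):
    """Difference quotient of y = x^3 over a step of width dx."""
    return ((x + dx) ** 3 - x ** 3) / dx

x = Fraction(2)  # illustrative sample point
for dx in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    q = cubic_quotient(x, dx)
    leftover = q - 3 * x ** 2
    # The leftover is exactly the two terms that still contain dx.
    assert leftover == 3 * x * dx + dx ** 2
    print(f"dx={float(dx)}: quotient={float(q)}, leftover={float(leftover)}")
```

At $x=2$ the quotient approaches $3x^2=12$ as the step narrows, which is the sense in which the extra terms can be neglected.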