During the first lecture of the MIT course (Single Variable Calculus - Rate of Change, 46:00), Professor David Jerison uses the binomial theorem to justify the following expansion:
$(x + \Delta x)^n = x^n + nx^{n-1} \Delta x + O((\Delta x)^2)$
Where $O((\Delta x)^2)$ stands in for all the remaining terms.
I understand that $(x + \Delta x)^2$ can be expanded as $x^2 + 2x\Delta x + (\Delta x)^2$, but how can I understand it for general $n$?
Hint: By the Binomial Theorem, $$\begin{align*}(x+\Delta x)^n=&\dbinom{n}{n}x^n(\Delta x)^0+\dbinom{n}{n-1}x^{n-1}(\Delta x)^1+\dbinom{n}{n-2}x^{n-2}(\Delta x)^2+\\+&\dbinom{n}{n-3}x^{n-3}(\Delta x)^3+\ldots\end{align*}$$ Since $\dbinom{n}{n}=1$ and $\dbinom{n}{n-1}=n$, the first two terms are exactly $x^n$ and $nx^{n-1}\Delta x$. Every remaining term contains a factor of $(\Delta x)^2$ or a higher power of $\Delta x$. Because $\Delta x$ is a small quantity, we have $$(\Delta x)^2\gg|\Delta x|^k$$ for all $k>2$. So we say that all the other terms depend mainly on $(\Delta x)^2$, since everything else (i.e. $(\Delta x)^k$ for $k>2$) is negligible compared to it.
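If a numerical check helps, here is a minimal Python sketch of the claim above. It is only an illustration, not something from the lecture: the function names `remainder` and `leading_coefficient` and the sample values $x=2$, $n=5$ are my own choices. It computes what is left of $(x+\Delta x)^n$ after subtracting $x^n + nx^{n-1}\Delta x$, divides by $(\Delta x)^2$, and compares the result with the coefficient $\binom{n}{2}x^{n-2}$ of the $(\Delta x)^2$ term (note that $\binom{n}{n-2}=\binom{n}{2}$).

```python
from math import comb


def remainder(x, dx, n):
    """Everything in (x + dx)^n beyond the first two terms x^n + n*x^(n-1)*dx."""
    return (x + dx) ** n - x ** n - n * x ** (n - 1) * dx


def leading_coefficient(x, n):
    """Coefficient of (dx)^2 in the binomial expansion: C(n, 2) * x^(n-2)."""
    return comb(n, 2) * x ** (n - 2)


# Hypothetical sample values, chosen only for illustration.
x, n = 2.0, 5

for dx in (1e-1, 1e-2, 1e-3):
    # If the remainder behaves like (dx)^2, this ratio should settle
    # near the (dx)^2 coefficient as dx shrinks.
    ratio = remainder(x, dx, n) / dx ** 2
    print(f"dx = {dx:g}: remainder / dx^2 = {ratio:.4f}")

print("C(n, 2) * x^(n-2) =", leading_coefficient(x, n))
```

As $\Delta x$ shrinks, the printed ratio should approach $\binom{n}{2}x^{n-2}$, which is what "the remaining terms are $O((\Delta x)^2)$" means in practice: the remainder is bounded by a constant times $(\Delta x)^2$. The check uses floating-point arithmetic, so for very small $\Delta x$ rounding error will eventually dominate.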