I am looking at this proof: https://math.berkeley.edu/~nikhil/courses/121a/chain.pdf and have some confusion. I have seen several proofs that use little-o notation, and I don't follow how the notation is being used.
1. Can someone explain the approximation of a differentiable function $g(x+\Delta x)=g(x)+\Delta x\cdot g'(x) +o(\Delta x)$? Why is this an equality? I think that as $\Delta x \to 0$ this becomes an equality by definition of the derivative, so why isn't there a limit?
2. I really don't understand the composition step. For example, why can we write $\Delta y = g'(x)\Delta x + o(\Delta x)$? I have seen people do this step in other ways, such as in this post: Proof of multivariable chain rule (using $h,k$). If someone could explain this step, that would be very helpful.
3. I have trouble extending the linear approximation in (1) to multiple dimensions. If $f:\mathbb R^m \to \mathbb R^n$, how can I approximate $f$ using the Jacobian?
Any help is appreciated!
The equality is equivalent to the statement that $g(x + \Delta x) - g(x) - \Delta x \cdot g'(x)$ is $o(\Delta x)$. Viewing this expression as a function of $\Delta x$, we see that $$\lim_{\Delta x \to 0} \frac{g(x + \Delta x) - g(x) - \Delta x \cdot g'(x)}{\Delta x}= 0$$ by the definition of the derivative, since the quotient is just $\frac{g(x + \Delta x) - g(x)}{\Delta x} - g'(x)$. Using the definition of little-o notation given in the second paragraph, this proves the (equivalent) equality.
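As a concrete illustration (my own example, not taken from the notes), take $g(x) = x^2$, so $g'(x) = 2x$. Then $$g(x + \Delta x) = x^2 + 2x\,\Delta x + (\Delta x)^2 = g(x) + \Delta x \cdot g'(x) + (\Delta x)^2,$$ which holds exactly for every $\Delta x$, with no limit taken. The leftover term $(\Delta x)^2$ satisfies $$\lim_{\Delta x \to 0} \frac{(\Delta x)^2}{\Delta x} = \lim_{\Delta x \to 0} \Delta x = 0,$$ so it is $o(\Delta x)$. In other words, the limit has not disappeared: it is built into the meaning of the symbol $o(\Delta x)$, which stands for "some remainder that vanishes faster than $\Delta x$".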
Since $y = g(x)$, we have $\Delta y = g(x + \Delta x) - g(x)$. We can now use the equality you asked about in (1) to see that $\Delta y = g'(x) \Delta x + o(\Delta x)$. I’m not sure what other questions you have about this step, but hopefully this will get you started.
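In case it helps, here is a sketch of how this feeds into the composition. I am assuming the outer function is called $f$ and is differentiable at $y = g(x)$; the notes may use different names. Applying the same approximation to $f$ at $y$ and then substituting the expression for $\Delta y$ gives $$f(y + \Delta y) - f(y) = f'(y)\,\Delta y + o(\Delta y) = f'(y)\bigl(g'(x)\,\Delta x + o(\Delta x)\bigr) + o(\Delta y) = f'(g(x))\,g'(x)\,\Delta x + o(\Delta x).$$ The last step uses two facts: a constant times $o(\Delta x)$ is still $o(\Delta x)$, and $o(\Delta y)$ is also $o(\Delta x)$ because $\Delta y / \Delta x \to g'(x)$ stays bounded as $\Delta x \to 0$. Dividing by $\Delta x$ and letting $\Delta x \to 0$ then gives the chain rule.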
The basic idea is to “vectorize” everything. We essentially define the Jacobian $Dg$ (here $g$ plays the role of your $f$) to be the matrix satisfying the equation
$$g(\mathbf{x} + \Delta \mathbf{x}) = g(\mathbf{x}) + Dg \cdot \Delta \mathbf{x} + o(\| \Delta \mathbf{x}\|), $$ where now $\mathbf{x}$ and $\Delta \mathbf{x}$ are vectors in $\Bbb{R}^m$. When you write out the entries of $Dg$, which are the various partial derivatives, and carry out the matrix multiplication, in each coordinate you’ll get the 1-dimensional chain rules as worked out above and on page two of the notes.
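For a small worked case (again my own example, chosen only for illustration), let $g:\Bbb{R}^2 \to \Bbb{R}^2$ be $g(x_1, x_2) = (x_1 x_2,\; x_1 + x_2)$. Its Jacobian is $$Dg = \begin{pmatrix} x_2 & x_1 \\ 1 & 1 \end{pmatrix},$$ and expanding directly gives $$g(\mathbf{x} + \Delta \mathbf{x}) = \begin{pmatrix} x_1 x_2 \\ x_1 + x_2 \end{pmatrix} + \begin{pmatrix} x_2 & x_1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} \Delta x_1 \\ \Delta x_2 \end{pmatrix} + \begin{pmatrix} \Delta x_1 \, \Delta x_2 \\ 0 \end{pmatrix}.$$ The remainder is $o(\|\Delta \mathbf{x}\|)$ because $|\Delta x_1 \, \Delta x_2| \le \tfrac{1}{2}\|\Delta \mathbf{x}\|^2$, so dividing by $\|\Delta \mathbf{x}\|$ leaves something bounded by $\tfrac{1}{2}\|\Delta \mathbf{x}\| \to 0$.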