I am trying to understand the following definition.
Let $f:\mathbb{R}^n \rightarrow \mathbb{R}^m$. The total derivative of $f$ at the point $a$ is the unique linear map $Df|_a$ such that $$\lim_{h \rightarrow 0}\dfrac{f(a+h)-f(a)-Df|_a(h)}{||h||} = 0$$
Could someone explain why this definition works?
- Why should we divide by $||h||$?
- Why is $Df|_a$ linear?
- How should I interpret this linear map $Df|_a$? What is the meaning of the total derivative?
This is one of the most fundamental definitions in all of analysis.
It says that the increment $\Delta f:=f(a+h)-f(a)$ of the function value should, to first approximation, be a linear function of the increment $h$ attached at the point $a$. In other words, we want $$f(a+h)-f(a)=Lh +r(h)\qquad(|h|\ll1)\ ,\tag{1}$$ where the error $r(h)$ should be smaller by orders of magnitude than the linear term $Lh$ when $h$ is small. Now in general $|Lh|$ will be of order $|h|$ for "most" $h$. This means that we should require $$\lim_{h\to0}{|r(h)|\over |h|}=0$$ in order to give $(1)$ any real content. It turns out that this condition determines $L$ uniquely: if $L$ and $L'$ both satisfy it, subtracting the two versions of $(1)$ gives $|(L-L')h|/|h|\to0$, and putting $h=te$ for a fixed unit vector $e$ and letting $t\to0^+$ shows $(L-L')e=0$ for every such $e$, hence $L=L'$. If the condition can be satisfied, then $f$ is called differentiable at $a$, and one denotes the resulting $L$ by $Df\bigr|_a$, or similar.
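To see numerically why the division by $|h|$ matters, here is a minimal Python sketch. The map $f(x,y)=(x^2y,\sin x)$, the point $a=(1,2)$, the direction $e=(0.6,0.8)$ and the perturbed matrix `L_wrong` are all hypothetical choices made for illustration. `L_true` is the matrix of partial derivatives (the Jacobian), which represents $Df\bigr|_a$ in the standard bases.

```python
import math

# Hypothetical example map f: R^2 -> R^2 (an illustration, not from the question).
def f(x, y):
    return (x * x * y, math.sin(x))

# Hand-computed Jacobian of f; in the standard bases this matrix represents Df|_a.
def jacobian(x, y):
    return ((2 * x * y, x * x),
            (math.cos(x), 0.0))

def apply(L, h):
    """Matrix-vector product L h for a 2x2 matrix stored as nested tuples."""
    return tuple(row[0] * h[0] + row[1] * h[1] for row in L)

def norm(v):
    """Euclidean norm of a 2-vector."""
    return math.hypot(v[0], v[1])

def remainder(L, a, h):
    """r(h) = f(a+h) - f(a) - L h, the error term in (1)."""
    fah = f(a[0] + h[0], a[1] + h[1])
    fa = f(a[0], a[1])
    Lh = apply(L, h)
    return tuple(fah[i] - fa[i] - Lh[i] for i in range(2))

def remainder_ratio(L, a, h):
    """|r(h)| / |h|, the quantity that must tend to 0."""
    return norm(remainder(L, a, h)) / norm(h)

a = (1.0, 2.0)
e = (0.6, 0.8)  # fixed unit direction; h = t * e with t -> 0
L_true = jacobian(*a)
# Deliberately wrong candidate: perturb one entry of the Jacobian by 0.1.
L_wrong = ((L_true[0][0] + 0.1, L_true[0][1]), L_true[1])

for t in (1e-1, 1e-2, 1e-3, 1e-4):
    h = (t * e[0], t * e[1])
    print(f"|h|={t:.0e}  "
          f"|r|/|h| true: {remainder_ratio(L_true, a, h):.2e}  "
          f"wrong: {remainder_ratio(L_wrong, a, h):.2e}  "
          f"|r| wrong: {norm(remainder(L_wrong, a, h)):.2e}")
```

With `L_true` the ratio $|r(h)|/|h|$ should shrink roughly in proportion to $|h|$, while with `L_wrong` it should level off near $0.06$ (the perturbation $0.1$ times the first component $0.6$ of $e$), even though $|r(h)|$ itself still tends to $0$. So the weaker condition $r(h)\to0$ cannot tell the two candidates apart; dividing by $|h|$ can.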