Suppose I have a function $f : \mathbb{R}^m \to \mathbb{R}^n$. Then $f$ is differentiable at $a$ if there exists an $n \times m$ matrix $B_a$ such that $$\lim_{h \to 0} \frac{f(a+h)-f(a)-B_a\cdot h}{|h|} = 0$$
Now we can identify $B_a$ with a linear transformation as follows. Define $T : \mathbb{R}^m \to \mathbb{R}^n$ by $$T(x) = B_a\cdot x$$ then we can also call $T$ the derivative of $f$ at $a$.
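To make the limit concrete, here is a small numerical sketch. It assumes a hypothetical example function $f(x, y) = (x^2 y, \sin x)$ (not from the question), takes $B_a$ to be its hand-computed Jacobian matrix at $a = (1, 2)$, and prints the ratio $|f(a+h)-f(a)-B_a h|/|h|$ as $h$ shrinks toward $0$ along one fixed direction. A single direction cannot prove differentiability; it only illustrates what the definition asks for.

```python
import math

# Hypothetical example f : R^2 -> R^2, chosen only for illustration.
def f(x):
    return [x[0] ** 2 * x[1], math.sin(x[0])]

# Its 2 x 2 Jacobian matrix B_a, worked out by hand.
def jacobian(a):
    return [[2 * a[0] * a[1], a[0] ** 2],
            [math.cos(a[0]), 0.0]]

def mat_vec(B, h):
    # Matrix-vector product B . h
    return [sum(B[i][j] * h[j] for j in range(len(h))) for i in range(len(B))]

def norm(v):
    # Euclidean norm |v|
    return math.sqrt(sum(t * t for t in v))

def remainder_ratio(a, h):
    """|f(a+h) - f(a) - B_a h| / |h| for one increment h."""
    fa = f(a)
    fah = f([a[0] + h[0], a[1] + h[1]])
    Bh = mat_vec(jacobian(a), h)
    r = [fah[i] - fa[i] - Bh[i] for i in range(len(fa))]
    return norm(r) / norm(h)

a = [1.0, 2.0]
steps = [10.0 ** (-k) for k in range(1, 6)]
# Shrink h toward 0 along the fixed direction (1, -1).
ratios = [remainder_ratio(a, [t, -t]) for t in steps]
for t, q in zip(steps, ratios):
    print(f"t = {t:.0e}  ratio = {q:.3e}")
```

The ratio drops by roughly a factor of ten each time $h$ does, consistent with it tending to $0$.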
If $f$ is differentiable at each $x \in \mathbb{R}^m$, then the derivative of $f$ is the function $Df : \mathbb{R}^m \to \mathbb{R}^n$ defined by $$Df(a) = B_a \cdot a$$
(Note that I haven't actually seen the derivative defined as a function in the way I described in the last paragraph, so I may be wrong; if so, please correct me.)
Now the way I understand it is that although the derivative of $f$ at a point $a$ (which is $T$ above) is a linear transformation, the derivative of $f$, $Df$, need not be a linear transformation.
Because if the derivative $Dg$ of an arbitrary differentiable function $g$ were a linear transformation, then $Dg$ would be continuous, and since $D(Dg) = Dg$ for a linear transformation $Dg$, we would have $g \in C^{\infty}(\mathbb{R}^m)$. So every differentiable function $g$ would be smooth, which is clearly a contradiction, since we can find counterexamples.
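One standard counterexample (with $m = n = 1$; chosen here for illustration, not taken from the question) is $g(x) = x|x|$. For $x > 0$ we have $g(x) = x^2$ and for $x < 0$ we have $g(x) = -x^2$, while at $0$ the difference quotient can be computed directly:

$$g'(0) = \lim_{h \to 0} \frac{h|h|}{h} = \lim_{h \to 0} |h| = 0, \qquad\text{so}\qquad g'(x) = 2|x| \quad \text{for all } x \in \mathbb{R}.$$

Thus $g$ is differentiable everywhere and $g'$ is continuous, but $g' = 2|x|$ is not differentiable at $0$. Hence $g \in C^1(\mathbb{R})$ but $g \notin C^2(\mathbb{R})$, and in particular $g$ is not smooth.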
Am I correct in what I said above?
Definition. A function $f:\mathbb R^m\to\mathbb R^n$ is differentiable at $a\in\mathbb R^m$ if there exists a linear transformation $T:\mathbb R^m\to\mathbb R^n$ such that $$ \lim_{h\to0} \frac{f(a+h)-f(a)-T(h)}{|h|} = 0. $$ If it exists, this transformation is unique and is denoted by $Df(a)$.
Therefore, at each point of differentiability $a\in\mathbb R^m$, the derivative $Df(a)$ is by definition a linear transformation $\mathbb R^m\to\mathbb R^n$, which can of course be represented with respect to the canonical bases by the Jacobian matrix $J_f(a)$.
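Concretely, column $j$ of $J_f(a)$ is $Df(a)(e_j)$, the image of the $j$-th canonical basis vector. The sketch below approximates each column with a central difference quotient and compares the result with the hand-computed Jacobian. The example function $f(x, y) = (x^2 y, \sin x)$, the point $a = (1, 2)$ and the step `eps` are all illustrative assumptions.

```python
import math

# Hypothetical example f : R^2 -> R^2 (an assumption for this sketch).
def f(x):
    return [x[0] ** 2 * x[1], math.sin(x[0])]

def jacobian_exact(a):
    # Partial derivatives worked out by hand.
    return [[2 * a[0] * a[1], a[0] ** 2],
            [math.cos(a[0]), 0.0]]

def jacobian_fd(func, a, eps=1e-6):
    """Approximate J_f(a) column by column: column j ~ Df(a)(e_j)."""
    n = len(func(a))
    m = len(a)
    J = [[0.0] * m for _ in range(n)]
    for j in range(m):
        ap = list(a)
        ap[j] += eps
        am = list(a)
        am[j] -= eps
        fp, fm = func(ap), func(am)
        for i in range(n):
            # Central difference along the basis direction e_j.
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

a = [1.0, 2.0]
approx = jacobian_fd(f, a)
exact = jacobian_exact(a)
max_err = max(abs(approx[i][j] - exact[i][j]) for i in range(2) for j in range(2))
print(approx)
print(exact)
print(max_err)
```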
What you seem to be saying is that the maps $a\mapsto Df(a)$ or $a\mapsto J_f(a)$ are linear. This is of course false in general: it holds iff each component $f_i$ of $f$ is a quadratic form plus a constant, i.e. $f_i(x) = x^\top A_i x + c_i$.
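A quick numerical spot check of this, under illustrative assumptions: take the scalar quadratic form $q(x) = x^\top A x$ with a hypothetical matrix $A$, whose Jacobian row is $(A + A^\top)x$, and the non-quadratic function $p(x_0, x_1) = \sin x_0 + x_1^3$ (also hypothetical). The sketch tests additivity $J(a+b) = J(a) + J(b)$ at one pair of points. Failing the check disproves linearity of $a \mapsto J_p(a)$; passing it does not prove anything by itself, although for $q$ linearity is clear from the formula.

```python
import math

# Hypothetical matrix defining the quadratic form q(x) = x^T A x.
A = [[2.0, 1.0], [0.0, 3.0]]

def grad_q(x):
    # Dq(x)(h) = x^T (A + A^T) h, so the Jacobian row is (A + A^T) x.
    S = [[A[i][j] + A[j][i] for j in range(2)] for i in range(2)]
    return [S[i][0] * x[0] + S[i][1] * x[1] for i in range(2)]

def grad_p(x):
    # Jacobian row of the non-quadratic p(x) = sin(x0) + x1^3.
    return [math.cos(x[0]), 3 * x[1] ** 2]

def is_additive(grad, a, b, tol=1e-12):
    """Check grad(a + b) == grad(a) + grad(b) up to a tolerance."""
    lhs = grad([a[0] + b[0], a[1] + b[1]])
    rhs = [u + v for u, v in zip(grad(a), grad(b))]
    return all(abs(l - r) <= tol for l, r in zip(lhs, rhs))

a, b = [1.0, 2.0], [0.5, -3.0]
print(is_additive(grad_q, a, b))  # True
print(is_additive(grad_p, a, b))  # False
```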
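You also seem to define the derivative as the map $a\mapsto Df(a)(a)$. This is also incorrect. The derivative of $f$ is the map $a\mapsto Df(a)$, which assigns to each point $a$ the linear transformation $Df(a)$ from the definition above. It therefore takes values in the space of linear maps $\mathbb R^m\to\mathbb R^n$ (identified with $n\times m$ matrices via the Jacobian), not in $\mathbb R^n$.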
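A one-dimensional example (with $m = n = 1$ and $f(x) = x^2$, chosen here for illustration) shows the difference between the two maps:

$$Df(a)(h) = 2ah, \qquad J_f(a) = \begin{pmatrix} 2a \end{pmatrix},$$

$$a \mapsto Df(a) = \big(h \mapsto 2ah\big) \qquad\text{versus}\qquad a \mapsto Df(a)(a) = 2a^2.$$

The first map corresponds to the familiar $f'(a) = 2a$; the second, $a \mapsto 2a^2$, is not the derivative of $x^2$.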