On Wikipedia, it says
When $f$ is a function from an open subset of $\mathbb{R}^n$ to $\mathbb{R}^m$, then the directional derivative of $f$ in a chosen direction is the best linear approximation to $f$ at that point and in that direction.
I just want to check: are linear functions from $\mathbb{R}^n$ to $\mathbb{R}^m$ defined as functions of the form $f(x) = ax+b$, where $a$ is a scalar and $b$ is a vector?
Also, it seems like functions of the form above can only enlarge or shrink, and shift. Is this correct? If anything was going to be a counterexample, I expected it to be an off-center circle: I thought that under the transformation $x \mapsto 2x$ an off-center circle might map to an ellipse, but this doesn't seem to be the case. For example, if $(x, y)$ satisfies $(x-2)^2 + (y-2)^2 = 1$, then multiplying both sides by $2^2$ gives $(2x-4)^2 + (2y-4)^2 = 4$, so $(X, Y) = (2x, 2y)$ satisfies $(X-4)^2 + (Y-4)^2 = 4$, which is still a circle, with center $(4, 4)$ and radius $2$, as expected.
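A quick numerical check of the circle computation, written as a small Python sketch: it samples points on $(x-2)^2 + (y-2)^2 = 1$, applies $x \mapsto 2x$ to each coordinate, and tests whether the images lie on the circle with center $(4, 4)$ and radius $2$. The helper `on_circle` and the choice of 12 sample points are assumptions made only for illustration.

```python
import math

def on_circle(x, y, cx, cy, r, tol=1e-9):
    """True if (x, y) lies (up to tol) on the circle with center (cx, cy) and radius r."""
    return abs(math.hypot(x - cx, y - cy) - r) < tol

# Sample 12 points on (x-2)^2 + (y-2)^2 = 1 and scale each coordinate by 2.
images = []
for k in range(12):
    t = 2 * math.pi * k / 12
    x, y = 2 + math.cos(t), 2 + math.sin(t)
    assert on_circle(x, y, 2, 2, 1)
    images.append((2 * x, 2 * y))

# Every image lies on the circle with center (4, 4) and radius 2.
print(all(on_circle(X, Y, 4, 4, 2) for X, Y in images))  # True
```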
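In general, the derivative is the best local linear approximation to a function at a point. If $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$ is differentiable at $x = x_0$, then near $x_0$ it is approximated by a vector space homomorphism $Df_{x_0} \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$, meaning that $f(x_0 + h) = f(x_0) + Df_{x_0}(h) + o(\|h\|)$ as $h \to 0$, and it is in this sense that you must understand "linear". In coordinates, $Df_{x_0}$ is multiplication by the $m \times n$ Jacobian matrix of $f$ at $x_0$: there is no additive constant, and the coefficient is a matrix rather than a scalar.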
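To make the distinction concrete, here is a minimal Python sketch contrasting a linear map $x \mapsto Ax$ with an affine map $x \mapsto Ax + b$. The 3×2 matrix `A`, the shift `b`, and the sample inputs are hypothetical values chosen only for illustration; they are small dyadic numbers so that floating-point equality checks are exact.

```python
# Hypothetical example: a 3x2 matrix A gives a map from R^2 to R^3.
A = [[1.0, 2.0],
     [0.0, -1.0],
     [3.0, 0.5]]
b = [1.0, 0.0, -2.0]  # hypothetical nonzero shift

def linear(x):
    """x -> A x: a vector space homomorphism from R^2 to R^3."""
    return [sum(A[i][j] * x[j] for j in range(2)) for i in range(3)]

def affine(x):
    """x -> A x + b: the linear part plus a constant shift."""
    return [ai + bi for ai, bi in zip(linear(x), b)]

def respects_linearity(f, x, y, c):
    """Check f(x + y) == f(x) + f(y) and f(c x) == c f(x) at the given sample inputs."""
    add_ok = f([xi + yi for xi, yi in zip(x, y)]) == [p + q for p, q in zip(f(x), f(y))]
    scale_ok = f([c * xi for xi in x]) == [c * p for p in f(x)]
    return add_ok and scale_ok

x, y, c = [1.0, 2.0], [-3.0, 0.5], 4.0
print(respects_linearity(linear, x, y, c))  # True
print(respects_linearity(affine, x, y, c))  # False
print(affine([0.0, 0.0]))                   # [1.0, 0.0, -2.0], so 0 is not sent to 0
```

Passing the check at one set of inputs does not prove linearity, of course; the point is only that the shift $b$ breaks it, while the matrix part alone does not.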
In the direction $v \in \mathbb{R}^n$, the directional derivative is simply $\lim_{t \to 0} \frac{f(x_0 + tv) - f(x_0)}{t} = Df_{x_0}(v)$, because the derivative encodes the local rate of change of $f$ in every direction at once.
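The identity "directional derivative $= Df_{x_0}(v)$" can be checked numerically. The Python sketch below uses a hypothetical map $f(x, y) = (x^2 y,\ \sin x + y)$ from $\mathbb{R}^2$ to $\mathbb{R}^2$, estimates the limit with a central difference, and compares it with the hand-computed Jacobian matrix applied to $v$. The function names and the step size `t` are illustrative assumptions.

```python
import math

# Hypothetical example map f: R^2 -> R^2, chosen only for illustration.
def f(p):
    x, y = p
    return [x**2 * y, math.sin(x) + y]

def jacobian(p):
    """Hand-computed Jacobian matrix of f at p, i.e. the matrix of Df_p."""
    x, y = p
    return [[2 * x * y, x**2],
            [math.cos(x), 1.0]]

def matvec(M, v):
    """Apply the matrix M to the vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def directional_derivative(f, p, v, t=1e-6):
    """Central-difference estimate of lim_{t->0} (f(p + t v) - f(p)) / t."""
    fwd = f([pi + t * vi for pi, vi in zip(p, v)])
    bwd = f([pi - t * vi for pi, vi in zip(p, v)])
    return [(a - b) / (2 * t) for a, b in zip(fwd, bwd)]

p, v = [1.0, 2.0], [3.0, -1.0]
estimate = directional_derivative(f, p, v)
exact = matvec(jacobian(p), v)  # [11.0, 3*cos(1) - 1], roughly [11.0, 0.6209]
print(estimate)
print(exact)
```

The two printed vectors should agree to several decimal places; the central difference is used only because its error shrinks like $t^2$.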
Basically, you attach a copy of $\mathbb{R}^{n+m}$ at the point $(x_0, f(x_0))$ of the graph, and you approximate the curved graph of $f$ by the flat (linear) graph of $Df_{x_0}$. This flat piece is called the tangent space to the graph of $f$ at $x=x_0$. If you balance a piece of cardboard on a beach ball, you have a good model for this. The origin of the attached copy is where the cardboard touches the ball, which is why you don't get an additive constant.
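The cardboard picture can be sketched numerically too. Assuming, for illustration only, that the beach ball is the unit sphere, its upper half is the graph of $f(x, y) = \sqrt{1 - x^2 - y^2}$. The Python sketch below builds the tangent plane at a hypothetical contact point $(0.3, 0.4)$ and shows that the gap between ball and cardboard, divided by the distance $\|h\|$ from the contact point, shrinks as $h \to 0$, which is the $o(\|h\|)$ condition. Measured from the contact point, the plane is just the linear map $h \mapsto Df(h)$, with no constant term.

```python
import math

# Assumption for illustration: the "beach ball" is the unit sphere, and its upper
# half is the graph of f(x, y) = sqrt(1 - x^2 - y^2) over the open unit disk.
def f(x, y):
    return math.sqrt(1 - x**2 - y**2)

def grad_f(x, y):
    """Partial derivatives of f; Df at (x, y) is h -> grad_f(x, y) . h."""
    z = f(x, y)
    return (-x / z, -y / z)

x0, y0 = 0.3, 0.4  # hypothetical contact point
z0 = f(x0, y0)
gx, gy = grad_f(x0, y0)

def tangent_plane(x, y):
    """The 'cardboard': height z0 plus the linear map Df applied to the offset."""
    return z0 + gx * (x - x0) + gy * (y - y0)

# Move away from the contact point along a fixed unit direction (0.6, 0.8).
ux, uy = 0.6, 0.8
ratios = []
for s in (1e-1, 1e-2, 1e-3):
    x, y = x0 + s * ux, y0 + s * uy
    gap = abs(f(x, y) - tangent_plane(x, y))
    ratios.append(gap / s)  # |gap| / |h|, since |h| = s for a unit direction
print(ratios)  # each ratio is roughly 10 times smaller than the previous one
```

Because $f$ is smooth at the contact point, the gap here is of order $\|h\|^2$, so the ratio falls roughly in proportion to $s$.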
If you draw a line on your piece of cardboard through the point where it touches the ball, you get a model for the directional derivative in the direction of that line. Rotate the line on the cardboard about that point, and you get the directional derivatives in other directions.