People often say that the derivative of a function at a point is its "best" linear (or affine) approximation around that point. This seems like a good intuition, but I've never seen it made precise. What I'd hope for is a sort of universal property that, given a class of possible approximations, singles out which one is the best, if a best one exists.
I'd want to get continuity from the class of constant functions, derivatives from the class of affine functions, and in general the $k$th-order Taylor polynomial from the class of polynomials of degree at most $k$.
One possible definition of "better" might be that the error is smaller at every point of some neighborhood. That is: given a class $G$ of functions $X \to Y$, a function $f : X \to Y$, and a point $x \in X$, the best $G$-approximation of $f$ at $x$ is a function $g_0 \in G$ such that for every $g \in G$ there is a neighborhood $A$ of $x$ with $d(g_0(x'), f(x')) \le d(g(x'), f(x'))$ for all $x' \in A$.
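To convince myself the definition at least behaves as expected in a simple case, here's a quick numeric sanity check (the function $f(x) = x^2$, the base point $x = 1$, and the competitor lines are my own choices, not part of the definition). The tangent line at $x = 1$ should eventually beat any other affine function on a small enough neighborhood:

```python
# Sanity check of the "best G-approximation" definition with
# f(x) = x^2 at x0 = 1 and G = affine functions.
# g0 is the tangent line; g1 and g2 are arbitrary competitors.
import numpy as np

def f(x):  return x**2
def g0(x): return 2*x - 1       # tangent line at x0 = 1
def g1(x): return 1.9*x - 0.9   # different slope, same value at x0
def g2(x): return 2*x - 0.99    # same slope, wrong value at x0

x0 = 1.0
# For each competitor, a neighborhood radius on which g0's error
# is (weakly) smaller pointwise. The radii were worked out by hand.
for g, radius in [(g1, 0.05), (g2, 0.07)]:
    xs = np.linspace(x0 - radius, x0 + radius, 1001)
    assert np.all(np.abs(g0(xs) - f(xs)) <= np.abs(g(xs) - f(xs)) + 1e-12)
```

Note the neighborhood is allowed to depend on the competitor $g$, which the check reflects: the radius that works against $g_1$ need not work against some other line with slope even closer to $2$.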
(In particular, if we say that a function is continuous if it can be approximated by a constant function, we get this somewhat unusual definition of continuity: $f : X \to Y$ is continuous at $x \in X$ if for all $y \in Y$, there's a neighborhood $A$ of $x$ such that for all $x' \in A$, $d(f(x'), f(x)) \le d(f(x'), y)$.)
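As a sanity check on this continuity criterion too, here's a small numeric experiment (with $f = \sin$ at $x = 0$ and a handful of competitor constants $y$, all my own choices). The triangle inequality suggests that any neighborhood on which $f$ stays within half of $d(f(x), y)$ of $f(x)$ should work; below I just pick a radius proportional to $|y - f(x)|$:

```python
# Check the proposed continuity criterion for f = sin at x0 = 0:
# for each constant y, find a neighborhood of x0 on which
# |f(x') - f(x0)| <= |f(x') - y|.
import numpy as np

def f(x): return np.sin(x)

x0, fx0 = 0.0, 0.0
for y in [-1.0, -0.3, 0.5, 2.0]:   # competitors y != f(x0)
    r = 0.1 * abs(y - fx0)          # small radius depending on y
    xs = np.linspace(x0 - r, x0 + r, 1001)
    assert np.all(np.abs(f(xs) - fx0) <= np.abs(f(xs) - y))
```

(For $y = f(x_0)$ the inequality holds with equality everywhere, so only $y \neq f(x_0)$ is interesting; again the neighborhood shrinks as $y$ approaches $f(x_0)$.)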
Does this definition do what I want? Can it be generalized to non-metric spaces? Or is there another definition of "best approximation" that works better?