The formal definition of the differential of a differentiable function $f: x \mapsto y=f(x)$ is that it is a function of two variables, named $df$, whose value is $df(x,\Delta_X) = f'(x)\cdot\Delta_X$.
It's used by Courant, for instance, and I read on Wikipedia (http://en.wikipedia.org/wiki/Differential_of_a_function#CITEREFCourant1937i) that it's the modern treatment of differentials in differential calculus.
I'm trying to see how we get from that to $df(x) = f'(x)\,dx$ and then, if $y=f(x)$, to the usual $dy = f'(x)\,dx$ that we see everywhere in connection with linear approximation.
First of all, what would $dx$ mean? Of what function is it the differential? What about $dy$ or $df(x)$: of what function are they the differentials? What would the values of those differentials be?
Since the formal definition treats a differential as a function, I can't understand what the symbols "$dx$" and "$dy$" actually mean in the usual context.
Any help highly appreciated.
Formally, $x:M\to\mathbb R$ is a map from a manifold $M$ into the reals.
For one dimensional calculus, the manifold $M$ is usually taken to be $\mathbb R$ or a region thereof. $x(p)$ is a function used as a coordinate, and it tells you where on the manifold you are. Its argument is the abstract point on the manifold. Therefore, the manifold is the set of all possible points you might be sitting at. You usually think of just one point at a time.
$y=f(x)$ is also a function on the manifold, and by the chain rule $\mathrm d y|_p= f'(x(p)) \mathrm d x|_p$ at $p$, a point on the manifold.
You could also view $y$ as a local coordinate and then $x=x(y)$ locally and so on.
A vector field $X^a$ on a manifold $M$ is a map from functions $f$ to their rate of change along that vector, $X(f)=X^a\partial_a f$ in any coordinates. In one dimension, a vector field has one component, so we can write it simply as $X$. In fact, for a vector field $\Delta$ with small values we can interpret its action, $\Delta (f) = f' \times \Delta$, as a predictor of the result of a small change in position (the flow along the integral curves of $\Delta$).
The differential of a function, $\mathrm d f$, is a map from vector fields to functions given by $\mathrm d f(\Delta) \equiv \Delta (f)$. That is, $\mathrm d f$ takes a vector field $\Delta$ and returns the rate of change of $f$ along it.
Therefore $\mathrm d x$ just stores the information about how fast the coordinate $x$ changes. You make arguments like this: $$\begin{aligned}(\mathrm d f(x))(\Delta)(p) &= \Delta(f(x(p))) \\ &= \Delta^a(p)\,\partial_a f(x(p)) \\ &= \Delta^a(p)\,\partial_a x(p)\times f'(x(p)) \\ &= f'(x(p))\,\Delta (x(p)) \\ &= f'(x(p))\,(\mathrm d x)(\Delta)(p)\end{aligned}$$ and, since this holds for every vector field $\Delta$ and every point $p$, comparing the left and right sides we deduce $$\mathrm d f = f'(x)\,\mathrm d x.$$
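The argument above can be checked numerically. What follows is only a minimal Python sketch under stated assumptions: it takes $M=\mathbb R$, and the names `derivative`, `vector_field`, `d`, as well as the particular choices of `x`, `f` and `delta`, are hypothetical and chosen purely for illustration. Derivatives are approximated by central differences, so the two sides agree only up to finite-difference error.

```python
import math

def derivative(g, p, h=1e-6):
    """Central-difference approximation to g'(p)."""
    return (g(p + h) - g(p - h)) / (2 * h)

def vector_field(component):
    """Build a vector field on M = R from its single component.

    The result maps a function g to the function p -> component(p) * g'(p),
    i.e. the rate of change of g along the field.
    """
    def act(g):
        return lambda p: component(p) * derivative(g, p)
    return act

def d(g):
    """Differential of g: takes a vector field and returns a function on M."""
    return lambda field: field(g)

# Hypothetical choices, for illustration only.
x = lambda p: p**3 + p            # a coordinate on M = R (strictly increasing)
f = math.sin                      # the function in y = f(x)
f_prime = math.cos                # its derivative
y = lambda p: f(x(p))             # y = f(x), viewed as a function on M
delta = vector_field(lambda p: 0.5 + p * p)

p = 0.7
lhs = d(y)(delta)(p)                    # dy(Delta) evaluated at p
rhs = f_prime(x(p)) * d(x)(delta)(p)    # f'(x(p)) * dx(Delta) evaluated at p
print(abs(lhs - rhs) < 1e-6)
```

Here `d(x)(delta)(p)` plays the role of $(\mathrm d x)(\Delta)(p)$, so the comparison is exactly the first and last terms of the chain of equalities.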
You can figure out a 'small change' interpretation of all this because the definition of a vector field is exactly what it needs to be for this to work.
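As a sketch of that interpretation, assume (purely for illustration) that $M=\mathbb R$, that $x$ is the identity coordinate $x(p)=p$, and that $\Delta$ is the constant vector field with component $\Delta_X$. Then the definitions above give
$$\mathrm d x(\Delta) = \Delta(x) = \Delta_X\,\partial_x x = \Delta_X, \qquad \mathrm d f(\Delta) = \Delta(f) = \Delta_X\, f'(x) = f'(x)\,\mathrm d x(\Delta),$$
which is the two-variable function $df(x,\Delta_X)=f'(x)\cdot\Delta_X$ from the question. With the hypothetical choice $f(x)=x^2$, for example, $\mathrm d f(\Delta)=2x\,\Delta_X$, which is the linear part of the actual change $(x+\Delta_X)^2-x^2 = 2x\,\Delta_X+\Delta_X^2$.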
Note: By $\partial_a$ I mean the derivative with respect to the $a$th coordinate of an arbitrary coordinate system.