I have encountered the term differential/pushforward many times in the literature, but I cannot seem to understand just what is meant by it. In particular, I do not understand the definition of the differential of a vector-valued function of several variables
$ f : \mathbb{R}^n \to \mathbb{R}^m $
and its generalization to differentiable manifolds. I have seen many definitions of the differential, in particular the one for tangent vectors on manifolds, the one using derivations of functions, and the one involving the Jacobian matrix, but I cannot understand any of them, and so I also cannot understand what is meant by the tangent space of a manifold. What does the differential "really mean", and how is it used? In particular, how is the differential/pushforward related to derivations of functions? Could someone please explain the differential to me, and possibly use it to define tangent spaces on manifolds? I am frustrated, as I have never understood the differential and lack a proper understanding of tangent spaces. All help is appreciated.
Let $f\colon\mathbb{R}\to\mathbb{R}$ and $x\in\mathbb{R}$. Then $f$ is differentiable at $x$ if and only if there exists $\alpha\in\mathbb{R}$ such that: $$f(x+h)=f(x)+\alpha h+o(h)\quad\text{as }h\to 0.$$ When such an $\alpha$ exists, it is unique; it is denoted by $f'(x)$ and called the derivative of $f$ at $x$.
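Proof. Assume that there also exists $\beta\in\mathbb{R}$ such that $f(x+h)=f(x)+\beta h+o(h)$. Subtracting the two expansions gives: $$(\alpha-\beta)h=o(h),$$ and dividing by $h\neq 0$ and letting $h\to 0$ yields $\alpha-\beta=0$, whence $\alpha=\beta$. $\Box$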
Remark. Notice that $h\mapsto\alpha h$ is a linear map.
Geometrically, the line with equation $y=f(x)+f'(x)(t-x)$, in the variable $t$, is the tangent line: it is the best linear approximation of the graph of $f$ around $x$.
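To make "best line approximation" concrete, here is a small numerical sketch in Python. The choice $f=\sin$ and $x=1$ is purely illustrative and not part of the argument above; the helper `remainder_ratio` is a hypothetical name. It computes $|f(x+h)-f(x)-\alpha h|/|h|$ for shrinking $h$, once with the true slope $\alpha=\cos(1)$ and once with a deliberately wrong slope. Only the true slope should make the ratio tend to $0$, which is what the $o(h)$ condition asks for.

```python
import math

def remainder_ratio(f, alpha, x, h):
    """Return |f(x+h) - f(x) - alpha*h| / |h|; it tends to 0 iff alpha = f'(x)."""
    return abs(f(x + h) - f(x) - alpha * h) / abs(h)

# Illustrative assumption: f = sin and x = 1, so that f'(x) = cos(1).
x = 1.0
good = math.cos(x)        # the actual derivative of sin at x
bad = math.cos(x) + 0.1   # a deliberately wrong slope, for comparison

steps = [10.0 ** (-k) for k in range(1, 6)]  # h = 1e-1, ..., 1e-5
ratios_good = [remainder_ratio(math.sin, good, x, h) for h in steps]
ratios_bad = [remainder_ratio(math.sin, bad, x, h) for h in steps]

for h, rg, rb in zip(steps, ratios_good, ratios_bad):
    print(f"h={h:.0e}  correct slope: {rg:.2e}  wrong slope: {rb:.2e}")
```

With the correct slope the ratio shrinks roughly in proportion to $h$, while with the wrong slope it stays near the size of the error in the slope. This is the numerical counterpart of the uniqueness statement.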
Now, let $f\colon\mathbb{R}^m\to\mathbb{R}^n$ and $x\in\mathbb{R}^m$. Generalizing the above definition, $f$ is differentiable at $x$ if and only if there exists a linear map $\ell\colon\mathbb{R}^m\to\mathbb{R}^n$ such that: $$f(x+h)=f(x)+\ell(h)+o(h),$$ where $o(h)$ denotes a term whose norm is negligible compared with $\|h\|$ as $h\to 0$. When such an $\ell$ exists, it is unique; it is denoted by $T_xf$ and called the differential of $f$ at $x$.
Proof. Assume that there also exists a linear map $\ell'\colon\mathbb{R}^m\to\mathbb{R}^n$ such that $f(x+h)=f(x)+\ell'(h)+o(h)$, then: $$(\ell-\ell')(h)=o(h).$$ Let $h\in\mathbb{R}^m\setminus\{0\}$ and $t>0$. By linearity, $\displaystyle\frac{(\ell-\ell')(th)}{\|th\|}=\frac{(\ell-\ell')(h)}{\|h\|}$ does not depend on $t$, yet it converges toward $0$ when $t$ goes to $0$; it is therefore zero, that is: $$\ell(h)=\ell'(h),$$ and this also holds for $h=0$. $\Box$
Remark. This definition can be extended to maps defined only on an open subset of $\mathbb{R}^m$, since $x+h$ lies in this open set for $h$ sufficiently small.
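Since the question mentions the Jacobian matrix: in the canonical bases, the matrix of the differential of $f$ at $x$ is the Jacobian matrix of $f$ at $x$, so $\ell(h)$ is just "Jacobian times $h$". The following Python sketch checks the defining property numerically. The map $f(x,y)=(x^2y,\;x+\sin y)$, the base point, and the helper names `differential` and `remainder_ratio` are all illustrative assumptions, and the Jacobian is computed by hand.

```python
import math

def f(p):
    """Illustrative map R^2 -> R^2 (an assumption, not from the original post)."""
    x, y = p
    return (x * x * y, x + math.sin(y))

def differential(p):
    """Return the linear map h -> J(p) h, where J(p) is the hand-computed Jacobian of f at p."""
    x, y = p
    jac = ((2 * x * y, x * x),      # partial derivatives of x^2 * y
           (1.0, math.cos(y)))      # partial derivatives of x + sin(y)
    def ell(h):
        return tuple(row[0] * h[0] + row[1] * h[1] for row in jac)
    return ell

def norm(v):
    """Euclidean norm of a pair."""
    return math.hypot(v[0], v[1])

def remainder_ratio(p, h):
    """Return ||f(p+h) - f(p) - ell(h)|| / ||h||, which should tend to 0 as h -> 0."""
    ell = differential(p)
    moved = f((p[0] + h[0], p[1] + h[1]))
    base = f(p)
    lin = ell(h)
    rem = tuple(m - b - l for m, b, l in zip(moved, base, lin))
    return norm(rem) / norm(h)

p = (1.0, 2.0)
direction = (0.6, -0.8)  # a unit vector; any nonzero direction would do
ratios = [remainder_ratio(p, (t * direction[0], t * direction[1]))
          for t in (1e-1, 1e-2, 1e-3, 1e-4)]
print(ratios)
```

The printed ratios should decrease toward $0$ as the step shrinks. Replacing any entry of the Jacobian by a wrong value would leave the ratio bounded away from $0$ along some direction, in line with the uniqueness proof.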
I won't explain how to define the tangent space of a manifold and the tangent map of a differentiable function as it is tedious and quite long. However, the main idea is that a manifold is locally an open set of some $\mathbb{R}^m$.
I recommend having a look at An Introduction to Differential Manifolds by J. Lafontaine.