Suppose $E$ is a $q$-dimensional real vector bundle on a smooth manifold $M$ and $\Gamma(E)$ is the set of smooth sections of $E$ on $M$. A connection on the vector bundle $E$ is a map $$ D:\Gamma(E) \to\Gamma(T^*(M)\otimes E)\tag{1} $$ which satisfies the following conditions:
- For any $s_1,s_2\in\Gamma(E)$, $D(s_1+s_2)=Ds_1+Ds_2$.
- For $s\in\Gamma(E)$ and any $\alpha\in C^\infty(M)$, $$ D(\alpha s) = d\alpha\otimes s + \alpha Ds\;. $$
Suppose $X$ is a smooth tangent vector field on $M$ and $s\in\Gamma(E)$. Let $$ D_Xs:=\langle X, Ds\rangle\;\tag{2} $$ where $\langle\;,\rangle$ represents the pairing between $T(M)$ and $T^*(M)$. Then $D_Xs$ is a section of $E$, which is called the covariant derivative of the section $s$ along $X$. This definition is given in Chern's Lectures on Differential Geometry.
By (1), $Ds$ is an element of $\Gamma(T^*(M)\otimes E)$, not of $\Gamma(T^*(M))$, whereas $X\in\Gamma(T(M))$. How should I understand the pairing in (2)?
In John Lee's Riemannian Manifolds, a connection in $E$ is a map $$ \nabla : \Gamma(T(M))\times \Gamma(E)\to \Gamma(E)\tag{3} $$ written $(X,Y)\mapsto \nabla_XY$, satisfying
- $C^\infty(M)$-linear in the first component;
- $\mathbb{R}$-linear in the second component;
- the product rule $$ \nabla_X(fY) = f\nabla_XY+(Xf)Y\;. $$
Essentially $\nabla_XY=D_XY$ in Chern's notation; we can show that (2) satisfies all the defining properties for (3).
Is there a reason to prefer the more abstract definition in (1) over (3)?
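The pairing $TM \times (T^* M \otimes E) \to E$ is really just the canonical pairing $TM \times T^* M \to \Bbb R$ with the tensor factor $E$ coming along (inertly) for the ride. More precisely, $\langle \,\cdot\, , \,\cdot\, \rangle$ is by definition the composition $$TM \times (T^* M \otimes E) \stackrel{\otimes}{\longrightarrow} TM \otimes T^* M \otimes E \stackrel{\operatorname{tr} \otimes \operatorname{id}_E}{\longrightarrow} E ,$$ where $\operatorname{tr}: TM \otimes T^* M \to \Bbb R$ is the contraction induced by the canonical pairing. On decomposable elements, $$\langle X, \alpha \otimes \xi \rangle = \alpha(X) \xi ,$$ and the pairing extends linearly to all of $T^* M \otimes E$. Applied pointwise to a vector field $X$ and the section $Ds$ of $T^* M \otimes E$, it produces the section $D_X s$ of $E$ in (2).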
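To see the pairing at work, here is a sketch in a local frame. The frame $e_1,\dots,e_q$ of $E$ over an open set $U \subset M$, the component functions $f^i$, and the $1$-forms $\alpha^i$, $\omega_i^j$ are notation introduced for this illustration and do not appear in the question; the sketch also uses the standard fact that $D$ restricts to sections over $U$. Every section of $T^* M \otimes E$ over $U$ can be written as $\sum_i \alpha^i \otimes e_i$, and the pairing acts on it by $$\Big\langle X, \sum_i \alpha^i \otimes e_i \Big\rangle = \sum_i \alpha^i(X)\, e_i .$$ Writing $De_i = \sum_j \omega_i^j \otimes e_j$ and $s = \sum_i f^i e_i$, the two defining conditions in (1) give $$Ds = \sum_j \Big( df^j + \sum_i f^i \omega_i^j \Big) \otimes e_j , \qquad D_X s = \langle X, Ds \rangle = \sum_j \Big( Xf^j + \sum_i f^i\, \omega_i^j(X) \Big) e_j ,$$ so each coefficient $1$-form is simply evaluated on $X$ while the frame elements $e_j$ are left untouched.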
As for comparing the definitions, only a little unwinding is required to show that the two are equivalent; I cannot improve on Ted Shifrin's comment about Chern's form approach to geometry.
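For what it is worth, the unwinding can be sketched as follows, using only the properties listed in the question. Given $D$ as in (1), set $\nabla_X s := \langle X, Ds \rangle$. The pairing is $C^\infty(M)$-linear in $X$, which gives the first property in (3). Taking $\alpha$ to be a constant $c$ in the Leibniz rule gives $D(cs) = c\,Ds$ because $dc = 0$; together with additivity this gives $\Bbb R$-linearity in $s$. The product rule follows from $$\nabla_X(fs) = \langle X, df \otimes s + f\,Ds \rangle = df(X)\, s + f \langle X, Ds \rangle = (Xf)\, s + f\, \nabla_X s .$$ Conversely, given $\nabla$ as in (3), for fixed $s$ the map $X \mapsto \nabla_X s$ is $C^\infty(M)$-linear from $\Gamma(TM)$ to $\Gamma(E)$, so it is given by a section of $\operatorname{Hom}(TM, E) \cong T^* M \otimes E$; calling that section $Ds$ recovers a map as in (1), and the same computation read backwards gives $D(\alpha s) = d\alpha \otimes s + \alpha\, Ds$.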