I'm learning Riemannian geometry, in particular connections, and I have now seen several different definitions of a connection. I'm trying to understand how they are all the same.
The first one, and the one I'm relatively comfortable with, is that if $\pi : E \to M$ is a smooth vector bundle over a smooth manifold $M$, then a connection is a map $$\nabla : \mathfrak{X}(M) \times \Gamma(E) \to \Gamma(E), \quad (X,Y) \mapsto \nabla_XY$$ satisfying the product rule and linearity over $\mathbb{R}$ in $Y$, as well as linearity over $C^\infty(M)$ in $X$.
The second one I have is that a connection on a smooth vector bundle $\xi$ is an $\mathbb{R}$-linear map $$\nabla: \Omega^0(\xi) \to \Omega^1(M) \otimes_{\Omega^0(M)} \Omega^0(\xi)$$ which satisfies the Leibniz rule $\nabla(f\cdot s)=df \otimes s + f \cdot \nabla s.$
The last one is from Wikipedia which states that for a smooth vector bundle $E \to M$ a connection is an $\mathbb{R}$-linear map $$\nabla:\Gamma(E) \to \Gamma(T^*M \otimes E)$$ that satisfies the same kinda properties as the two above.
Now I think that most of my confusion here is due to not understanding the tensor product properly. If anyone can provide some idea on why the two latter ones should coincide with the first one that would be much appreciated. Also for the record I'm quite well acquainted with differential forms so there is no need to explain what $\Omega^k(M)$'s are here.
Those definitions of a connection are all saying the same thing: essentially, a connection is a notion of directional differentiation. In $\mathbb{R}^N$ this notion reduces to ordinary differentiation along a path, with the aid of the chain rule. For more general manifolds, we don't even have a notion of addition for vectors in different tangent spaces; in particular, if we try to adapt the definition we had for Euclidean vector fields, we will get nowhere. The answer? Same as always: generalize its main properties and hope for enough structure to get something interesting.
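To make the Euclidean picture concrete (a sketch only; the curve $\gamma$ is notation I am introducing, not something from your definitions): for vector fields on $\mathbb{R}^N$, pick any smooth curve $\gamma$ with $\gamma(0)=p$ and $\gamma'(0)=v$. Then

$$\nabla_v w\,(p) \;=\; \frac{d}{dt}\Big|_{t=0} w(\gamma(t)) \;=\; \lim_{t\to 0}\frac{w(\gamma(t))-w(p)}{t}.$$

The difference $w(\gamma(t)) - w(p)$ makes sense only because every tangent space of $\mathbb{R}^N$ is identified with $\mathbb{R}^N$ itself. On a general manifold the two vectors live in the different spaces $T_{\gamma(t)}M$ and $T_pM$, and that subtraction is exactly what is missing.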
First of all, the directional derivative of a vector field $w$ along $v$, $\nabla_v w$, is linear in both $v$ and $w$, in the obvious sense; hence the linearity properties of the connection as a kind of bilinear map, mentioned in your first definition. In your second definition, these properties are guaranteed because the target space of the map is a tensor product space: an element of $\Omega^1(M) \otimes_{\Omega^0(M)} \Omega^0(\xi)$ is a one-form with values in the bundle, so it eats a vector field $X$, linearly over $C^\infty(M)$, and returns a section. Feeding $X$ to $\nabla s$ gives exactly the $\nabla_X s$ of your first definition, so this one has the same kind of "bilinearity". Your third definition is a bit less explicit, but it's the same thing: sections of $T^*M \otimes E$ are again bundle-valued one-forms, and as such the connection inherits these properties.

The next main property of the directional derivative is that it respects the product rule for multiplying a vector field by a function. This inspires the requirement that the connection respect the generalization of that rule, with the derivative of the function entering through the tensor product $df \otimes s$. This is the Leibniz rule written explicitly in your second definition and mentioned in the other ones.

One can also show that these requirements are equivalent to choosing a certain family of functions $\Gamma_{a\,b}^c$ on each coordinate chart in $\mathbb{R}^N$, with certain transformation properties under transition maps. This family of functions is what is often called the connection, and your definition of connection encapsulates this as well as the notion of covariant derivative, which is the generalization of the directional derivative I have described.

As for the tensor product, you can think of it in a way analogous to the exterior product for differential forms: the product of a differential form of rank $l$ and one of rank $m$ is a differential form of rank $l+m$, meaning it is $(l+m)$-multilinear on the corresponding vector space.
The tensor product is more general in that, instead of producing higher-rank multilinear maps on a vector space only, it produces higher-rank multilinear maps on a vector space and its dual; i.e., the tensor product of a tensor of type $(p,q)$ and a tensor of type $(l,m)$ is a tensor of type $(p+l,q+m)$, meaning a tensor that is $(p+l)$-multilinear in some vector space and $(q+m)$-multilinear in its dual.
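For what it's worth, here is how the dictionary between your three definitions can be written out. Take it as a sketch: I am writing $E$ for the bundle $\xi$, and the local frame $e_b$, the coordinates $x^a$ and the index placement on $\Gamma$ are conventions I am assuming. For a vector bundle of finite rank there is an isomorphism of $C^\infty(M)$-modules

$$\Omega^1(M)\otimes_{\Omega^0(M)}\Omega^0(\xi)\;\cong\;\Gamma(T^*M\otimes E),\qquad \omega\otimes s\;\mapsto\;\big(p\mapsto \omega_p\otimes s(p)\big),$$

so the second and third definitions have the same target. Given such a $\nabla$, contracting with a vector field $X$ recovers the first definition:

$$\nabla_X s \;:=\; (\nabla s)(X),\qquad\text{where}\qquad (\omega\otimes s)(X)=\omega(X)\,s.$$

Linearity over $C^\infty(M)$ in $X$ is then automatic, and the Leibniz rule turns into the familiar product rule:

$$\nabla_X(fs) \;=\; \big(df\otimes s + f\,\nabla s\big)(X) \;=\; X(f)\,s + f\,\nabla_X s.$$

In a local frame the coefficient functions appear as

$$\nabla_{\partial_a} e_b \;=\; \Gamma_{a\,b}^c\, e_c,\qquad\text{equivalently}\qquad \nabla e_b \;=\; \Gamma_{a\,b}^c\; dx^a\otimes e_c .$$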