$\DeclareMathOperator{\Diff}{Diff}$
Fix $\varphi_0 \in \Diff(M)$.
Claim: The "tangent space" at $\varphi_0$ is $T_{\varphi_0}\Diff(M) = \{ X \circ \varphi_0 \mid X \in \Gamma(TM) \}$.
Justification: Pick a smooth family of diffeomorphisms $\varphi: (-\varepsilon, \varepsilon) \to \Diff(M): t \mapsto \varphi(t) = \varphi_t$. Then for each $p \in M$, there is a smooth path $\gamma_p: (-\varepsilon, \varepsilon) \to M: t \mapsto \varphi_t(p)$, so $$(\varphi'(0))(p) = \left.{d \over dt}\right|_{t=0} \varphi_t(p) = \gamma_p'(0) \in T_{\gamma_p(0)}M = T_{\varphi_0(p)}M.$$ Thus, a "tangent vector" at $\varphi_0$ is not a vector field (unless $\varphi_0 = \operatorname{id}$); it's a function $M \to TM$ sending $p$ to something in $T_{\varphi_0(p)}M$. Fortunately, such a function uniquely determines a vector field because it factors as $$M \xrightarrow{\varphi_0} M \to TM:p \mapsto \varphi_0(p) \mapsto \text{something} \in T_{\varphi_0(p)}M,$$ where the second map is an honest vector field. The claim is now justified.
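To spell out the factorization and the reverse inclusion, here is a sketch. It assumes $M$ is compact, so that every vector field has a flow defined for all times; the flow $\psi_t$ is notation introduced only for this illustration. Given a smooth family $\varphi_t$ through $\varphi_0$, set $$X := \varphi'(0) \circ \varphi_0^{-1}, \qquad X(q) = \left.{d \over dt}\right|_{t=0} \varphi_t\big(\varphi_0^{-1}(q)\big) \in T_qM,$$ so that $X$ is a smooth vector field and $\varphi'(0) = X \circ \varphi_0$. Conversely, given $X \in \Gamma(TM)$ with flow $\psi_t$, the family $\varphi_t := \psi_t \circ \varphi_0$ passes through $\varphi_0$ at $t = 0$ and satisfies $$\left.{d \over dt}\right|_{t=0} \varphi_t(p) = \left.{d \over dt}\right|_{t=0} \psi_t\big(\varphi_0(p)\big) = X\big(\varphi_0(p)\big),$$ so every $X \circ \varphi_0$ occurs as a velocity.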
(In the context of integrating time-dependent vector fields, all this is usually summarized by the equation $${d \over dt} \varphi_t = X_t \circ \varphi_t.$$
I'm aware that there's a rigorous definition of a smooth structure for the infinite-dimensional $\Diff(M)$, but I know nothing about it. I'm considering a one-parameter family of diffeomorphisms to be smooth iff the induced map $M \times (-\varepsilon, \varepsilon) \to M$ is smooth (I've been told this doesn't really coincide with the smooth structure on $\Diff(M)$, but I'm not sure how). Given this definition of a smooth path in $\Diff(M)$, we can certainly ask for its velocity. This crude analogy is all I have in mind when I say "tangent vector.")
One consequence of the claim is that there's a canonical identification of $T_{\varphi_0}\Diff(M)$ with $T_{\operatorname{id}}\Diff(M) = \Gamma(TM)$ and hence a canonical parallelization. For finite-dimensional Lie groups, we have two canonical parallelizations given by left and right translation, but neither is "better" than the other. My question is: What is going on with $\Diff(M)$? Morally speaking, why does it have a canonical parallelization? Does it coincide with the one induced by left or right translation or neither?
I suspect that this parallelization of $\Diff(M)$ is neither the one induced by left translation nor the one induced by right translation. In fact, I suspect that the moral reason for this parallelization is that $\Diff(M)$ acts on $M$; more generally, I think an action of a Lie group $G$ on an arbitrary manifold induces a parallelization of $G$ by the same principle as above. However, I can't prove any of these claims, nor can I find any references. I'm sure references are abundant and I'm just not looking in the right places. Any suggestions would be appreciated.
I believe Kriegl & Michor discuss the tangent bundle of a diffeomorphism group in their text The Convenient Setting of Global Analysis.
Anyway, here's a proof using synthetic differential geometry (SDG) that the tangent space at the identity to the diffeomorphism group of a smooth space $M$ is the space of vector fields $\mathfrak{X}(M)$. There's a version of this argument in Anders Kock's text.
In SDG, we work in a smooth topos. For concreteness, let's take the Dubuc topos, into which the category of smooth manifolds embeds fully faithfully.
In our smooth topos, the real line $R$ is augmented with infinitesimal elements. In particular, the subspace $D := \{ d \in R \mid d^2 = 0 \}$ is treated as the "walking tangent vector", and we define the tangent bundle of a space $M$ as the mapping space $M^D$ of "infinitesimal curves" in $M$, with the projection given by evaluating at $0$. The tangent map $f_*: M^D \to N^D$ of any map $f: M \to N$ is just given by postcomposition with $f$.
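As a sanity check on these definitions, here is the simplest case $M = R$, sketched using the Kock-Lawvere axiom; the names $a$, $b$ are just illustrative. The axiom says every map $t: D \to R$ has the form $$t(d) = a + b\,d \quad \text{for unique } a, b \in R,$$ so $R^D \cong R \times R$, with projection $t \mapsto t(0) = a$. For a map $f: R \to R$, postcomposition gives $$(f \circ t)(d) = f(a + b\,d) = f(a) + f'(a)\,b\,d, \qquad \text{i.e.} \quad f_*(a, b) = \big(f(a),\, f'(a)\,b\big),$$ which is the usual tangent map.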
Now, a vector field $X: M \to M^D$ is a section of the projection $\mathrm{eval}_0: M^D \to M$, and, by currying, the vector field is equivalent to a map $X: D \to M^M$ such that $X(0)=1_M$.
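Written out, the currying step looks like this (a sketch; the hat is added here only to tell the two forms of the same vector field apart). Writing $\hat X: M \to M^D$ for the section and $X: D \to M^M$ for its transpose, $$X(d)(m) := \hat X(m)(d) \qquad (d \in D,\ m \in M),$$ and the section condition translates as $$\mathrm{eval}_0 \circ \hat X = 1_M \iff \hat X(m)(0) = m \ \text{ for all } m \iff X(0) = 1_M.$$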
But $M^M$ is just the space of maps $M \to M$, and is a smooth monoid under composition. So a vector field is precisely a tangent vector to the mapping space $M^M$ at the identity: it's an infinitesimal transformation!
But what if we want to consider the group of invertible maps $M \to M$? Well, we can check that in fact the tangent space at the identity is the same. Given a vector field $X: D \to M^M$, we can show that each $X(d)$ is invertible, with $X(d)^{-1} = X(-d)$, provided $M$ is infinitesimally linear (this is proved in Kock's book). So every tangent vector to $M^M$ at $1_M$ already takes values in invertible maps, and the space of invertible maps $\mathrm{Diff}(M)$ has the same tangent space at the identity as $M^M$.
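Here is a sketch of why the inverse has this form. It takes as given the additivity property that, as far as I recall, Kock proves for infinitesimally linear $M$; the notation $D(2)$ is not used above and is introduced only for this sketch. With $$D(2) := \{ (d_1, d_2) \in D \times D \mid d_1 d_2 = 0 \},$$ the property reads $$X(d_1) \circ X(d_2) = X(d_1 + d_2) \quad \text{for all } (d_1, d_2) \in D(2).$$ For any $d \in D$ we have $d \cdot (-d) = -d^2 = 0$, so $(d, -d) \in D(2)$ and $$X(d) \circ X(-d) = X(0) = 1_M = X(-d) \circ X(d).$$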