I have exactly the same doubt as in this question. Specifically, I don't understand why
$$ (d \exp_p)_v(v)=v $$
I worked through the same computation as Wikipedia and ended up with
$$ (d \exp_p)_v(v) = \left. \frac{d}{dt} \gamma(t+1,p,v) \right|_{t=0} $$
This equation comes from using the curve $\alpha(t) = (t+1)v$, where $v \in T_p M$. Apparently the key to filling the gap is to use parallel transport somehow, but I couldn't work out how from the given answer.
The specific bit I can't figure out is this: according to one of the comments, $(d \exp_p)_v(v)$ might actually be the parallel transport of $v$ along the geodesic passing through $\exp_p(v)$.
Can anyone clarify?
It might be easier than you think if you stick to do Carmo's text. The key claim is that $$ \langle d(\text{exp}_p)_v(v), d(\text{exp}_p)_v(w_T) \rangle = \langle v,w_T \rangle $$ for any $w_T = av \in T_pM$ with $a \in \mathbb{R}$.

Since $\gamma(t) = \text{exp}_p(tv)$ is the geodesic with $\gamma(0) = p$ and $\gamma'(0) = v$, we can compute $d(\text{exp}_p)_v(v)$ along the curve $\alpha(t) = v+vt$ in $T_pM$, which starts at $v$ with initial velocity $v$. We obtain $d(\text{exp}_p)_v(v) = \gamma'(1)$. By linearity of the differential, $d(\text{exp}_p)_v(w_T) = a \, d(\text{exp}_p)_v(v)$, so we have \begin{align} \langle d(\text{exp}_p)_v(v), d(\text{exp}_p)_v(w_T) \rangle &= a \, \langle d(\text{exp}_p)_v(v), d(\text{exp}_p)_v(v) \rangle \\ &=a \, \langle \gamma'(1),\gamma'(1) \rangle \\ &= a\, \langle \gamma'(0),\gamma'(0) \rangle\\ &= a\, \langle v,v \rangle\\ &= \langle v,w_T \rangle, \end{align} where the third equality holds because $\langle \gamma'(t),\gamma'(t) \rangle$ is constant along the geodesic $\gamma$.
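
To spell out the step $d(\text{exp}_p)_v(v) = \gamma'(1)$ and connect it to the parallel transport comment in the question, here is a short sketch. It assumes $v$ lies in the domain of $\text{exp}_p$ and uses the usual identification $T_v(T_pM) \cong T_pM$; the curve $\alpha(t) = v + vt = (1+t)v$ is the one from above. \begin{align} d(\text{exp}_p)_v(v) &= \left. \frac{d}{dt} \right|_{t=0} \text{exp}_p(\alpha(t)) \\ &= \left. \frac{d}{dt} \right|_{t=0} \text{exp}_p((1+t)v) \\ &= \left. \frac{d}{dt} \right|_{t=0} \gamma(1+t) \\ &= \gamma'(1). \end{align}

Since $\gamma$ is a geodesic, $\frac{D}{dt}\gamma' = 0$, so $\gamma'$ is a parallel vector field along $\gamma$ with $\gamma'(0) = v$. Hence $\gamma'(1)$ is the parallel transport of $v$ along $\gamma$ from $p$ to $\text{exp}_p(v)$. As far as I can tell, this is the sense in which $(d \exp_p)_v(v) = v$ should be read: the two vectors live in different tangent spaces, $T_{\text{exp}_p(v)}M$ and $T_pM$, and agree up to parallel transport along $\gamma$. In particular they have the same length.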