I was doing exercise 8 from do Carmo's Riemannian geometry and I stumbled upon the definition of gradient given.
Let $M$ be a Riemannian manifold... $f \in \mathcal{D}(M)$ .. the gradient of $f$ as a vector field $\text{grad} \; f$ on $M$ defined by $$ \langle \text{grad} \; f, v \rangle = df_p(v) \;\; p \in M, v \in T_pM \;\;\;\;\; (1) $$
Here $\langle \cdot , \cdot\rangle$ is the Riemannian metric on $M$ and $f$ is a differentiable function on $M$. Now, the Riemannian metric is a bilinear map $$\langle \cdot,\cdot \rangle : T_p M \times T_p M \to \mathbb{R}$$ but the differential $df_p$ is a map between tangent spaces, namely $$ df_p : T_p M \to T_{f(p)} \mathbb{R} \cong \mathbb{R} $$
So in a nutshell I'm confused about the equality in $(1)$, because the lhs is a scalar while the rhs is a tangent vector, albeit one living in a space isomorphic to the scalar field. This definition actually makes it a bit tricky for me to understand how to do the exercises, because the computations I do give me equalities that don't really make sense.
Can you clarify how the gradient is actually defined? I also own Tu's Differential Geometry, but I don't see these definitions (I'm kind of reading the two in parallel).
It's natural to have some confusion about these things. There are many similar things that come up in differential geometry and smooth manifold theory (and even much of other parts of math) where we take shortcuts or "make identifications" that make our lives easier once we understand their meaning, but can make the uninitiated's life needlessly difficult when it comes time to write proofs and ask if we really understand the shortcuts we take.
For any smooth map $f\colon M\to \mathbb R$ there is the global differential map, $df\colon TM\to T\mathbb R$ defined by $$ df(p,v) = (f(p),df_p(v)), $$ and the vector $df_p(v)$ acts on smooth functions $h$ on $\mathbb R$ by $df_p(v)(h) = v(h\circ f)$. For fixed $p\in M$, the map $df_p\colon T_pM\to T_{f(p)}\mathbb R$ is the differential of $\pmb f$ at $\pmb p$.
For any point $q\in\mathbb R$, there is a canonical vector space isomorphism $L_q\colon \mathbb R\cong T_{q}\mathbb R$ defined by $$ L_q(v) = v\frac{d}{dt}\bigg|_q, $$ i.e., sending the number $v$ to the directional derivative with respect to the "vector" $v$ (which is of course merely multiplication of the number $v$ with the usual derivative operator for smooth functions on $\mathbb R$). We can compose the inverse $L_{f(p)}^{-1}$ with $df_p$ to get a linear map $$ \widetilde{df_p} \equiv L_{f(p)}^{-1}\circ df_p\colon T_pM\to \mathbb R. $$ Unwinding the definitions with $h = \mathrm{id}_{\mathbb R}$ gives $\widetilde{df_p}(v) = v(f)$. Local coordinates $(x^1,\dots,x^n)$ near $p$ give a basis $\partial_{x^1}|_p,\dots,\partial_{x^n}|_p$ for $T_pM$, with respect to which the linear map $\widetilde{df_p}$ is simply the row vector $$ \begin{bmatrix} \displaystyle\frac{\partial f}{\partial x^1}(p) & \dotsb & \displaystyle\frac{\partial f}{\partial x^n}(p) \end{bmatrix}. $$
For $f\colon M\to\mathbb R$, we also have a well-defined covector field $df\colon M\to T^*M$. In local coordinates $(x^1,\dots,x^n)$ near $p$, we can express the covector field $df$ in terms of the local coframe $dx^1,\dots,dx^n$ (dual frame of $\partial_{x^1},\dots,\partial_{x^n}$) as $$ df = \sum_i\frac{\partial f}{\partial x^i}\,dx^i. $$ At each point $p$, we thus have a covector $df_p\colon T_pM\to \mathbb R$ expressed in terms of the basis $dx^1|_p,\dots,dx^n|_p$ by $$ df_p = \sum_i\frac{\partial f}{\partial x^i}(p)\,dx^i|_p, $$ so with respect to the basis $dx^1|_p,\dots,dx^n|_p$, $df_p\in T_p^*M$ can be expressed as the row vector $$ \begin{bmatrix} \displaystyle\frac{\partial f}{\partial x^1}(p) & \dotsb & \displaystyle\frac{\partial f}{\partial x^n}(p) \end{bmatrix}. $$
So really, $df_p$ the differential and $df_p$ the covector are literally the same object up to the canonical isomorphism $L_{f(p)}$. We might remind ourselves of this isomorphism $L$ the first few times we identify the differential $df_p$ and the covector $df_p$, but we drop it entirely once we get used to it. With more experience, one comes to appreciate the "intent of the law" rather than strictly follow the "letter of the law," and the interpretations we make are ultimately dictated by the purposes we have in mind.
That said, if one wants to define $\mathrm{grad}f$ "right," without making identifications, then I'd say you need to be comfortable with covector fields, and the musical isomorphism $(\cdot)^\sharp\colon T^*M\cong TM$ that the metric $g$ gives us, so we can do things properly and say simply and without ambiguity that $\mathrm{grad} f = (df)^\sharp$.
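To connect this back to $(1)$, here is a short sketch of what $\mathrm{grad} f = (df)^\sharp$ looks like in coordinates. The notation $g_{ij} = \langle \partial_{x^i},\partial_{x^j}\rangle$, with $(g^{ij})$ the inverse matrix, is the usual convention and an assumption on my part, not something fixed in the question. The sharp map is characterized by $\langle \omega^\sharp, v\rangle = \omega(v)$ for every covector $\omega\in T_p^*M$ and every $v\in T_pM$, so $(1)$ is exactly the statement $\mathrm{grad} f = (df)^\sharp$ once $df_p$ is read as a covector. Writing $\mathrm{grad} f = \sum_i a^i\,\partial_{x^i}$ and testing $(1)$ against $v = \partial_{x^j}$ gives $$ \sum_i g_{ij}\,a^i = \frac{\partial f}{\partial x^j} \quad\Longrightarrow\quad \mathrm{grad} f = \sum_{i,j} g^{ij}\,\frac{\partial f}{\partial x^j}\,\partial_{x^i}. $$ As a sanity check, on $\mathbb R^2\setminus\{0\}$ with polar coordinates $(r,\theta)$ the Euclidean metric has $g_{rr}=1$, $g_{\theta\theta}=r^2$, $g_{r\theta}=0$, so $$ \mathrm{grad} f = \frac{\partial f}{\partial r}\,\partial_r + \frac{1}{r^2}\frac{\partial f}{\partial \theta}\,\partial_\theta. $$ Both sides of $(1)$ are then honest real numbers: the left side is the metric pairing, and the right side is $v(f)$.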