It is straightforward for me to work with $\nabla_i$: taking covariant derivatives, contracting, etc. But when I want to do the same with $\nabla^i$ it feels strange, and even stranger when I want to contract it, e.g. $\nabla^i f\nabla_if$. This doesn't mean that I can't calculate it.
How can I work with $\nabla^i$ as easily as with $\nabla_i$?
The most confusing point for me is: $G^{ij}G_{jk}=G^i_k$ and $G^iG_i=\operatorname{tr} G$, but what about $\nabla^i\nabla_i f$ or $\nabla^i\nabla_i X$, and $\nabla^i f\nabla_i f$ or $\nabla^i X\nabla_i X$?
The confusion might come from the following claim:
$$ G^{ij} G_{jk} = {G^i}_k, \ \ \ G^i G_i = \operatorname {tr}G, $$
since both are false (or at least I have never seen anyone use this notation).
Let $g = (g_{ij})$ be the metric tensor. Then for any $(0,2)$-tensor $G_{ij}$, ${G^i}_j$ is by definition
$$ {G^i}_j = g^{ik}G_{kj},$$
which is completely different from $G^{ik} G_{kj}$, where $G^{ij}$ is by definition
$$ G^{ij} = g^{ik} g^{jl} G_{kl}.$$
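As a concrete illustration (a sketch with a metric and a tensor chosen only for this example, not taken from the question), take polar coordinates $(r,\theta)$ on the plane, so that $g = \operatorname{diag}(1, r^2)$ and $g^{-1} = \operatorname{diag}(1, r^{-2})$, and let $G_{ij} = \operatorname{diag}(a, b)$ for some functions $a, b$. Then
$$ ({G^i}_j) = \operatorname{diag}\left(a, \frac{b}{r^2}\right), \qquad (G^{ij}) = \operatorname{diag}\left(a, \frac{b}{r^4}\right), \qquad (G^{ik}G_{kj}) = \operatorname{diag}\left(a^2, \frac{b^2}{r^4}\right), $$
so $G^{ik}G_{kj}$ is not ${G^i}_j$. It equals ${G^i}_k {G^k}_j$ instead, the composition of the $(1,1)$-tensor with itself.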
On the other hand, if you have a tensor $G = (G_i)$, then one cannot take the trace of $G$. In general, if $A = ({A^i}_{j})$ is a $(1, 1)$-tensor, then $\operatorname {tr} A$ is a scalar defined by $$\operatorname{tr}A = {A^i}_i.$$
More generally, when $p, q \ge 1$, one can define $\operatorname{tr}A$ of a $(p,q)$-tensor $A$ by summing over one upper and one lower index (there are many different choices of $\operatorname{tr} A$ depending on which pair of indices is contracted, so one has to be specific). In particular, there is no definition of $\operatorname{tr}G$ when $G$ is a $(0,1)$- or $(1,0)$-tensor.
You can understand $G^i G_i$ as follows: (1) first define $G^i$ by $G^i = g^{ij} G_j$, (2) take the tensor product $B$ of $(G_i)$ with $(G^j)$, which is the $(1, 1)$-tensor ${B^j}_i = G^jG_i$, and (3) take the trace of this $(1, 1)$-tensor: $\operatorname {tr} B = {B^i}_i = G^iG_i$.
Going back to $\nabla^i$, $\nabla_i$: my suggestion is that you treat them as just another index on your tensor. If $A = ({A^{i_1\cdots i_p}}_{j_1\cdots j_q})$ is a $(p,q)$-tensor, then $\nabla A$ is a $(p, q+1)$-tensor represented by
$$ \nabla A = (\nabla_i {A^{i_1\cdots i_p}}_{j_1\cdots j_q})$$
So the $i$ in $\nabla_i$ is nothing but a lower index of the new tensor $\nabla A$. Conceptually, raising this index to get $\nabla^i A$ is no different from raising any of the other indices $j_k$, $k=1, \cdots, q$.
To clarify $\nabla^i \nabla_i f$, $\nabla^i f\nabla_i f$ and so on: first, given a function $f$, one can form the $(0,1)$-tensor $\nabla f = (\nabla_i f)$ and the $(0,2)$-tensor $\nabla \nabla f = (\nabla_j\nabla _i f)$. Then one raises one of the indices of $\nabla \nabla f$ to form a $(1, 1)$-tensor $$ \nabla^j \nabla_i f := g^{jk}\nabla _k \nabla _i f$$ and takes the trace of this $(1, 1)$-tensor to obtain a scalar $$\operatorname{tr}(\nabla^j \nabla_i f) = \nabla^i \nabla_i f. $$
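To see what this recipe produces in coordinates, here is a short worked computation. It assumes (this is not stated in the question) that $\nabla$ is the Levi-Civita connection of $g$, with Christoffel symbols $\Gamma^l_{ki}$. The Hessian is $\nabla_k \nabla_i f = \partial_k \partial_i f - \Gamma^l_{ki}\partial_l f$, so
$$ \nabla^i \nabla_i f = g^{ik}\nabla_k\nabla_i f = g^{ik}\left(\partial_k \partial_i f - \Gamma^l_{ki}\,\partial_l f\right). $$
For instance, in polar coordinates $(r, \theta)$ on the plane one has $g = \operatorname{diag}(1, r^2)$, and the only nonzero Christoffel symbols are $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$, which gives
$$ \nabla^i \nabla_i f = \partial_r^2 f + \frac{1}{r^2}\left(\partial_\theta^2 f + r\,\partial_r f\right) = \partial_r^2 f + \frac 1r \partial_r f + \frac{1}{r^2}\partial_\theta^2 f, $$
the familiar Laplacian in polar coordinates.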
On the other hand, $\nabla^i f \nabla_i f$ is quite different: first we have the $(0,1)$-tensor $\nabla f$, then we obtain $\nabla ^j f$ by raising the index: $\nabla^j f = g^{jk} \nabla_k f$. Next we take the tensor product to form a $(1,1)$-tensor $\nabla^j f \nabla_i f$, and $$\nabla^i f \nabla_i f = \operatorname{tr} (\nabla^j f\nabla_i f)$$ is the trace of that tensor product.
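Written out, this contraction involves only first derivatives, since $\nabla_i f = \partial_i f$ for a function $f$:
$$ \nabla^i f \nabla_i f = g^{ik}\,\partial_k f\,\partial_i f = |\nabla f|^2. $$
As an illustrative example (the choice of coordinates is an assumption made here, not part of the question), in polar coordinates $(r,\theta)$ on the plane with $g = \operatorname{diag}(1, r^2)$ this reads
$$ \nabla^i f \nabla_i f = (\partial_r f)^2 + \frac{1}{r^2}(\partial_\theta f)^2. $$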
Similarly for $\nabla ^i \nabla_i X$: first we have a $(p, q)$-tensor $X$. Then $\nabla \nabla X$ is a $(p, q+2)$-tensor. Next we raise one of the two new indices to form a $(p+1, q+1)$-tensor $\nabla^j \nabla_i X$ (this is an abuse of notation: to be precise we should write $X = ({X^{i_1\cdots i_p}}_{j_1\cdots j_q})$ and $$\tag{1} \nabla^j \nabla_i {X^{i_1\cdots i_p}}_{j_1\cdots j_q}$$ to represent that $(p+1, q+1)$-tensor). Then $$\nabla^i \nabla_i X$$ is the trace (over those two indices, one upper and one lower) of the $(p+1, q+1)$-tensor in (1), so again this is an abuse of notation: we should really write $$\nabla^i \nabla_i {X^{i_1\cdots i_p}}_{j_1\cdots j_q}.$$ This $(p, q)$-tensor is called the rough Laplacian of $X$. It is also common to write it as $\nabla^*\nabla X$, although when $\nabla^*$ denotes the formal adjoint of $\nabla$ the usual convention is $\nabla^*\nabla X = -\nabla^i \nabla_i X$, so check which sign convention is in use.