This problem comes from extending the covariant derivative to tensors. As everyone (but me) seems to know, for an $(r,s)$ tensor field $F$
$$(\nabla_XF)(u_1,\dots,u_r,v_1,\dots,v_s) = X\big(F(u_1,\dots,u_r,v_1,\dots,v_s)\big) - \sum_{i=1}^{r} F(u_1,\dots,\nabla_X u_i,\dots,u_r,v_1,\dots,v_s) - \sum_{j=1}^{s} F(u_1,\dots,u_r,v_1,\dots,\nabla_X v_j,\dots,v_s),$$
where the $u_i$ are 1-forms and the $v_j$ are vector fields.
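In the simplest case, $F = \omega$ a 1-form (one vector slot, $r = 0$, $s = 1$), the formula reads as follows; rearranged, it is exactly a product rule for the function $\omega(Y)$:

$$(\nabla_X\omega)(Y) = X\big(\omega(Y)\big) - \omega(\nabla_X Y) \quad\Longleftrightarrow\quad X\big(\omega(Y)\big) = (\nabla_X\omega)(Y) + \omega(\nabla_X Y).$$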
Basically there is a derivation/Leibniz property going on ($\nabla_X(T\otimes S) = \nabla_X T\otimes S + T\otimes\nabla_X S$), and it is passed through the evaluation of $F$ on its arguments. In Lee's book (page 53), he says this formula follows from the properties in his Lemma 4.6. In do Carmo's book, this formula is given as a definition. Another PDF I am reading says the formula can be derived by iterating $\nabla_X(T\otimes S) = \nabla_X T\otimes S + T\otimes\nabla_X S$.
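A sketch of how iterating the product rule could give the 1-form case, assuming $\nabla_X f = Xf$ on functions and that $\nabla_X$ commutes with contractions (both are standard properties usually listed alongside the product rule). Writing $C$ for the contraction that pairs a 1-form with a vector field, $\omega(Y) = C(\omega\otimes Y)$, so

$$\begin{aligned}
X\big(\omega(Y)\big) &= \nabla_X\big(C(\omega\otimes Y)\big) = C\big(\nabla_X(\omega\otimes Y)\big)\\
&= C\big(\nabla_X\omega\otimes Y + \omega\otimes\nabla_X Y\big) = (\nabla_X\omega)(Y) + \omega(\nabla_X Y).
\end{aligned}$$

The same steps applied to $F\otimes u_1\otimes\cdots\otimes u_r\otimes v_1\otimes\cdots\otimes v_s$, followed by the full contraction, would give the general formula.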
All the arguments I have seen (in some physics PDFs) fiddle with the indices, but to me, when we write a vector field $X = X^i \partial_i$, raising the index is just a notational convention for the smooth component functions $X^i$. The math texts I have on hand usually leave this as an exercise. (Completely unrelated, but if you know another Riemannian geometry book that isn't by Gallot or Lee, I'd like to know.)
So I think the problem boils down to computing
$$\nabla_{\partial_m} dx^i.$$
From some physics references, this comes down to evaluating $dx^i$ at $\partial_j$, which allows us to apply the "Leibniz rule":
$$\nabla_{\partial_m}\big(dx^i(\partial_j)\big) = (\nabla_{\partial_m} dx^i)(\partial_j) + dx^i(\nabla_{\partial_m} \partial_j) = (\nabla_{\partial_m} dx^i)(\partial_j) + \Gamma_{mj}^w \delta_w^i.$$
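Granting that step for the moment, the computation can be finished (using the convention $\nabla_{\partial_m}\partial_j = \Gamma^w_{mj}\partial_w$ implicit in the last term): $dx^i(\partial_j) = \delta^i_j$ is constant, so the left side vanishes, and $\Gamma^w_{mj}\delta^i_w = \Gamma^i_{mj}$. Hence

$$(\nabla_{\partial_m} dx^i)(\partial_j) = -\Gamma^i_{mj}, \qquad\text{i.e.}\qquad \nabla_{\partial_m} dx^i = -\Gamma^i_{mj}\,dx^j.$$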
But this should only work if we could write $dx^i(\partial_j) = dx^i \otimes \partial_j$, and these cannot be equal: the left side is a smooth function, while the right side is a $(1,1)$ tensor field.
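As a hedged numerical sanity check (not part of the original question), the sketch below assumes the flat plane in polar coordinates $(r,\theta)$ with the Levi-Civita connection, whose only nonzero Christoffel symbols are $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$. The component functions of $\omega$ and $Y$, the constant components of $X$, and the helper names `christoffel`, `partial` and `check` are all hypothetical choices. It compares $X(\omega(Y))$, computed by finite differences, with $(\nabla_X\omega)(Y) + \omega(\nabla_X Y)$, where $\nabla_X\omega$ uses $\nabla_{\partial_m}dx^k = -\Gamma^k_{mj}dx^j$.

```python
import math

# Hypothetical setup: the flat plane in polar coordinates (r, theta) with the
# Levi-Civita connection of dr^2 + r^2 dtheta^2. Index 0 = r, index 1 = theta.


def christoffel(x):
    """Return G with G[k][m][j] = Gamma^k_{mj} at x = (r, theta)."""
    r = x[0]
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r       # Gamma^r_{theta theta}
    G[1][0][1] = 1.0 / r  # Gamma^theta_{r theta}
    G[1][1][0] = 1.0 / r  # Gamma^theta_{theta r}
    return G


# Arbitrary (hypothetical) component functions: omega = omega_k dx^k, Y = Y^k d_k.
def omega(x):
    r, t = x
    return [r * math.cos(t), r * r * math.sin(t)]


def Y(x):
    r, t = x
    return [math.sin(t), r]


def partial(f, x, m, h=1e-5):
    """Central finite difference d_m of every component of f at x."""
    xp, xm = list(x), list(x)
    xp[m] += h
    xm[m] -= h
    return [(a - b) / (2 * h) for a, b in zip(f(xp), f(xm))]


def check(x, X, sign=-1.0):
    """Return (X(omega(Y)), (nabla_X omega)(Y) + omega(nabla_X Y)) at x.

    X holds constant components X^m. sign=-1.0 uses
    nabla_{d_m} dx^k = -Gamma^k_{mj} dx^j; sign=+1.0 flips that sign.
    """
    G = christoffel(x)
    w, y = omega(x), Y(x)
    dw = [partial(omega, x, m) for m in range(2)]  # dw[m][j] = d_m omega_j
    dy = [partial(Y, x, m) for m in range(2)]      # dy[m][k] = d_m Y^k

    def pairing(z):
        # The function omega(Y) = omega_k Y^k, wrapped in a list for partial().
        return [sum(a * b for a, b in zip(omega(z), Y(z)))]

    # Left side: X^m d_m applied to the function omega(Y).
    lhs = sum(X[m] * partial(pairing, x, m)[0] for m in range(2))
    # (nabla_X omega)_j = X^m (d_m omega_j + sign * Gamma^k_{mj} omega_k)
    nabla_w = [sum(X[m] * (dw[m][j] + sign * sum(G[k][m][j] * w[k] for k in range(2)))
                   for m in range(2)) for j in range(2)]
    # (nabla_X Y)^k = X^m (d_m Y^k + Gamma^k_{mj} Y^j)
    nabla_y = [sum(X[m] * (dy[m][k] + sum(G[k][m][j] * y[j] for j in range(2)))
                   for m in range(2)) for k in range(2)]
    rhs = sum(nabla_w[j] * y[j] for j in range(2)) + sum(w[k] * nabla_y[k] for k in range(2))
    return lhs, rhs


p, X = (2.0, 0.5), (1.0, 0.5)
lhs, rhs = check(p, X)
print(abs(lhs - rhs))  # only finite-difference error remains
lhs, bad = check(p, X, sign=+1.0)
print(abs(lhs - bad))  # the flipped sign breaks the product rule here
```

Only the sign of the $\Gamma$-term for the 1-form is varied while the vector-field formula is kept fixed, so the check isolates the minus sign in $\nabla_{\partial_m}dx^i = -\Gamma^i_{mj}dx^j$.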