Many textbooks on differential geometry motivate the covariant derivative more or less by saying that if you have a vector field along a curve on a manifold (that is, a curve $\gamma(t)$ and an assignment of a vector $X(\gamma(t))$ at each point), then you cannot directly define its derivative, because you cannot subtract two vectors living in different tangent spaces. The Lie derivative does not help here either, since you would need to extend $\dot{\gamma}(t)$ to a vector field to define the Lie derivative along that vector field, and then the Lie derivative would depend on the extension.
So, OK, the covariant derivative $\nabla_{\dot{\gamma}(t)}X$ gives you a way to differentiate vector fields along curves by letting you compare two different tangent spaces through parallel transport along $\gamma(t)$. But what I don't understand is what the problem is with constructing the curve $t \mapsto (\gamma(t),X(\gamma(t)))$, which is a curve inside the manifold $TM$, and then differentiating it. Its derivative has the coordinate expression $(\gamma(t),X(\gamma(t)),\dot{\gamma}(t),\beta(t))$, and one could call $\beta(t)$ the derivative of $X$ along $\gamma(t)$. It is true that this derivative lives in $TTM$ rather than in $TM$, but what is the problem with this?
This also makes me wonder whether one can define a connection on $M$ by defining some kind of projection $\pi: TTM \rightarrow TM$, so that first you find $\beta$ as above and then somehow send it back to $TM$. In fact, this is how you turn a second-order ODE on $M$ into a first-order ODE on $TM$. Is there a deeper way of understanding the necessity of the covariant derivative?
Your suggestion is a sound one, and in fact the problem is not the identification of $TTM$ and $TM$, since it is locally trivial. That could be overcome, perhaps. The problem is that the lift you want is called a horizontal lift of your vector field, and that is precisely what you don't canonically have. A connection (which is the "mother" of the covariant derivative) does precisely that: it selects a horizontal subspace in each tangent space of the tangent bundle $TM$. The covariant derivative then works precisely the way you suggested, except that it depends on that initial choice of connection.
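To make this concrete, here is a sketch in local coordinates. The notation is an assumption for illustration and does not appear above: $x^i$ are coordinates on $M$, $\tilde{x}^a$ is a second coordinate system, and $\Gamma^k_{ij}$ are the Christoffel symbols of a chosen connection. Writing $\beta^i(t) = \frac{d}{dt}X^i(\gamma(t))$, a change of coordinates gives

$$\tilde{\beta}^a = \frac{\partial \tilde{x}^a}{\partial x^i}\,\beta^i + \frac{\partial^2 \tilde{x}^a}{\partial x^i\,\partial x^j}\,\dot{\gamma}^j X^i,$$

so $\beta$ alone does not transform as a tangent vector to $M$: the second term depends on the chart. Choosing a connection supplies a correction whose own chart dependence cancels that term:

$$(\nabla_{\dot{\gamma}}X)^k = \beta^k + \Gamma^k_{ij}\,\dot{\gamma}^i X^j.$$

In the language used here, this should be the vertical part of the velocity $(\gamma, X, \dot{\gamma}, \beta)$ relative to the chosen horizontal subspaces, identified with a vector in $TM$.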
For reference, see: http://en.wikipedia.org/wiki/Ehresmann_connection