The following notation is from Petersen's book (p. 57 of the 3rd edition; it does not appear in the previous edition):
it seems reasonable to define $\nabla S$ as $$\nabla S(X,Y)=(\nabla_X S)Y=\nabla_X(S(Y))-S\nabla_XY.$$ In other words $\nabla_XS=[\nabla_X,S]$.
But if $[\nabla_X,S]$ acts on vector fields, then $[\nabla_X,S]Y=(\nabla_X S-S\nabla_X)Y=(\nabla_X S)Y-S(\nabla_XY)$, which is not $(\nabla_X S)Y$. If instead we evaluate $\nabla_X$ and $S$ at $Y$ first and then compute the Lie bracket, it works! But this makes no sense to me. Why should we act on $Y$ first and then apply the Lie bracket?
Here $[\nabla_X, S]$ denotes the commutator of operators on vector fields, $[\nabla_X, S] = \nabla_X\circ S - S\circ\nabla_X$, so
$$[\nabla_X, S](Y) = (\nabla_X\circ S)(Y) - (S\circ\nabla_X)(Y) = \nabla_X(S(Y)) - S(\nabla_XY).$$
What you've done is inconsistent. When you write $\nabla_XS$, you interpret it as $\nabla_X(S)$, the covariant derivative of $S$ (as opposed to the composition $\nabla_X\circ S$), but when you write $S\nabla_X$, you interpret it as the composition $S\circ\nabla_X$.
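To make the mismatch concrete, here is a short check. It assumes only the definition quoted in the question, $(\nabla_XS)(Y)=\nabla_X(S(Y))-S(\nabla_XY)$. Substituting that definition into the mixed reading gives
$$(\nabla_XS)(Y)-S(\nabla_XY)=\nabla_X(S(Y))-2\,S(\nabla_XY),$$
which has an extra copy of $-S(\nabla_XY)$ compared with the commutator $[\nabla_X,S](Y)=\nabla_X(S(Y))-S(\nabla_XY)$. Reading both products as compositions removes the discrepancy, so nothing needs to be "evaluated at $Y$ first".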
The Lie bracket of two vector fields $[X, Y]$ can also be viewed as a commutator when one regards vector fields as acting on smooth functions.
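Spelled out, with $f$ denoting an arbitrary smooth function (a symbol introduced here for illustration), the standard formula is
$$[X,Y](f)=X(Y(f))-Y(X(f)),$$
that is, $[X,Y]=X\circ Y-Y\circ X$ as operators on smooth functions. This has the same shape as $[\nabla_X,S]=\nabla_X\circ S-S\circ\nabla_X$, where the operators act on vector fields instead.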