I've read the proof that $\operatorname{span}(S)$ is a subspace of $V$, but it still isn't intuitive to me, so I have to memorize it.
So far, the only reasonable way I can think about it is this: since $S$ inherits both operations of $V$, and $V$ is closed under those operations and they satisfy the eight vector space axioms, the proof goes through.
But I'm not sure whether this is the essential reason that $\operatorname{span}(S)$ must be a subspace of $V$.
The definition of span in my book is:
Let $S$ be a nonempty subset of a vector space $V$. The span of $S$, denoted $\operatorname{span}(S)$, is the set consisting of all linear combinations of the vectors in $S$. For convenience, we define $\operatorname{span}(\emptyset)=\{0\}$.
The definition of $\operatorname{span}$ is chosen so that the following property holds:
That's really the fundamental aspect of what a span is: if it weren't true for whatever reason, we'd fix the definition of $\operatorname{span}$ to make it work. Since you already understand the technical aspects, I think that's the best way to think of it: if you have a subset but want a subspace instead, $\operatorname{span}$ is the technical gizmo that does that for you.
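In case it helps to see the closure concretely, here is a short sketch of the standard argument, not necessarily the proof in your book. The names $u$, $w$, $v_i$, $a_i$, $b_i$, and $c$ are my own notation. Take $u, w \in \operatorname{span}(S)$ and a scalar $c$, and write both as linear combinations of the same vectors $v_1, \dots, v_n \in S$ (pad with zero coefficients if needed):

$$
\begin{aligned}
u &= a_1 v_1 + \cdots + a_n v_n, \qquad w = b_1 v_1 + \cdots + b_n v_n,\\
u + w &= (a_1 + b_1)\, v_1 + \cdots + (a_n + b_n)\, v_n,\\
c\,u &= (c\,a_1)\, v_1 + \cdots + (c\,a_n)\, v_n.
\end{aligned}
$$

Both results are again linear combinations of vectors in $S$, so they lie in $\operatorname{span}(S)$. The zero vector is there too: $0 = 0\,v_1$ when $S$ is nonempty, and $\operatorname{span}(\emptyset) = \{0\}$ by convention. The rearrangements in the second and third lines use only the axioms of $V$, which is where the inheritance you mention comes in.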