I already know the proof for another formula that is really similiar to this one:
$$S(n,k)=\sum_{i=1}^{n}S(n-i, k-1)k^{i-1}$$
Now It is obvious that $S(n-i, k-2)$ counts partitions of first $n-i$ on $k-2$ subsets. Ok, but why is $k^{i-1} - (k-1)^{i-1}$ counting distribution of elements which have left?
Let $P$ be a partition of $[n]$ into $k$ parts. Let $m$ be maximal such that $P$ partitions $[m]$ into $k-2$ parts, and let $i=n-m$; there are $S(n-i,k-2)$ possible ways in which $P$ can partition $[m]$ into $k-2$ parts. The maximality of $m$ means that $m+1$ must not be in any of the $k-2$ parts into which $P$ splits $[m]$, so $P$ partitions $[m+1]$ into $k-1$ parts. That leaves the $i-1$ members of $[n]\setminus[m+1]$ to be distributed amongst the $k$ parts of $P$. There are $k^{i-1}$ ways to do this, but those include the ones that distribute all $i-1$ elements to the $k-1$ parts that we already have, and we don’t want those: we need to get at least one of these $i-1$ elements into the $k$-th part. There are $(k-1)^{i-1}$ of these unwanted distributions, so there are actually just $k^{i-1}-(k-1)^{i-1}$ ways in which $P$ can divide up the remaining $i-1$ elements of $[n]$. Thus, there are $S(n-i,k-2)\left(k^{i-1}-(k-1)^{i-1}\right)$ partitions of $[n]$ into $k$ parts in such a way that $P$ partitions $[n-i]$ into $k-2$ parts and $[n-i+1]$ into $k-1$ parts. Summing over the possible values of $i$ yields the desired result.