I see that the below formula is the explicit formula of the Stirling numbers of the second kind. I know that the Stirling number of the second kind is the number of ways to partition set of $n$ objects into $k$ non-empty subsets. But, I don't at all see from where the below formula comes from. Clearly, there is some kind of inclusion-exclusion going on, but I cannot figure it out. If someone can give me some kind of combinatorial interpretation of the formula, I would be glad.
$$ \begin{Bmatrix} n\\ k \end{Bmatrix} =\frac{1}{k!}\sum_{j=0}^{k}(-1)^{k-j}\binom{k}{j}j^n$$
$\begin{Bmatrix} n\\ k \end{Bmatrix}$ is the number of ways to distribute $n$ distinct objects into $k$ identical boxes with no empty boxes allowed/each box containing at least $1$ object. We will first consider distributions into $k$ distinct boxes.
The total number of ways to distribute $n$ distinct objects into $k$ distinct boxes with empty boxes allowed is: $$k^n$$
Let $A_i$ be the number of ways to perform the distribution with the condition that box $i$ must be an empty box. There are no restrictions on the other boxes. (This is equivalent to distributing the $n$ objects into the remaining $k-1$ boxes, empty boxes allowed.)
For example $A_1$ is the number of ways to distribute the objects into the $k$ boxes with box 1 being empty and is equivalent to distributing the objects into the remaining $k-1$ boxes, empty boxes allowed.
$$\implies |A_1| = (k-1)^n $$
Now we notice that $|A_1|=|A_2|=|A_3|=\dots$ since it doesn't matter which box is empty. (We are always distributing into the remaining $k-1$ boxes).
We know there are $\binom{k}{1}$ possible $A_i $
$$\sum_{i=1}^k|A_i| = \binom{k}{1}(k-1)^n$$
Using a similar argument, $|A_i\cap A_j|=(k-2)^n$ because it involves distribution into $k-2$ remaining boxes. Also, using the fact that $|A_1\cap A_2|=|A_1\cap A_3|=|A_1\cap A_4|...$ and that there are $\binom{k}{2}$ such terms,
$$\sum_{1\leq i,j\leq k}|A_i\cap A_j| = \binom{k}{2}(k-2)^n$$
On a similar note, we try extending this to when $l$ boxes must be empty:
$$\sum_{1\leq i,j\dots l\leq k}|A_i\cap A_j\cap \dots\cap A_l| = \binom{k}{l}(k-l)^n$$
$|A_1\cup A_2\cup \dots \cup A_k|$ is the number of ways to distribute $n$ distinct objects into $k$ distinct boxes with at least one empty box.
We are trying to find the distribution with no empty boxes: $$|\overline{A_1\cup A_2\cup \dots \cup A_k}|=k^n -|A_1\cup A_2\cup \dots \cup A_k|$$
By the principle of inclusion and exclusion,
$$=k^n-(|A_1|+|A_2|+|A_3|+...)+(|A_1\cap A_2|+|A_1\cap A_3|+|A_1\cap A_4|+...)-...$$
$$=k^n-\binom{k}{1}(k-1)^n+\binom{k}{2}(k-2)^n-\binom{k}{3}(k-3)^n+\dots +(-1)^l\binom{k}{j}(k-j)^n$$
$$=\sum_{l=0}^{k}(-1)^l\binom{k}{l}(k-l)^n$$
Let $j=k-l$, when $l=0$, $j=k$. When $l=k$, $j=0$.
$$=\sum_{j=k}^{0}(-1)^{k-j}\binom{k}{k-(k-j)}j^n (\text{this is technically wrong notation})$$
This is just the sum backwards. So we have:
$$=\sum_{j=0}^{k}(-1)^{k-j}\binom{k}{j}j^n$$
Now we divide by $k!$ since we are dealing with identical boxes:
$$\begin{Bmatrix} n\\ k \end{Bmatrix} =\frac{1}{k!}\sum_{j=0}^{k}(-1)^{k-j}\binom{k}{j}j^n$$