Let $m,n$ be two integers such that, $m\ge n$. Compute the number of surjections from $\{1,...,m\}$ to $\{1,...,n\}$
There are $n^m$ functions (total).
we subtract from $n^m$ the number of non-surjective functions.
There are $\binom{n}{1}(n-1)^m$ functions missing one element.
There are $\binom{n}{2}(n-2)^m$ functions missing two elements, but how many times did we count this in the previous count ?
Then we have to add this again by inclusion-exclusion but why is the difference always 1?
For all $1 \le i \le n$ define $$A_i = \{\rho \colon M \to N \text{ a mapping such that } i \notin \rho(M)\}.$$ Then $$S = \{\rho \colon M \to N\} \setminus \bigcup_{i=1}^n A_i$$ is the set of surjections.
By the principle of inclusion and exclusion, we may calculate $$|S| = |\{\rho \colon M \to N\}| - \sum_{1 \le i_1 \le n} |A_i| + \sum_{1 \le i_1 < i_2 \le n} |A_{i_1} \cap A_{i_2}| - \dots \pm |A_1 \cap \dots \cap A_n|.$$ Now $|\{\rho \colon M \to N\}| = n^m$. We also have $$A_{i_1} \cap \dots \cap A_{i_k} = \{\rho \colon M \to N \text{ a mapping such that } \{i_1,\dots,i_k\} \cap \rho(M) = \emptyset\},$$ so that $|A_{i_1} \cap \dots \cap A_{i_k}| = (n-k)^m$. Each sum contains ${n \choose k}$ terms, so we have $$|S| = n^m - {n \choose 1}(n-1)^m + {n \choose 2}(n-2)^m - \dots \pm {n \choose n}(n-n)^m.$$