I am reading the book "A walk through combinatorics" and most proofs so far have had clear definitions and statements, on the chapter of the inclusion-exclusion theorem the author proves the regular formula presented in most introductory probability courses (at least the one I took had it) $$|A_{1}\bigcup...\bigcup A_{n}| = \sum_{j=1}^{n}(-1)^{j-1}\sum_{i_1...i_j}|A_{i_1}\bigcap...\bigcap A_{i_j}|$$ where $\{ i_1...i_j \}$ range over all $j$-subsets of $[n]$, and the proof is all fine and dandy like the rest of textbook so far.
But then at the end of the chapter he drops a bombshell of a theorem out of nowhere and claims it's just an alternative version of this formula, and the proof is bonkers I can't understand beyond the first few statement:
Theorem 7.6 Let $f$ and $g$ be functions that are defined on the subsets of $[n]$, and whose range is the set of real numbers. Let us assume that $f$ and $g$ are connected by $g(S)=\sum_{T \subseteq S}f(T)$ then $f(S)=\sum_{T \subseteq S}g(T)(-1)^{|S-T|}$
now if this statement is not enigmatic enough (this is the first time he even mentions about real numbers!) the proof that follows make it worse:
Proof. If we express $g(T)$ by values of $f$ on the right-hand side of the conclusion ($f(S)=\sum_{T \subseteq S}g(T)(-1)^{|S-T|}$), we see that for all $U ⊆ S$, the value $f(U)$ will appear once for each set $T$ satisfying $U ⊆ T ⊆ S$. Each such appearance of $f(U)$ will be counted with a sign given by $(−1)^{|S−T|}$ . The number of such subsets $T$ for which $|S − T| = i$ is equal to ${|S-U|}\choose{i}$ , since $T$ is determined by the elements of $S$ that are not in $T$ , and $T$ contains $U$. Therefore, $f(U)$ will appear on the right-hand side of the conclusion exactly $\sum_{i=0}^{|S-U|}(-1)^i$${|S-U|}\choose{i}$$=(1-1)^{|S-U|}$ times. This number is always zero, except when $|S − U| = 0$, that is, when $S = U$. So the only term on the right-hand side that does not cancel out will be $f(S)$, and the claim is proved.
Ok, now maybe even if you understood the proof you may understand why I am in complete confusion, the first part of the proof I understood as saying that $$f(S)=\sum_{T \subseteq S} \sum_{U \subseteq T}f(U) (-1)^{|S-T|}$$ then it makes sense that $f(U)$ is counted once for each time $U\subseteq T \subseteq S$ with the sign $(−1)^{|S−T|}$, but then it all starts falling apart, the argument that follows doesn't even make sense because $U$ is not even defined to be anything else except a subset of $T$, and this could have varying sizes! After that I do understand that the binomial theorem is used and the conclusion that follows. So now writing this I realize the only thing I don't understand is from where the heck did that ${|S-U|}\choose{i}$ appear from?!
Oh yeah, and how exactly is this related to the exclusion-inclusion theorem you probably even forgot was how we started with this whole thing?
You asked 2 questions.
First, about the $\binom{|S-U|}{i}$: as it says in the proof you wrote down,
I assume you are asking why this is true. Note that choosing a set $T$ with $U \subseteq T \subseteq S$ is the same as choosing a subset of $S - U$ (you are choosing the extra elements of $T - U$). If $|S-T| = i$, then this means $$ |T - U| = |S - U| - |S - T| = |S-U| - i $$ So there are $\binom{|S-U|}{|T-U|} = \binom{|S-U|}{|S-U|-i}$ ways to choose the elements of $T-U$. But binomial coefficients are symmetric (i.e. $\binom{n}{k} = \binom{n}{n-k}$), so this is the same as $\binom{|S-U|}{i}$.
Second, for your other question: why is this equivalent to the "usual" formulation of inclusion/exclusion? Let's use a slightly different (but equivalent) form of inclusion/exclusion, where instead of counting $A_1 \cup A_2 \cup \cdots \cup A_n$, we cound the complement of this set. If all the $A_i$'s are subsets of some big set $X$, we can count the complement of the $A_i$'s: $$ \left| X - \bigcup_{i=1}^n A_i \right| = \sum_{I \subseteq [n]} (-1)^{|I|} \left| \bigcap_{i \in I} A_i \right| $$
For each element $x \in X$, it is in some of the $A_i$'s and not in others. Let $I(x) = \{i_1,i_2,\dots,i_k\}$ be the indices of the $A_i$'s which do not contain $x$. That is, $x \in A_i$ if and only if $i \not \in I(x)$. Define the function $f$ by saying $f(S)$ is the number of $x$'s for which $I(x) = S$. Then in the special case that $S = [n] = \{1,2,\dots,n\}$, we have $f([n]) = \left| X - \bigcup_{i=1}^n A_i \right|$, which is the left-hand side of the inclusion/exclusion formula.
For a set $S$, consider the intersection $\bigcap_{i \in S} A_i$. An element $x$ in this intersection must be in all of the $A_i$'s with $i \in S$ (by definition of "intersection"). But $x$ might also be in other $A_i$'s for $i \not \in S$. In other words, $x \in \bigcap_{i \in S} A_i$ is equivalent to saying $I(x) \subseteq [n] - S$. Summing over the possible $I(x)$'s, you see that $g([n] - S) = \left| \bigcap_{i \in S} A_i \right|$. Substituting this for $g$ gives the right-hand side of the inclusion/exclusion formula.