I was given the following question in a course on the probabilistic method. I have now been struggling with it for a couple of days and didn't make any real progress. This is the question:
Prove that for sufficiently large $n$, if $S_1, S_2, \ldots, S_k \subseteq \{1, 2, \ldots, n\}$ and $k\le \frac{1.99n}{\log_2{n}}$ then we can find some $X,Y \subseteq \{1, 2, \ldots, n\}$ so that $X\ne Y$ and for all $1\le i \le k$ we have $|X\cap S_i| = |Y\cap S_i|$.
I noted that we can assume that $\bigcup_{i=1}^{k}S_i = \{1, 2, \ldots, n\}$ because if not we can simply take $X = \emptyset, Y = \{1, 2, \ldots, n\} \setminus \bigcup_{i=1}^{k}S_i$ and then $|X\cap S_i| = |Y\cap S_i|=0$.
If we choose at random some $X,Y \subseteq \{1, 2, \ldots, n\}$ then for a fixed $i$ we have $$\mathrm{Pr}(|X\cap S_i| = |Y\cap S_i|) = {\binom{2|S_i|}{|S_i|} \over 2^{2|S_i|}} $$ But I could not calculate, or at least bound from below the probability that it happens for all $i$ since I do not know how the $S_i$ intersect.
One thing I thought of is that from $k\le \frac{1.99n}{\log_2{n}}$ we know that $2^{2n/k} \ge n^{2/1.99} > n$ which suggests we might divide ${1, 2, \ldots, n}$ into disjoint sets of size $k$: $I = \{A_1, A_2, \ldots, A_{n/k}\}$, and then look at some function $f\colon\{1, 2, \ldots, n\} \to \mathcal P(I) \times \mathcal P(I)$ which cannot be onto, but I don't know what $f$ to take.
So basically I did not make any progress with this problem. Any help will be appreciated.
We would like to apply the pigeonhole principle to this problem. Unfortunately, at the moment, the number of possible values of $(|S_1 \cap X|, |S_2 \cap X|, \dots, |S_n \cap X|)$ is bounded at best by $$(|S_1|+1) \cdot (|S_2|+1) \cdot \dots \cdot (|S_k| + 1)$$ which could be much larger than $2^n$.
To fix this, we first apply the probabilistic method to show that for most choices of $X$, the value of $|S_i \cap X|$ lies in a much narrower interval for all $i$. Then we can discard lots of unlikely "pigeonholes" while losing only a few "pigeons", and after this, the pigeonhole principle will apply.
The general idea at work is a very useful trick to learn, and you will learn better if you stop reading this answer and try this yourself right now.
If we choose $X$ at random, $|S_i \cap X|$ is a binomial random variable with mean $|S_i|/2$, and by a standard large deviation bound, $$\Pr\left[\Big||S_i \cap X| - |S_i|/2\Big| \ge t\right] \le 2 \exp(-2t^2/|S_i|) \le 2 \exp(-2t^2/n).$$ Set $t = \sqrt{n \log n}$, and we get a bound of $2/n^2$. So with probability $1 - k \cdot \frac{2}{n^2} > \frac12$, $|S_i \cap X|$ lies in an interval of size $2 \sqrt{n \log n}$ for every $i$. From here on, we restrict ourselves to just those subsets of $\{1,2,\dots,n\}$ for which this bound holds for each $i$: there are at least $2^{n-1}$ of these.
If $X$ is such a subset, then the vector of values $(|S_1 \cap X|, |S_2 \cap X|, \dots, |S_n \cap X|)$ can have at most $(2 \sqrt{n \log n})^k$ different values. So we will have more pigeons than pigeonholes if we show that $$(2 \sqrt{n \log n})^k < 2^{n-1} \quad \Longleftrightarrow \quad k\, \log_2 (2 \sqrt{n \log n}) < n-1.$$ If we forget about all the less-significant terms of this, this is saying that $\frac12 k \log_2 n < n$ or $k \le \frac{2n}{\log_2 n}.$ All the less-significant terms can be rounded up at the cost of making the $2$ into the $1.99$, so this inequality follows from the given bound of $k$.
Once we have this inequality, the pigeonhole principle applies, so we can choose two subsets $X, Y \subseteq \{1,2,\dots,n\}$ that fall into the same pigeonhole, and therefore have $|X \cap S_i| = |Y \cap S_i|$ for all $i$.