There are two problems I am trying to solve. The first is a special case of the second, but I shall include it anyway as it possibly provides some insight.
- Let $0<\lambda\le 1$ and let $X$ be a set of $n$ elements. For $i=1,2,\dots ,m$, let $A_i$ and $B_i$ be disjoint subsets of $X$ such that $$\mid A_i\mid +\mid B_i\mid\le \lambda n$$ for every $i$. We say that the collection $$\Sigma=\lbrace (A_i,B_i):i=1,\dots,m\rbrace\subseteq \mathcal{P}(X)\times\mathcal{P}(X) $$ of pairs *separates* the elements of $X$ if for all $x,y\in X$ with $x\ne y$, there is some index $i$ such that $x$ is in one of $A_i$ or $B_i$, and $y$ is in the other. Show that if $\Sigma$ separates the elements of $X$ then $$\mid \Sigma \mid = m\ge \frac{\log_2n}{\lambda}.$$
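To make the definition concrete, here is a small sanity check in Python; the set, pairs, and $\lambda$ below are illustrative choices of mine, not part of the problem. The pairs given by the binary digits of $0,\dots,7$ separate $X$ with $\lambda = 1$ and meet the bound $m = \log_2 n$ exactly.

```python
from itertools import combinations
from math import log2

n = 8
X = range(n)
# m = 3 pairs: A_i = elements with bit i clear, B_i = elements with bit i set.
pairs = [({x for x in X if not x >> i & 1}, {x for x in X if x >> i & 1})
         for i in range(3)]

def separates(pairs, X):
    """Every pair of distinct x, y must be split by some (A_i, B_i)."""
    return all(any((x in A and y in B) or (x in B and y in A)
                   for A, B in pairs)
               for x, y in combinations(X, 2))

lam = 1  # here |A_i| + |B_i| = n for every i, so lambda = 1
assert all(len(A) + len(B) <= lam * n for A, B in pairs)
print(separates(pairs, X), len(pairs), log2(n) / lam)  # True 3 3.0
```

So for this family the claimed lower bound is tight: $m = 3 = \log_2 8$.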
My attempt: I think I should be able to provide a probabilistic argument.
Suppose $\Sigma$ separates the elements of $X$. Then, consider deleting either $A_1$ or $B_1$ from $X$ at random, and then continuing by deleting $C_i=A_i$ or $B_i$ (chosen at random) from what remains, for each $i\le m$. We are left with at most a single element: if two distinct elements both survived, then at the index separating them one of the two lay in the deleted set, a contradiction.
The expected proportion of elements removed at each step is at most $\frac{\lambda}{2}$, which heuristically suggests that about $(1-\frac{\lambda}{2})^m n$ elements remain. Requiring this to be at most $1$, we get a relation similar to what we are trying to show, but with the $\lambda$ inside the $\log$, so it does not yield the desired bound.
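The deletion process itself can be simulated; here is a hypothetical check on the binary-digit family on $\{0,\dots,7\}$ (my own example, not from the question), confirming that a separating family always leaves at most one element.

```python
import random

random.seed(0)
n = 8
X = set(range(n))
# Example separating family: A_i / B_i split on binary digit i.
pairs = [({x for x in range(n) if not x >> i & 1},
          {x for x in range(n) if x >> i & 1}) for i in range(3)]

for _ in range(1000):
    remaining = set(X)
    for A, B in pairs:
        remaining -= random.choice((A, B))  # delete A_i or B_i at random
    # Since the family separates X, at most one element can survive.
    assert len(remaining) <= 1
```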
The second question relaxes the condition $$\mid A_i\mid +\mid B_i\mid\le \lambda n$$ and replaces it with the condition $$\sum_{i=1}^m (\mid A_i\mid +\mid B_i\mid)\le \lambda mn$$
That is, instead of each pair having at most $\lambda n$ elements, we only require that the pairs have at most $\lambda n$ elements on average. I actually don't think this changes the result from part 1.
Thanks
Information theory background
The basic tool of this approach is the entropy function: given a random variable $\mathbf X$ with values $\{x_1, \dots, x_n\}$, its entropy is $$ H(\mathbf X) = \sum_{i=1}^n \Pr[\mathbf X = x_i] \log_2 \frac1{\Pr[\mathbf X = x_i]}. $$ Intuitively, this measures the average number of bits it takes to transmit $\mathbf X$ by an optimal encoding (up to technical details). These notes summarize the notions we will need for this solution:
The Wikipedia links are also a fine reference.
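As a quick numerical sanity check of the definition (an illustrative snippet of mine, not from the notes):

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a finite distribution."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

# A uniform variable with n outcomes has entropy log2(n) -- the value
# used for H(x) at the end of the solution.
n = 8
print(entropy([1 / n] * n))  # 3.0
print(entropy([1.0]))        # 0.0: a constant carries no information
```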
I will write $h(q)$ for the function $q \log_2 \frac1q + (1-q) \log_2 \frac1{1-q}$: the entropy of a Bernoulli variable with probability $q$. (Conventionally but irrelevantly for us, $h(0) = h(1) = 0$.)
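A direct translation of $h$ into Python, checking the bound $h(q) \le 1$ that the solution uses at the end (the grid of test values is arbitrary):

```python
from math import log2

def h(q):
    """Binary entropy of a Bernoulli(q) bit, with h(0) = h(1) = 0."""
    if q in (0, 1):
        return 0.0
    return q * log2(1 / q) + (1 - q) * log2(1 / (1 - q))

# h peaks at q = 1/2 and never exceeds 1 bit.
assert h(0.5) == 1.0
assert all(h(k / 100) <= 1.0 for k in range(101))
```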
Solution
Choose a uniformly random element $\mathbf x \in X$, and for each $i = 1, \dots, m$, choose a bit $\mathbf S_i$ as follows:
- if $\mathbf x \in A_i$, set $\mathbf S_i = 1$;
- if $\mathbf x \in B_i$, set $\mathbf S_i = 0$;
- if $\mathbf x \notin A_i \cup B_i$, set $\mathbf S_i = 1$ with probability $\frac{|A_i|}{|A_i| + |B_i|}$ and $\mathbf S_i = 0$ otherwise, independently of all other choices.
The following observations about these random variables will be relevant:
1. Since $\Sigma$ separates the elements of $X$, the bits $\mathbf S_1, \dots, \mathbf S_m$ determine $\mathbf x$: if two distinct elements were both consistent with the same bit string, the index $i$ separating them would force $\mathbf S_i$ to take two different values.
2. Conditioned on the value of $\mathbf x$, the bits $\mathbf S_1, \dots, \mathbf S_m$ are independent: each $\mathbf S_i$ is either a constant or a fresh independent coin flip.
In the language of information theory, the first observation tells us that $H(\mathbf S_1, \dots, \mathbf S_m) = H(\mathbf S_1, \dots, \mathbf S_m, \mathbf x)$; the second observation tells us that $H(\mathbf S_i \mid \mathbf S_1, \dots, \mathbf S_{i-1}, \mathbf x) = H(\mathbf S_i \mid \mathbf x)$.
So we have \begin{align} H(\mathbf S_1, \dots, \mathbf S_m) &= H(\mathbf S_1, \dots, \mathbf S_m, \mathbf x) && \text{Obs. 1}\\ &= H(\mathbf x) + \sum_{i=1}^m H(\mathbf S_i \mid \mathbf S_1, \dots, \mathbf S_{i-1}, \mathbf x) && \text{Chain rule} \\ &= H(\mathbf x) + \sum_{i=1}^m H(\mathbf S_i \mid \mathbf x). && \text{Obs. 2} \end{align} On the other hand, $H(\mathbf S_1, \dots, \mathbf S_m) \le \sum_{i=1}^m H(\mathbf S_i)$ by the subadditivity of joint entropy, so we get $$ H(\mathbf x) \le \sum_{i=1}^m \left(H(\mathbf S_i) - H(\mathbf S_i \mid \mathbf x)\right). $$ We can put an upper bound on each $H(\mathbf S_i) - H(\mathbf S_i \mid \mathbf x)$ individually. We have $H(\mathbf S_i) = h(\frac{|A_i|}{|A_i| + |B_i|})$, because $\mathbf S_i$ is a Bernoulli random variable with that success probability. Meanwhile, $H(\mathbf S_i \mid \mathbf x) = h(\frac{|A_i|}{|A_i| + |B_i|}) \cdot \frac{n - |A_i| - |B_i|}{n}$: with probability $\frac{n - |A_i| - |B_i|}{n}$, $\mathbf x \notin A_i \cup B_i$, in which case $\mathbf S_i$ is randomly chosen with probability $\frac{|A_i|}{|A_i| + |B_i|}$; for other values of $\mathbf x$, $\mathbf S_i$ is constant. Therefore $$ H(\mathbf S_i) - H(\mathbf S_i \mid \mathbf x) = h(\tfrac{|A_i|}{|A_i| + |B_i|}) \cdot \frac{|A_i| + |B_i|}{n} \le \frac{|A_i| + |B_i|}{n} $$ by the bound $h(q) \le 1$ for all $q \in [0,1]$.
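As a sanity check on this computation, here is a hypothetical numeric example (the sizes $n=10$, $|A_i|=2$, $|B_i|=3$ are arbitrary choices of mine) verifying $H(\mathbf S_i) - H(\mathbf S_i \mid \mathbf x) = h(\frac{|A_i|}{|A_i|+|B_i|}) \cdot \frac{|A_i|+|B_i|}{n} \le \frac{|A_i|+|B_i|}{n}$:

```python
from math import log2

def h(q):
    """Binary entropy of a Bernoulli(q) bit."""
    return 0.0 if q in (0, 1) else q * log2(1 / q) + (1 - q) * log2(1 / (1 - q))

# Arbitrary sizes for one pair: n elements, |A| = a, |B| = b, disjoint.
n, a, b = 10, 2, 3
p = a / (a + b)

# Pr[S = 1] = (a + (n - a - b) * p) / n, which simplifies to p,
# so S is Bernoulli(p) and H(S) = h(p).
pS1 = (a + (n - a - b) * p) / n
HS = h(pS1)

# Given x, S is constant unless x lies outside A ∪ B (probability (n-a-b)/n).
HS_given_x = (n - a - b) / n * h(p)

# The gap matches h(p) * (a + b) / n, which is at most (a + b) / n.
gap = HS - HS_given_x
assert abs(pS1 - p) < 1e-12
assert abs(gap - h(p) * (a + b) / n) < 1e-12
assert gap <= (a + b) / n
```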
We can also compute $H(\mathbf x) = \log_2 n$, since $\mathbf x$ is uniform with $n$ outcomes. So we have $$ \log_2 n \le \sum_{i=1}^m \left(\frac{|A_i| + |B_i|}{n}\right) \le \lambda m $$ which means $m \ge \frac{\log_2 n}{\lambda}$, as desired.