I pick a random subset $S$ of $\{1,\ldots,N\}$, and you have to guess what it is. After each guess $G$, I tell you the number of elements in $G \cap S$. How many guesses do you need?
2026-05-14 09:56:51.1778752611
On
Guessing a subset of $\{1,...,N\}$
1.8k Views Asked by Bumbble Comm https://math.techqa.club/user/bumbble-comm/detail At
2
There are 2 best solutions below
9
On
An obvious upper bound is $N$ queries, since you can test each element individually. On the other hand, it takes at least $\Omega(N/\log N)$ queries: $N$ bits of information are required to identify the target subset, and each query can yield at most $O(\log N)$ bits of information, since each query has only $O(N)$ possible answers. To see that the upper bound is not sharp, consider the following strategy for $N=5$, which takes at most $4$ queries:
- Guess $\{1,2,3,4,5\}$. If the result is $0$ or $5$, we have the answer. If the result is $1$ or $4$, bisection search (for the single member or the single missing element) gives the answer in three more queries. Suppose the result is $2$ (the strategy for $3$ is the same by symmetry).
- Guess $\{1,2\}$. If the result is $2$, we have the answer. If the result is $0$, then bisection search on $\{3,4,5\}$ (for the single missing element) gives the answer in two more queries. Suppose the result is $1$. Then we know the answer is $\{a,b\}$ for some $a \in \{1,2\}$ and $b \in \{3,4,5\}$.
- Guess $\{1,3\}$. If the result is $2$, we have the answer. If the result is $0$, then the answer is $\{2,b\}$ for some $b\in\{4,5\}$, and one more query gives the answer. Suppose the result is $1$. Then we know the answer is $\{1,4\}$, $\{1,5\}$, or $\{2,3\}$.
- Guess $\{1,4\}$. The answer is $\{1,4\}$ if the result is $2$, or $\{1,5\}$ if the result is $1$, or $\{2,3\}$ if the result is $0$.
This example gives an improved upper bound asymptotic to $4N/5$. It seems likely that the correct answer is strictly $o(N)$ (i.e., eventually less than $cN$ for any fixed $c$), but whether or not it's $\Theta(N/\log N)$, I can't say.
This can be solved in $\Theta(N/\log N)$ queries. First, here is a lemma:
Proof: Divide $\{1,\dots,2N+Q-1\}$ into three sets, $A,B$ and $C$, where $|A|=|B|=N$ and $|C|=Q-1$. By assumption, there exist subsets $A_1,\dots,A_{Q-1}$ such that you could find the unknown subset of $A$ alone by first guessing $A$, then guessing $A_1,\dots,A_{Q-1}$. Similarly, there exist subsets $B_1,\dots,B_{Q-1}$ for solving $B$. Finally, write $C=\{c_1,\dots,c_{Q-1}\}$.
The winning strategy is:
Guess the entire set, $\{1,\dots,2N+Q-1\}$.
Guess $B$.
For each $i\in \{1,\dots,Q-1\}$, guess $A_i\cup B_i$.
For each $i\in \{1,\dots,Q-1\}$, guess $A_i\cup (B\setminus B_i)\cup \{c_i\}$.
Using the parity of the the sum of the guesses $A_i\cup B_i$ and $A_i\cup (B\setminus B_i)\cup \{c_i\}$, you can determine whether or not $c_i\in S$. Then, using these same guesses, you get a system of equations which lets you solve for $|A_i \cap S|$ and $|B_i\cap S|$ for all $i$. This gives you enough info to determine $A\cap S$ and $B\cap S$, using the assumed strategy.$\tag*{$\square$}$
Let $\def\Opt{\operatorname{Opt}}\Opt(N)$ be the fewest number of guesses you need for $\{1,\dots,N\}$. Using the lemma and induction, you can show that $$ \Opt(k2^{k-1}+1)\le 2^k\qquad \text{for all }k\in \{0,1,2,\dots\} $$ More specifically, you can prove
Proof: We prove this by induction. During the inductive step, we will assume the truth of the theorem whenever $n\le k2^{k-1}$, and use this to prove the theorem holds whenever $n\le (k+1)2^k$.
Given $n,k$ with $n\le (k+1)2^k+1$, let $$ n' = \text{floor}\big((n-2^k+1)/2\big) $$ Combining this last inequality with $n\le (k+1)2^k+1$, we get $$ n'\le ((k+1)2^k+1-2^k+1)/2 = k2^{k-1}+1, $$ so we may apply the induction hypothesis to $n'$, to conclude $\Opt(n')\le 2^k$. Finally, using the Lemma, $$ \Opt(n)=\Opt(2n' + 2^k-1)\le 2\cdot 2^k=2^{k+1}. $$ This completes the proof by induction. $$\tag*{$\square$}$$
Finally, we conclude
Corollary $\Opt(N)\in O(N/\log N)$.
Proof: Think of $k$ as a function of $N$, where $k$ is the unique integer for which $$ k2^{k-1}+1<N\le (k+1)2^k+1 $$ These two inequalities imply that $\log_2 N\sim k$, where $f(N)\sim g(N)$ means that $\lim_{N\to\infty} g(N)/f(N)=1$. Therefore, $$ \Opt(N)\le \frac{2^{k+1}\cdot k}{k}\sim 4\cdot \frac{k2^{k-1}}{\log_2 N}\le 4\cdot \frac{N}{\log_2 N}.\tag*{$\square$} $$
The previous entropy argument already proved the matching lower bound, so we can safely say that $\Opt(N)=\Theta(N/\log N)$. More specifically, $$ 2 \frac{N}{\log_2 N} \lesssim \Opt(N)\lesssim 4\frac{N}{\log_2 N}, $$ where $f(N)\lesssim g(N)$ means $f(N)\le (1+\epsilon) g(N)$ for all $\epsilon>0$ and all $N:= N(\epsilon)$ sufficiently large.