Optimal strategy for guessing subset of given size

70 Views Asked by At

(inspired by this question)

Let $X$ be a set of size $n$. Suppose $A\subseteq X$ is a subset of size $a$ chosen uniformly randomly. We are given $a,k$ and want to choose $B\subseteq X$ to maximise $$ P(|A\triangle B|\leq k) $$ where $A\triangle B$ is the symmetric difference of $A$ and $B$. By symmetry this depends only on $b=|B|$.

Given $n,a,k$, which value of $b$ maximizes $P(|A\triangle B|\leq k)$?

1

There are 1 best solutions below

0
On BEST ANSWER

Let $p(n,a,b,k)=P(|A\triangle B|\leq k)$. Note that $A\triangle B=(X\setminus A)\triangle(X\setminus B)$, so $$ p(n,a,b,k)=p(n,n-a,n-b,k). $$ Therefore we may assume $0\leq a\leq n/2$. If $k\geq a$ then $b=0$ is optimal since $p(n,a,0,k)=1$. Now assume $k<a$.

Note that $|A\triangle B|$ has the same parity as $a+b$. If $a+b+k$ is odd then $$ |A\triangle B|\leq k\Leftrightarrow|A\triangle B|\leq k-1 \Rightarrow|A\triangle(B\triangle\{x\})|\leq k $$ for any $x\in X$. Thus $p(n,a,b,k)\leq p(n,a,b\pm1,k)$. Therefore the optimal value of $b$ must have the same parity as $a-k$. If $b<a-k$ then $p(n,a,b,k)=0$, so the optimal value must be at least $a-k$. Suppose $b=a-k+2l$ where $l\geq0$. Note that $$ |A\triangle B|\leq k\Leftrightarrow|B\setminus A|\leq l. $$ Suppose $B\subseteq B'$ with $|B'|=b+2$. There are three possibilities:

  1. $|B\setminus A|=l$ and $|B'\setminus A|=l+2$.
  2. $|B\setminus A|=|B'\setminus A|=l+1$.
  3. Otherwise, $|B\setminus A|\leq l\Leftrightarrow|B'\setminus A|\leq l+1$.

Note that in cases 1 and 2, $|(X\setminus B')\cap A|$ equals $a-b+l$ and $a-b+l-1$ respectively. The number of sets $A$ in cases 1 and 2 is $$ \binom{b}{l}\binom{n-b-2}{a-b+l}\text{ and } \binom{b}{l+1}\binom{n-b-2}{a-b+l-1} $$ respectively. Thus $$ p(n,a,b,k)-p(n,a,b+2,k)= \frac{\binom{b}{l}\binom{n-b-2}{a-b+l}- \binom{b}{l+1}\binom{n-b-2}{a-b+l-1}}{\binom{n}{a}}. $$ Note that $a-b+l=k-l$. If $l\geq k$, we immediately have $p(n,a,b,k)\geq p(n,a,b+2,k)$. Suppose $l<k$. Then we obtain $$ p(n,a,b,k)-p(n,a,b+2,k)= \frac{\binom{b}{l}\binom{n-b-2}{k-l-1}}{\binom{n}{a}} \frac{k^2-a(k+1)+l(n-2k-2)+n-1}{(l+1)(k-l)}. $$ If $k+1=a=n/2$ then $p(n,a,b,k)=p(n,a,b+2,k)$. Otherwise $k+1<n/2$ and $$\begin{eqnarray*} p(n,a,b,k)&<&p(n,a,b+2,k)\\ \Leftrightarrow l(n-2k-2)&<&a(k+1)-k^2-n+1\\ \Leftrightarrow (l+1)(n-2k-2)&\leq&(a-k-1)(k+1)-1\\ \Leftrightarrow (l+1)&\leq&\frac{(a-k-1)(k+1)-1}{n-2k-2}\\ \Leftrightarrow (l+1)&\leq&\left\lfloor \frac{(a-k-1)(k+1)-1}{n-2k-2}\right\rfloor. \end{eqnarray*}$$ Note that the RHS is at most $k$ since $a-k-1\leq2(a-k-1)\leq n-2k-2$. Putting together all of the above, the optimal $b$ given $n,a,k$ is $$ b(n,a,k)=\begin{cases} n-b(n,n-a,k)&\text{if }a>n/2,\\ 0&\text{if }k\geq a,\\ a-k&\text{if }k=a-1,\\ a-k+2\left\lfloor \frac{(a-k-1)(k+1)-1}{n-2k-2}\right\rfloor&\text{otherwise}. \end{cases} $$ (when there is a tie, the above formula prefers the lower $b$ for $a\leq n/2$).