Is it possible to find something better than binary search for this problem?


Let's say we have $n$ urns (numbered $1$ through $n$) and the first $k$ urns have a ball in them (for some $k$ unknown to us) and the remaining urns are empty. Our goal is to determine $k$ by looking into as few urns as possible.

One method would be to use a binary search - look into the urn with index $\left\lfloor \frac{n}{2}\right\rfloor$ and rule out about one half of the urns depending on whether that urn was empty or not, then repeat the process on the remaining half. This allows us to find $k$ in at most $\left\lfloor \log_2(n)\right\rfloor+1$ steps.
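The binary search described above can be sketched as follows. This is a minimal illustration, not from the original post: `has_ball(i)` is a hypothetical oracle that reports whether urn $i$ contains a ball, and $k$ is assumed to range over $0, \ldots, n$ (so $k=0$ means all urns are empty).

```python
def find_k(n, has_ball):
    """Binary search for k, the number of full urns.

    has_ball(i) is a hypothetical oracle: True iff urn i (1-indexed)
    contains a ball.  We assume k may be any value in 0..n.
    Returns (k, number_of_urns_inspected).
    """
    lo, hi = 0, n                      # invariant: lo <= k <= hi
    queries = 0
    while lo < hi:
        mid = (lo + hi + 1) // 2       # look into urn `mid`
        queries += 1
        if has_ball(mid):
            lo = mid                   # urn mid is full, so k >= mid
        else:
            hi = mid - 1               # urn mid is empty, so k < mid
    return lo, queries
```

For example, `find_k(8, lambda i: i <= 5)` inspects three urns and returns `k = 5`; over all possible values of $k$, this sketch never needs more than $\left\lfloor \log_2(n)\right\rfloor+1$ inspections.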

The problem is that even in the best case we need $\left\lfloor \log_2(n)\right\rfloor$ steps to find $k$: the only way to save a step is when we are left with only two urns and the one with the lower index is empty, in which case the other urn must be empty as well.

My question is: Is there a way to find $k$ in fewer steps on average than with binary search, if we assume that $k$ is chosen uniformly at random? If not, suppose instead that $k$ were chosen according to a probability distribution we had prior knowledge of; could we exploit that to do better on average?

Thanks in advance for any help!

Best answer:

If $k$ is chosen uniformly at random, you can’t do better than binary search, since the entropy of $k$ is $\log_2 n$ bits and no strategy can determine its value with fewer answers to binary questions (i.e. bits) than that on average.

If $k$ is chosen according to some other known distribution, then generally speaking the optimal algorithm is binary search on the distribution: in each step, choose an urn that divides the remaining probability mass as evenly as possible between the two possible outcomes.
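The split rule above can be sketched as follows. This is an illustrative helper, not from the answer itself: `probs[k]` holds the (possibly unnormalized) prior weight of each candidate value of $k$, the current candidates are `lo..hi`, and opening urn $m$ separates the cases $k \ge m$ (ball seen) from $k < m$ (empty).

```python
def best_split(probs, lo, hi):
    """Pick the next urn to open when the candidates are k in lo..hi.

    probs[k] is the (possibly unnormalized) prior weight of candidate k.
    Opening urn m splits the candidates into {k >= m} (ball seen) and
    {k < m} (empty); we pick the m in lo+1..hi whose two probability
    masses are as close to equal as possible.
    """
    total = sum(probs[lo:hi + 1])
    best_m, best_gap = None, float("inf")
    upper = 0.0                        # mass of {k >= m}, built from the top
    for m in range(hi, lo, -1):
        upper += probs[m]
        gap = abs(total - 2 * upper)   # |P(k < m) - P(k >= m)|
        if gap < best_gap:
            best_m, best_gap = m, gap
    return best_m
```

With a uniform prior this reduces to ordinary binary search; with a skewed prior such as `[0.1, 0.2, 0.3, 0.4]` it opens urn $3$ first, splitting the mass $0.6 / 0.4$ rather than $0.3 / 0.7$.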

It’s not always optimal to follow this prescription precisely, since there are cases where accepting a small additional imbalance in one step lets you avoid a larger imbalance in future steps. For example, if the remaining probabilities are very close to $\frac15,\frac15,\frac15,\frac15,\frac1{10},\frac1{10}$, you’d want to put the middle $\frac15$ in the second half, since that allows the first half to be divided exactly evenly in the next step, whereas if you include it in the first half, the first half can’t be divided evenly. So to find the precise optimal strategy you might have to do some further optimization using dynamic programming; but always dividing the mass as evenly as possible will be close to optimal.
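The dynamic-programming idea mentioned above can be sketched like this. This is a hypothetical implementation, assuming `probs[k]` is the prior probability of each candidate $k$: since every inspection costs one step for all candidates still in the current interval, the expected number of inspections is the interval's mass plus the best achievable cost of the two sub-intervals, minimized over all split points.

```python
from functools import lru_cache

def optimal_expected_queries(probs):
    """Exact minimum expected number of urn inspections, by interval DP.

    probs[k] is the prior probability that the answer is k (k = 0..n).
    Opening urn m splits candidates lo..hi into lo..m-1 and m..hi, and
    costs one inspection for every candidate in lo..hi, i.e. expected
    cost equal to the interval's probability mass.
    """
    n = len(probs) - 1
    prefix = [0.0]                     # prefix[i] = probs[0] + ... + probs[i-1]
    for p in probs:
        prefix.append(prefix[-1] + p)

    @lru_cache(maxsize=None)
    def E(lo, hi):
        if lo == hi:
            return 0.0                 # one candidate left: no query needed
        w = prefix[hi + 1] - prefix[lo]   # mass of the current interval
        return w + min(E(lo, m - 1) + E(m, hi)
                       for m in range(lo + 1, hi + 1))

    return E(0, n)
```

For instance, `optimal_expected_queries([0.5, 0.25, 0.25])` is $1.5$: the first inspection always costs one step, and with probability $\frac12$ a second one is needed. The DP runs in $O(n^3)$ time, which is fine for the sizes where exact optimality matters.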