Optimal algorithm to guess any random integer without limits?


Guessing Game In Range $[1, n]$

The classical guessing game goes something like this...

  1. Our friend thinks of an integer between $1$ and $100$ (let's say they pick $42$).

  2. We try to guess that number with the fewest guesses possible: $$ 100, 3, 7, 30, ... $$

  3. But every time we guess, they say whether the actual number is higher or lower than what we guessed... $$ 100(lower), 3(higher), 7(lower), 30(lower), ... $$

There is actually an optimal solution to this problem. By knowing whether their number is higher or lower than our previous guess, we can always choose our next guess in exactly the middle of the remaining range... $$ \begin{array}{|c|c|c|c|c|} \hline \text{#} & \text{Options Range} & \text{Number of Options} & \text{Guess} & \text{42 Is Higher Or Lower} \\ \hline 1. & 1...100 & 100 & 50 & \text{Lower} \\ \hline 2. & 1...49 & 49 & 25 & \text{Higher} \\ \hline 3. & 26...49 & 24 & 37 & \text{Higher} \\ \hline 4. & 38...49 & 12 & 43 & \text{Lower} \\ \hline 5. & 38...42 & 5 & 40 & \text{Higher} \\ \hline 6. & 41...42 & 2 & 41 & \text{Higher} \\ \hline 7. & 42...42 & 1 & 42 & \text{Win} \\ \hline \end{array} $$

By using this method we reduce the number of remaining options by roughly half with each guess. It took us only $7$ guesses to find a number between $1$ and $100$.
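The halving strategy can be sketched in a few lines of Python; the `respond` callback (my stand-in for the friend) says whether the secret is lower or higher than the guess:

```python
def guess_number(respond, lo=1, hi=100):
    """Binary search: halve the remaining range on every guess."""
    guesses = 0
    while lo <= hi:
        g = (lo + hi) // 2          # middle of the remaining range
        guesses += 1
        answer = respond(g)
        if answer == "win":
            return g, guesses
        elif answer == "lower":     # secret is lower than our guess
            hi = g - 1
        else:                       # secret is higher than our guess
            lo = g + 1

# Hypothetical friend who picked 42:
secret = 42
respond = lambda g: "win" if g == secret else ("lower" if secret < g else "higher")
print(guess_number(respond))  # guesses 50, 25, 37, 43, 40, 41, 42 -- the table above
```

With this midpoint rule the guess sequence reproduces the table exactly: seven guesses for $k=42$.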

Infinite Guessing Game $[1, \infty)$

What if we ask our friend to pick an integer $k$ between $1$ and infinity?

Is there an algorithm which can be proven to be the most efficient in terms of the average number of guesses for finding the answer?

Can it be proven that there can not exist such an algorithm?

My guess is that we need to solve this problem in two separate steps:

  1. Find an upper-bound $n$ that is higher than $k$.

  2. Solve the original problem in the range $[1,n]$.

If we had all the time in the universe then, because $k$ is a finite number, we would eventually guess it.

We can even define some possible ways of finding $n$:

  • Counting from 1 upwards. $1, 2, 3, 4, ...$ (Takes exactly $k$ guesses to reach the answer.)
  • Guessing a random number in the sub-range of possibilities. (How to define selecting a random number between $1$ and infinity?)
  • Squaring our guess until we find an upper bound. $10, 100, 10^4, 10^8, 10^{16}, ...$
  • Cubing our guess until we find an upper bound? $10, 1000, 10^9, 10^{27}, ...$ (What if we raise to an even higher power?)
  • Iterating exponentials until we find an upper bound. $10, 2^{10}, 2^{2^{10}}, ...$

We could keep defining different techniques like these forever.
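All of these techniques share the same two-phase shape: grow an upper bound, then bisect below it. Here is a sketch of phase 1 with a pluggable growth rule (the function names are mine, and `secret` stands in for the friend's higher/lower responses):

```python
def find_upper_bound(secret, grow):
    """Guess 1, grow(1), grow(grow(1)), ... until a guess reaches `secret`.
    Returns the bound found and the number of guesses spent on phase 1."""
    g, guesses = 1, 1
    while g < secret:
        g = grow(g)
        guesses += 1
    return g, guesses

doubling = lambda g: 2 * g           # 1, 2, 4, 8, ...
squaring = lambda g: max(g * g, 2)   # needs a nudge past 1; then 2, 4, 16, 256, ...

print(find_upper_bound(10**6, doubling))   # (1048576, 21)
print(find_upper_bound(10**6, squaring))   # (4294967296, 7)
```

Faster growth finds a bound sooner but overshoots harder, which is exactly question 4 below: the overshoot is what phase 2 then has to pay for.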

The questions are:

  1. Are all these techniques equally valid?
  2. If not, how can they be ordered in terms of efficiency?
  3. If the number $k$ can be anywhere between $1$ and infinity, does this automatically mean that the average number of guesses for any technique tends to infinity?
  4. The problem is not only to find the upper bound as quickly as possible, but to find it in a way that, once found, lets us locate the number itself as quickly as possible.

Double-Infinite Guessing Game $(-\infty, \infty)$

You can already guess the questions for this one...


There are 3 answers below.

Best Answer

A practical approach is to start with $n=1$ (or $n=$ any positive integer you like) and double your guess until you reach an $n\ge k$. Then you can run your binary search on the remaining possibilities $n/2 < k \le n$.
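A sketch of this doubling-then-bisecting method in Python, counting every probe of the friend as one guess; phase 1 costs about $\log_2 k$ guesses and phase 2 about the same:

```python
def exponential_search(secret):
    """Double until we overshoot, then binary-search the last interval."""
    # Phase 1: guess 1, 2, 4, 8, ... until we hit the number or pass it.
    guesses, n = 0, 1
    while True:
        guesses += 1
        if n == secret:
            return guesses
        if n > secret:
            break
        n *= 2
    # Phase 2: binary search on the remaining interval n/2 < k < n.
    lo, hi = n // 2 + 1, n - 1
    while lo <= hi:
        g = (lo + hi) // 2
        guesses += 1
        if g == secret:
            return guesses
        if g < secret:
            lo = g + 1
        else:
            hi = g - 1

print(exponential_search(42))  # 7 doubling guesses + 4 bisection guesses = 11
```

Roughly $2\log_2 k$ guesses in total, matching the "twice as many guesses" observation in the next answer.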

This can't be proven to be the most efficient method, because to do that you would need to know the probability of each $k$, which presumably you don't. But it is an approach I have used in more than one computer programming problem.

By the way, your Doubly Infinite Guessing Game reduces to the Singly Infinite version: simply guess $0$ on your first turn, then run the one-sided search upward or downward depending on the response.

Answer

Without knowing the distribution, the approach of finding a finite upper bound to reduce this to the finite guessing game (and I don't think there is a better approach) won't give an algorithm better than logarithmic in the chosen number. Suppose you had a magical algorithm that found a finite upper bound right away, in O(1): you would then have reduced the problem to the finite one, which is logarithmic. This means that a "find the upper bound" approach of "1->2->4->8->16->32->etc" takes only about twice as many guesses as a "you already know the upper bound" approach, and is still O(log k).

This turns the bound-finding step into "I have an infinitely long sequence of powers of 2; find the one I'm thinking of", and you can probe the 1st, 2nd, 4th, 8th, etc. terms of that sequence (i.e. 1, 2, 8, 128, etc.), reducing the bound-finding problem to log(log(k)). Any further optimization there wouldn't reduce your total runtime measurably (since the ratio of part 2's runtime to part 1's approaches infinity with this approach), but you can in principle repeat that trick as often as you like, until your first 3 guesses are "1, 2, infinity".
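A sketch of that idea: run the doubling trick on the exponent itself, probing $2^1, 2^2, 2^4, 2^8, \dots$ and then bisecting over exponents, which finds an upper bound in $O(\log\log k)$ probes (the function name is mine):

```python
def find_bound_loglog(secret):
    """Find an upper bound 2**e >= secret by exponential search on e:
    try e = 1, 2, 4, 8, ..., then bisect for the smallest sufficient e."""
    guesses, e = 0, 1
    while True:
        guesses += 1
        if 2**e >= secret:
            break
        e *= 2
    # Bisect e over (e//2, e] for the smallest exponent that still works.
    lo, hi = e // 2 + 1, e
    while lo < hi:
        mid = (lo + hi) // 2
        guesses += 1
        if 2**mid >= secret:
            hi = mid
        else:
            lo = mid + 1
    return 2**lo, guesses

print(find_bound_loglog(10**6))  # (1048576, 10)
```

Ten probes to bracket a million, versus about twenty for plain doubling; as the answer notes, phase 2 still dominates the total.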

If you do know the distribution, you can pretend it's a finite distribution and binary-search on probability mass: guess the median of the remaining part of the domain, then the median of the surviving half, and so on, halving the remaining probability mass each time. The algorithm's complexity is O(bits_of_entropy), which is to say still logarithmic in the worst case.
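For instance, assuming the friend draws $k$ from a geometric distribution $P(k)=p(1-p)^{k-1}$ (my choice of example, not from the answer), the median of the remaining tail can be computed exactly with rationals; for $p=\tfrac12$ the median of every tail is its smallest element, so the median-splitting "binary search" degenerates into counting upward $1, 2, 3, \dots$:

```python
from fractions import Fraction

def median_of_tail(lo, p=Fraction(1, 2)):
    """Smallest m >= lo covering at least half of the remaining mass
    P(k >= lo), for the geometric distribution P(k) = p*(1-p)**(k-1)."""
    tail = (1 - p) ** (lo - 1)      # P(k >= lo)
    acc, m = Fraction(0), lo
    while True:
        acc += p * (1 - p) ** (m - 1)
        if acc * 2 >= tail:         # reached the median of the tail
            return m
        m += 1

print(median_of_tail(1))  # 1 -- for p = 1/2 the rule guesses lo itself
```

The expected number of guesses then equals the distribution's entropy in bits ($2$ bits for $p=\tfrac12$), in line with the O(bits_of_entropy) claim.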

Answer

This task landed on my desk around 2010, and in an even more extreme form:

  1. the number to be guessed was a real number, not an integer;

  2. the number to be guessed changed slightly while we were guessing it!

The second requirement made life harder, especially for algorithms that explicitly rely on lower and upper bounds to trap the chosen real number.

In 2022 I returned to this task and explored it together with Maxim Demidovich. It turned out to be much more serious than I had thought!

Let's formalize the problem like this:

let $h_t$ be our unknown search parameter,

$x_t$ be our estimate of the sought parameter $h_t$ at step $t$,

$u_t$ be the response of the environment to our estimate $x_t$ of the parameter $h_t$ ("$-1$" - undershoot, "$1$" - overshoot):

$$ \begin{equation}u_t=\begin{cases}-1, &x_t<h_t\\0,&x_t=h_t\\1, &x_t>h_t\end{cases}\end{equation} $$

Then here is our solution:

let $\delta_t$ be the amount by which $x_t$ changes at step $t$, and

$a$ and $b$ be some constants, $a,b\in\mathbb{R}, a>0, b<0$

$$ \begin{equation} \delta_t=\begin{cases}a\delta_{t-1}, &u_{t-1}\delta_{t-1}<0\\0,&u_{t-1}=0\\b\delta_{t-1}, &u_{t-1}\delta_{t-1}>0\end{cases} \end{equation} $$

$$ \begin{equation}x_t = x_{t-1} + \delta_t\end{equation} $$

There is a trap here: the search does not converge for all $a$ and $b$. Remarkably, it fails to converge for acceleration $a=2$ and braking $b=-\frac12$.

For example, with $a=2$ and $b=-\frac13$ things do converge.
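A small simulation of the update rule for a fixed (non-drifting) real target illustrates both claims; `adaptive_search` is my naming, and the seed step $\delta_0 = 1$ is an assumption:

```python
def adaptive_search(h, x=0.0, delta=1.0, a=2.0, b=-1/3, steps=200):
    """Iterate x_t = x_{t-1} + delta_t with the acceleration/braking rule."""
    for _ in range(steps):
        u = -1 if x < h else (1 if x > h else 0)
        if u == 0:            # exact hit
            break
        if u * delta < 0:     # still moving toward h: accelerate
            delta = a * delta
        else:                 # overshot: reverse and brake
            delta = b * delta
        x += delta
    return x

print(abs(adaptive_search(41.3) - 41.3))          # tiny: converges for b = -1/3
print(abs(adaptive_search(41.3, b=-0.5) - 41.3))  # large: cycles for b = -1/2
```

With $b=-\frac12$ the iterate falls into a four-point cycle around the target (each braking step exactly undoes half of the last acceleration), which is the non-convergence noted above.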

You can read the details in the article Guessing a real number by binary response (about 5 minutes of reading).