Greater/lesser search with one false answer allowed

188 Views Asked by At

It is well known that you can determine the values of $n\geq 2$ bits using $k$ yes/no questions about the bits (for example, "is $x_1 \oplus x_3 = 1$?), even if one (but not more) of the answers obtained may be a lie, if and only if $$ 2^n k \leq 2^k $$ This is just the observation that you can always construct a Hamming code using $k-n$ extra bits, which corrects single-bit errors, if $k$ is of that size.

(This does not work if $n=1$ since you require $3$ bits rather than $2$ to achieve single error correction.)

For example, you can determine a 4-bit number using seven questions, even allowing for one possible lie, by asking for the values of each bit, and then asking for the values of $x_0 \oplus x_1 \oplus x_2$, then $x_0 \oplus x_1 \oplus x_3$, then $x_0 \oplus x_2 \oplus x_3$. Note that these questions can be fixed beforehand; later questions need not depend on answers to the earlier ones.

But what if the nature of the yes/no questions allowed is restricted?

In particular, can you determine (with certainty) a specific number $x \in [0,15]$ using no more that seven questions of the form "Is $x > q$", coping with up to one lie? It is allowable to tailor later questions based on the answers to earlier questions.

As a generalization, what are the conditions on $N$ and $k$ that suffice to ensure that you can determine a specific $x \in [0,N)$ using up to $k$ questions of the form "Is $x > q$", coping with up to one lie? My real question to be posed is:

What is the largest value of $N$ such that a specific number $x \in [0,N)$ can with certainty be determined in $7$ "Is $x > q$" questions, coping with at most one lie?

Stanislaus Ulam did some work on this in the 1940's and that led to the work that led to Hamming codes, but I don't remember whether he actually presented a condition for general $N$.

For some examples, $$k(N=2) = 3 \\ k(N=3) = 4 \\ k(N=4) = 5 \\ k(N=5) = 6 \\ k(N=6) = 6 \\ k(N=7) = 7 \\ k(N=8) = 7 \\ $$

As for the case of $N = 15$, I strongly suspect that the "information theory value" of seven questions does not suffice when the ordering restriction is imposed. But I'd be interested to see a proof.

3

There are 3 best solutions below

2
On BEST ANSWER

I have proven an optimal strategy and written a short program to compute for small $N$; see my answer to my generalized question.

This completely answers your question rigorously. Specifically, $7$ queries is enough to distinguish up to at most $12$ possibilities, while $8$ queries is enough to distinguish up to at most $22$ possibilities.

3
On

There has been some discussion of this in a newer question, where you will find a detailed exposition and a certain amount of disagreement; but I will summarise the information-theoretic aspects here.

The key is to note the correct information content of the problem. Let us suppose that $Q$ questions are to be asked. I will cause irritation and confusion by supposing that the values to be guessed are in the half-open interval $[0,N)$ rather than the closed one you specified. This means that there are exactly $N$ possibilities to be guessed, rather than $N+1$, and it makes the algebra easier to read. The other change in notation is that I will refer to the questions as being of the form "$x<k$?".

  • The answerer has to have two pieces of information to do his job. One is the number $n$ to be guessed: $N$ possibilities. The other is the number $q$ of the question which he is to answer with a lie. $q$ does lie in the closed interval $[0,Q]$, because we can take $q=Q$ to mean "the answerer never lies". Thus the total number of possibilities between which the answerer has to choose is $(Q+1)N$.

  • The questioner can't reach his conclusion without deducing all the information that the answerer needed to produce his answers: thus, $(Q+1)N$ possibilities. Another way of looking at it is that when the questioner has guessed the number $n$, he also knows the number $q$.

Since each question can distinguish between two possibilities only, $Q$ questions are needed to distinguish between $2^Q$ possibilities. Thus, comparing the possibilities distinguishable by $Q$ questions with the possibilities that need to be distinguished, we have $$2^Q\ge(Q+1)N$$

I shall present an algorithm which, I claim, is within one or two units of achieving a successful guess with the smallest integer $Q$ that satisfies the above equation.

The strategy

The questioner's strategy is to play $Q+1$ games simultaneously: one in which the answerer lies at question $0$, one in which he lies at question $1$, and so on up to question $Q-1$; plus one in which the answerer doesn't lie at all (which it is not illogical to describe at "lies at question $Q$", since question $Q$ is never asked).

At each stage, the questioner asks one and only one question, "is $x<k$?", and applies the answer to all $Q+1$ simultaneous games. Thus the answerer only hears $Q$ questions, as required.

At the start, each game has all $N$ numbers as possibilities. Each answer to a question reduces that number. It should be noted that there is no question of lying in these sub-games: each one receives, at each stage, a true answer. All the questioner has to remember is that the answer to question $q$ should be inverted ("Yes" becomes "No" and "No" becomes "Yes") for game $q$, to allow for the fact that game $q$ is the one in which answer $q$ is a lie.

As time goes on, the possible numbers resulting from each of the $Q+1$ games become fewer and fewer. For some games they all disappear, and eventually they disappear for all games except one, and in that game only one number is left. At that point the number is known, and the identity of the game which still contains that number gives the number of the question that was answered falsely (with, as before, game number $Q$ denoting "there were no false answers").

The strategy at step $q$ is therefore:

  • Define $N_q$ as the number of possible numbers, totalled across all games (thus, the number of possible $\langle n,q\rangle$ pairs). Clearly $N_0=(Q+1)N$.
  • Ask a question "$x<k$?" such that a Yes answer eliminates precisely half of the possible numbers, totalled across all games. Then a No answer will also eliminate half the possible numbers. We can say that $N_<=N_\ge=\frac12N_q$.
  • Given the answer, eliminate all numbers in game $q$ which do correspond to the answer, because in this round the answer to game $q$ is a lie so that "$x<k$" means that $x$ is greater than or equal to $k$. In all other games, eliminate all numbers which do not correspond to the answer, because in this round the answer to all games other than $q$ is true.
  • You now have either $N_{q+1}=N_<$ or $N_{q+1}=N_\ge$: that is to say, $N_{q+1}=\frac12 N_q<$ if $k$ was chosen perfectly.

The possibility of choosing $k$

At any stage, games can be of three kinds:

  1. Games in which the lie has already been told.
  2. Games in which the lie is being told in the current round.
  3. Games in which the lie will be told in the future (or will never be told).

The ranges of numbers covered by class (1) are all disjoint (since each game comes from a different sequence of Yes/No after allowing for lies). Classes (2) and (3) all contain identical ranges to each other, but that range is disjoint from any of the class (1) ones.

The effect of this on the value of $N_<$ for various potential values of $k$ is as follows:

  1. Wherever $k$ is in class (1), $dN_</dk=+1$. That is, each increase of 1 in "$x<k?$" yields an increase of 1 in the number of numbers that remain possible if the answer is Yes.
  2. Wherever $k$ is in class (2) or (3), $dN_</dk=Q-q-2$. That is, each increase of 1 in "$x<k?$" yields an increase of $Q-q-2$ in the number of numbers that remain possible if the answer is Yes. This is because there are $Q-q-1$ games in class (3) and they all share the same range of possibilities and are all affected by answers in the same way, and there is $1$ game in class (2) which shares the same range, but for which the number of possible numbers decreases by 1 unit for each 1-unit increase in $k$. (The reverse direction is because a "Yes" to "$x<k?$" means, for class (2), "No").
  3. If $k$ is not in any of these classes, $dN_</dk=0$.

Leaving aside occasional flat spots, the graph of $N_<$ versus $k$ is therefore monotone. Where the slope is 1, it is easy to find a $k$ such that $N_<=\frac12N_q$. This would yield an explicit algorithm which halves the number of possibilities at each stage and therefore reaches a conclusion in the minimum theoretical time.

The only time where this breaks down is when the slope is not 1 at the point where the horizontal line $N_<=\frac12 N_q$ intersects the graph. In that region $N_<$ jumps by $Q-q-2$ units for each one-unit step in $k$. This means that the attainable value of $N_<$ closest to $\frac12 N_q$ could be $\frac12(Q-q-2)$ units away from it. This, in turn, means that rather than having the ideal $$N_{q+1}=\frac12 N_q$$ we may have $$N_{q+1}=\frac12 N_q+\frac12(Q-q-2)$$

This slightly slows the progress of the game, and it makes explicit analysis difficult; but since the largest value of $(Q-q-2)$ will be of the order of $\log_2{N}$, it will be seen that it does not have an overwhelming effect on progress.

The algorithmic complexity of finding the right value of $k$ is essentially logarithmic. It is a question of sorting the available ranges into order; a binary search to find the range which brackets the desired value of $N_<$; and simple arithmetic to find the desired value in the range.

2
On

This answer specifically addresses the question "Are 7 questions enough for $N=15$?". I am assuming that this relates to the 15 possible numbers from 0 to 14 inclusive.

Let us suppose that 7 questions are sufficient. Then, following the method described in my first answer:

First round

We start with 8 sub-games, one for each possible "false" answer (plus one for the absence of a "false" answer). Each has 15 possibilities, making 120 in all.

So we ask "Is $x<7$?". A "Yes" answer reduces the 1st sub-game to 7 possibilities (8 to 14) and all the others to 8 each (0 to 7). Thus there are 63 possibilities for Yes and 57 for No. ("Is $x<8$?" would have made the split 57:63).

Thus we end up with at worst 63 possibilities.

Second round

The 1st sub-game now covers a range of possibilities distinct from the remaining ones. The remaining sub-games all have the same range as each other.

Case 1: 1st game has 0-6, other games have 7-14.

In this case we start with 63 possibilities.

An answer "Yes" to "Is $x<0$?" will leave just 6 possibilities, all of them in the 2nd game, in which this answer is a lie.

"Is $x<1$? Yes!" will add one possibility from the 1st game (where the answer is truthful), making 7 in all. "Is $x<2$? Yes!" will add one more, and so on, until when we have "Is $x<7$? Yes!" we have a total of 13 possibilities, 7 of them from the 1st game and 6 from the 2nd.

Moving up to "Is $x<8? Yes!" will remove $x=7$ as a possibility from the 2nd sub-game but add it to the 3rd to 8th, a total of 6 sub-games. Thus the total number of possibilities increases by 5, from 13 to 18.

Each successive increase will add 5 more to the number of possibilities. We want to get as close as possible to $63/2=31\frac12$. The closest point is 33, if we ask "Is $x<11$?". This splits the remaining possibilities 33:30.

Case 2: 1st game has 7-14, other games have 0-6.

In this case we start with 57 possibilities.

An answer "Yes" to "Is $x<0$?" will leave just 7 possibilities, all of them in the 2nd game, in which this answer is a lie.

An answer "Yes" to "Is $x<1$?" will remove 1 possibility ($x=0$) from the 2nd game but add $x=0$ to the 3rd to 8th games, thus adding 6 possibilities to make an overall net increase of 5: a total of 12 possibilities.

By the same logic as before, if we ask "Is $x<4$?", we will split the original 57 possibilities 27:30, and this is the most balanced split available.

Third round

The worst possible outcome coming into the 3rd round is a "Yes" answer from case 1 of the 2nd round, which yields 33 possibilities. Since we have only five Yes/No questions to go, and they can only cover $2^5=32$ possibilities between them, we will not be able to distinguish between every remaining possibility.

So 7 questions are not enough.

Repeating the analysis, but assuming 8 questions and therefore using 9 sub-games, gives a larger margin of error: 135 possibilities to be distinguished by 8 questions. The worst-case number of possibilities by following the most even split is:

  • After Round 1: 71.
  • After Round 2: 38.
  • After Round 3: 21.
  • After Round 4: 12.
  • After Round 5: 7.
  • After Round 6: 4.
  • After Round 7: 2.
  • After Round 8: 1.

So 8 questions appear to suffice. The problem is that we have not proven that the most even split is the worst case, although it is likely to be so.