How much information can you transfer by choosing one number out of two?


Alice and Bob are playing a game. Alice (A) and Bob (B) agree on a protocol before the game starts.

Alice gets two uniformly random integers (without duplicates) from 1 through $n$, chooses one of the two numbers, and communicates it to Bob. Bob doesn't get to see anything other than the number that Alice chose (and he knows the maximum $n$, which is fixed for the game).

How many bits of information $b(n)$ can possibly be transmitted per number on average, if Alice and Bob play this game for a sufficiently long time?

$b(2) = 1$ is quite obvious, but even $b(3)$ is difficult to reason about.

One protocol I came up with is that A always outputs $3$ for the inputs $\{1, 3\}$ and $\{2, 3\}$, and outputs $1$ or $2$ to transfer $1$ bit of information when the input is $\{1, 2\}$. This gets an average transmission rate of $\frac{1}{3}$ bits per number. But I have no idea how to generalize this to bigger $n$, or whether it's even optimal.
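As a sanity check, this protocol's rate can be confirmed by brute-force enumeration (a minimal Python sketch of the rule just described):

```python
from itertools import combinations

n = 3
pairs = list(combinations(range(1, n + 1), 2))  # {1,2}, {1,3}, {2,3}, each with probability 1/3

# Alice sends 3 whenever she holds it, which tells Bob nothing; only the
# pair {1,2} lets her choose freely and thereby deliver one full bit.
delivered = sum(1 for pair in pairs if set(pair) == {1, 2})
print(delivered / len(pairs))  # 0.333... bits per number on average
```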

There are 5 answers below.

Answer 1 (score 4):

Alice divides the integers 1 through $n$ into two disjoint subsets T and F (for example T = numbers $\le n/2$ and F = numbers $> n/2$, or T = even numbers and F = odd numbers). If $n$ is even then there are $n/2$ numbers in each subset; if $n$ is odd then she has two sets of $(n-1)/2$ numbers each, plus an extra number (which can be considered to be the sole element of a third set O).

To transmit TRUE she sends a number from set T if possible, otherwise from O if possible, otherwise from F if both her numbers are in that set. To transmit FALSE she does the opposite.

Bob assumes any T number is TRUE, any F number is FALSE, and any O number carries no information.

For n = 2, 3, 4, 5, 6, 7... the number of transmitted bits is 1, 2/3, 2/3, 3/5, 3/5, 4/7, 4/7... . I think the pattern is obvious. For sufficiently large n, the limit is 1/2 bits per transmission.
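For concreteness, Alice's selection rule can be sketched in Python as follows (the helper name `alice_choice` is ours, and the T/F split used is the "first half / second half" example from above):

```python
def alice_choice(pair, bit, n):
    """Return which of Alice's two numbers to send for the given bit.

    T = {1, ..., n//2} encodes TRUE, the top half F encodes FALSE, and for
    odd n the leftover middle number is the lone "no information" symbol O.
    """
    T = set(range(1, n // 2 + 1))
    O = {n // 2 + 1} if n % 2 == 1 else set()
    F = set(range(1, n + 1)) - T - O
    preferred = T if bit else F
    for s in (preferred, O):     # try the right set first, then the neutral set
        for x in pair:
            if x in s:
                return x
    return min(pair)             # stuck: both numbers lie in the wrong set

print(alice_choice((3, 4), True, 5))  # 3: no TRUE number available, send the neutral one
```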

Answer 2 (score 11, by enthdegree):

If Alice were not able to see the numbers before sending the max or min, the best possible rate would be:

$$C=\fbox{$1-\frac{1}{n}\sum_{k=2}^{n-1}H\left(\frac{n-k}{n-1}\right)$}$$

Alice in the OP's problem has strictly more capabilities than Alice as I have written her, so the rate "OP's Alice" can achieve is at least the value given by the above formula.

For large $n$ the sum behaves like a Riemann sum for an integral: $$1 - \frac{1}{n}\sum_{y=2}^{n-1} H\left(\frac{n-y}{n-1}\right) \approx 1-\int_0^1 H(p)\, dp = 1-\frac{1}{\ln 4}\approx 0.2787.$$
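The integral has a simple closed form: since $\int_0^1(-p\ln p)\,dp=\frac14$ and the two terms of $H$ contribute equally by symmetry,

$$\int_0^1 H(p)\,dp = \frac{2}{\ln 2}\int_0^1(-p\ln p)\,dp = \frac{1}{2\ln 2} = \frac{1}{\ln 4}\approx 0.7213.$$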


Your 'choose two numbers' specifies a channel state:

$$Z=(z_1,z_2)\sim \operatorname{Uniform}\{(a,b), 1\leq a< b\leq n\}$$

Each turn, the sender gets to input either $z_1$ or $z_2.$ So the channel input is some $X$ over an alphabet $\{\phi_i:\phi_i(Z)=z_i\},$ and the channel output is $Y=X(Z).$ Now you have a channel:

$$X\rightarrow \overset{\huge\overset{Z}{\downarrow}}{\fbox{$P_{Y|X,Z}$}}\rightarrow Y$$

and by the noisy channel coding theorem it is sufficient to compute the channel capacity: \begin{align} C = \max_{p_X} I(X;Y) &=\max_{P_X} H(X)-H(X|Y) \\ &=\max_{p} H(p)-H(X|Y) \end{align} where $p=\mathbb{P}(X=\phi_1),\ H(p)=-p\log_2(p)-(1-p)\log_2(1-p)$ and $H(X|Y)=\sum_y P_Y(y)H(P_{X|Y=y}(\phi_1))$.

To be able to go further we need to start computing probabilities, which is simple if you set it up right. View your space of $Z$'s laid out as a triangle: \begin{matrix} (1,2) \\ (1,3) & (2,3) \\ \vdots & & \ddots \\ (1,n) & \dots & & (n-1, n) \end{matrix} There are $\binom{n}{2}=\frac{n\cdot(n-1)}{2}$ equally likely pairs, so we can see: \begin{align} P_{Y|X=\phi_1}(y)&\textstyle =P(Z\text{ in }y^{th}\text{ column})=\mathbf{1}_{[1:n-1]}(y)\frac{n-y}{\frac{n\cdot (n-1)}{2}}=\mathbf{1}_{[1:n-1]}(y)\frac{2\cdot (n-y)}{n\cdot (n-1)},\\ P_{Y|X=\phi_2}(y)&\textstyle =P(Z\text{ in }(y-1)^{th}\text{ row})=\mathbf{1}_{[2:n]}(y)\frac{y-1}{\frac{n\cdot (n-1)}{2}}=\mathbf{1}_{[2:n]}(y)\frac{2\cdot (y-1)}{n\cdot (n-1)},\\ P_Y(y)&\textstyle = pP_{Y|X=\phi_1}(y)+(1-p)P_{Y|X=\phi_2}(y) = \left\lbrace \begin{matrix} \frac{2 p\cdot(n-y)}{n\cdot (n-1)} & y=1 \\ \frac{2 p\cdot(n-y)+2\cdot(1-p)\cdot(y-1)}{n\cdot (n-1)} & y\in [2:n-1] \\ \frac{2\cdot(1-p)\cdot(y-1)}{n\cdot (n-1)} & y=n \end{matrix}\right.\\ P_{X|Y=y}(\phi_1)&\textstyle = \left\lbrace \begin{matrix} 1 & y=1 \\ \frac{p\cdot (n-y)}{p\cdot(n-y)+(1-p)\cdot(y-1)} & y\in [2:n-1] \\ 0 & y=n \end{matrix}\right. \end{align}
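A quick numerical check, in Python, that these conditional laws are proper distributions over the $\binom n2=\frac{n(n-1)}{2}$ equally likely pairs:

```python
n = 7
tot = n * (n - 1) / 2  # number of pairs (a, b) with 1 <= a < b <= n

p_y_phi1 = [(n - y) / tot for y in range(1, n)]      # column sizes, y = 1..n-1
p_y_phi2 = [(y - 1) / tot for y in range(2, n + 1)]  # row sizes,    y = 2..n

assert abs(sum(p_y_phi1) - 1) < 1e-12
assert abs(sum(p_y_phi2) - 1) < 1e-12
```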

So in our entropy expansion we have: \begin{align} \max_{P_X}I(X;Y) &= \max_{p\in[0,1]} H(p) - \sum_{y=2}^{n-1}P(Y=y)H(X|Y=y) \\ &= \textstyle \max_{p\in[0,1]} H(p) - \sum_{y=2}^{n-1}\frac{2p\cdot(n-y)+2\cdot(1-p)\cdot(y-1)}{n\cdot (n-1)} H\left(\frac{p\cdot (n-y)}{p\cdot(n-y)+(1-p)\cdot(y-1)}\right). \end{align}

The critical points of this function in $p$ are: \begin{align} \textstyle \left\lbrace p: - \log_2\left(\frac{p}{1-p}\right) + \sum_{y=2}^{n-1}\frac{2p\cdot(n-y)+2\cdot(1-p)\cdot(y-1)}{n\cdot (n-1)} \log_2\left(\frac{p\cdot (n-y)}{(1-p)\cdot(y-1)}\right)- \sum_{y=2}^{n-1}\left(\frac{2\cdot (n-2y+1)}{n\cdot (n-1)}\right)H\left(\frac{p\cdot (n-y)}{p\cdot(n-y)+(1-p)\cdot(y-1)}\right) = 0 \right\rbrace. \end{align}

It is easy to check that $p^\ast=0.5$ is a critical point. (Indeed, in this case the first sum telescopes and cancels the first log, and the second sum becomes symmetric and cancels to 0 since $H(p)=H(1-p)$).

It is also the only critical point: the second derivative can be bounded below 0 (simple computation but tedious to write out, recall $H''(p)=1/(p^2-p)$ and $1-1/x \leq \log(x)\leq x-1$), thus the function is concave with its max at its critical point $p^\ast = 0.5.$

So the maximum possible rate of reliable information transmission boils down to:

\begin{align} C&= 1 - \sum_{y=2}^{n-1}\frac{1}{n} H\left(\frac{n-y}{n-1}\right). \end{align}
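A short numerical check of this formula, and of the claim that $p^\ast=\frac12$ is the maximizer (a Python sketch, with $H$ and $I(X;Y)$ as defined above):

```python
import math

def H(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def I(p, n):
    """I(X;Y) when P(X = phi_1) = p; outputs y = 1 and y = n reveal X exactly."""
    cond = 0.0
    for y in range(2, n):
        p_y = (2 * p * (n - y) + 2 * (1 - p) * (y - 1)) / (n * (n - 1))
        cond += p_y * H(p * (n - y) / (p * (n - y) + (1 - p) * (y - 1)))
    return H(p) - cond

def C(n):
    """The closed-form capacity, i.e. I evaluated at p = 1/2."""
    return 1 - sum(H((n - y) / (n - 1)) for y in range(2, n)) / n

print(C(3))                                                    # 0.666..., i.e. 2/3
print(C(10**5))                                                # ~0.2787, the large-n limit
print(max(range(1, 100), key=lambda k: I(k / 100, 50)) / 100)  # 0.5, i.e. p* = 1/2
```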

Answer 3 (score 4):

This is a partial solution based on the following assumptions:

  1. Alice wants to stream a sequence $\{x_i\}_{i\in\mathbb{N}}$ of bits in $\{0,1\}$. The distribution of the first $k$ symbols of this sequence is uniform: each $w\in\{0,1\}^k$ has probability $2^{-k}$.
  2. "Bit transfer" is defined as the unambiguous transfer of one symbol in the word $w\in\{0,1\}^*$ which Alice wishes to send.
  3. Bob receives bits in order. Thus, the only way to code bits is through prefix codes (i.e., Alice and Bob agree beforehand that each element of some set $S$ of words, none of which is a prefix of another, corresponds to a unique word $w\in\{0,1\}^*$, and Alice sends her sequence $\{x_i\}_{i\in\mathbb{N}}$ of bits by choosing numbers $c_i\in[1,n]$ so that the corresponding string is a code for a prefix of her message $\{x_i\}_{i\in\mathbb{N}}$). A small checker for the prefix property is sketched just after this list.
  4. For simplicity's sake, let's assume the set $S$ of prefix codes is finite.
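The prefix property from assumption 3 is mechanical to check; here is a minimal Python sketch, with words over $\{1,\dots,n\}$ represented as tuples of symbols:

```python
def is_prefix_free(S):
    """True iff no word in S is a proper prefix of another word in S."""
    words = sorted(S)  # any prefix sorts immediately before the words it begins
    return all(b[:len(a)] != a for a, b in zip(words, words[1:]))

# Words over the alphabet {1, ..., n}, represented as tuples of symbols:
assert is_prefix_free([(1, 2), (1, 3), (2,)])
assert not is_prefix_free([(1,), (1, 2)])  # (1,) is a prefix of (1, 2)
```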

With the above assumptions, we can define numerically for a given set $S\subseteq\{1,...,n\}^*$ with the prefix property (which says no string is a prefix of another) and function $f:S\to\{0,1\}^*$ the average bit wait time of $(S,f)$, which we denote $\beta(S,f)$, to be equal to:

$$\lim_{k\to\infty}\frac{1}{2^k}\sum_{w\in\{0,1\}^k}{E(w)}$$

where $E(w)$ is the expected value of the number of symbols in $\{1,...,n\}$ that will have to be sent before Bob can be sure that $w$ is a prefix of $\{x_i\}_{i\in\mathbb{N}}$. We can then see that we are looking for $(S,f)$ such that $\beta(S,f)$ is minimized. In other words, we want to find:

$$b(n)=\sup_{(S,f)}{\frac{1}{\beta(S,f)}}$$

If you are confused about the fact that we took the inverse of $\beta$, recall that $\beta$ was defined as the average number of symbols Bob waits per bit, so its inverse is the transfer rate in bits per symbol. Obviously, this is a nontrivial problem. In fact, there may not even be an optimal solution, since there are infinitely many possible $S$, not to mention the even larger number of possible functions $f$. Thus, it could be possible that given a strategy $(S,f)$ we can always find a better one $(S',f')$.

To remedy these problems, one can put a restriction on the type of $(S,f)$ which are allowed. We could let $l_i$ be the maximum length of a word in $S$ and $l_o$ the maximum length of a word in $f(S)$, and then ask the question for the optimal strategies for each $l_i$ and $l_o$. As I said, this is a partial answer: I will give the solution for $l_i=1$.

First, note that this implies that it is optimal for us to allow $S=\{1,...,n\}$. Now, given two distinct $u,v\in\{1,...,n\}$ such that $f(u),f(v)\neq\epsilon$, then as it is possible to get $(u,v)$ as a pair, it must be the case that no matter which word $\{x_i\}_{i\in\mathbb{N}}$ Alice wants to send, one of the strings $f(u)$ or $f(v)$ must be a prefix of it. This implies that we must have $\{f(u),f(v)\}=\{0,1\}$.

As a corollary, exactly two letters in $[1,n]$ are "information carriers" in that they correspond to a zero and a one, and all the others correspond to the empty word and thus carry no information. Since on a given bit there is a $\frac{3n-2}{n^2}$ chance of getting the symbol corresponding to the bit we are looking for, $b(n)=\frac{3n-2}{n^2}$. In particular, since there is a $1-\frac{\prod_{k=1}^{m}{(n-k)}}{n^m}$ chance of scoring a symbol corresponding to a particular bit if we have $m$ different symbols to choose from, the value of the generalized $b$ function with the restriction $l_o=1$ is:

$$b(n,m)=1-\frac{\prod_{k=1}^{m}{(n-k)}}{n^m}$$

P.S. I am going to try to generalize the method I used above to work without the $l_o=1$ restriction, since that restriction doesn't seem strictly necessary for this proof; if I succeed I'll add an edit to this post showing the real value of $b(n,m)$.

Answer 4 (score 8, by Henning Makholm):

For large $n$ we can approach an efficiency of $0.188$ bits per symbol, in the following way:

In the protocol Alice can send a bit to Bob either "reliable-but-slow" or "fast-but-unreliable".

In reliable-but-slow mode, Bob ignores all symbols other than $1$ or $2$ (standing for the bit values $0$ and $1$), and Alice simply waits until she gets a pair that allows her to send the symbol she needs. This takes $n/2$ symbols on average.

In fast-but-unreliable mode, Alice sends any even number for $0$ and any odd number for $1$. About one fourth of the bits she sends will be garbled, because she doesn't have a number of the right parity to send.

Alice and Bob agree in advance on a block size $b$, and the communication starts when Alice has accumulated $b$ bits to send. She sends them using fast-but-unreliable mode. Alice knows which bits went wrong, so she computes a correction sequence that Bob should XOR with what he received in order to get the real message. The bits in the correction sequence are not uniformly distributed: $0$ is three times as common as $1$. Therefore Alice can compress it using arithmetic coding to about $0.811$ times the original length $b$ (namely $-\frac34 \log_2(\frac34) - \frac14 \log_2(\frac14)\approx 0.811$).

Alice uses reliable-but-slow bits to tell Bob the exact length of the compressed correction sequence, and now sends it using the same procedure as the original block -- that is, with its own correction sequence following, etc.

When the compressed correction sequence is shorter than some prearranged length, Alice switches to sending it in reliable-but-slow mode.

Analysis. When $b$ is large, the expected number of fast-but-unreliable symbols sent is $$ b + 0.811b + 0.811^2b + 0.811^3b + \cdots = \frac{1}{1-0.811} b. $$ There will be on the order of $\log b$ nested correction sequences, each needing up to $\log b$ reliable-but-slow bits to encode its length. So all in all the reliable-but-slow phases use about $\frac n2 (\log^2 b+C)$ symbols, where $C$ accounts for the final reliable-but-slow correction sequence.

For each particular $n$, by choosing $b$ large enough, Alice and Bob can make the $\frac n2 (\log^2 b+C)$ term as small compared to $\frac{1}{1-0.811} b$ as they want! So in the limit of large $b$ they ought to achieve $1-0.811 = 0.189$ bits per symbol.
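A back-of-the-envelope check of this analysis (a Python sketch; the constant `c_final`, standing in for the final reliable-but-slow correction sequence, is an assumed placeholder, and logs are base $2$):

```python
import math

def rate(n, b, c_final=100.0):
    """Approximate bits per symbol of the nested-correction scheme."""
    fast = b / (1 - 0.811)                          # geometric series of fast symbols
    slow = (n / 2) * (math.log2(b) ** 2 + c_final)  # length headers + final slow block
    return b / (fast + slow)

print(rate(100, 10**4))  # ~0.15: the reliable-but-slow overhead is still visible
print(rate(100, 10**8))  # ~0.189: approaches 1 - 0.811
```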


If $n\le 10$, the reliable mode is actually faster than the asymptotic behavior here, and should be used instead.


I suspect this might be asymptotically optimal for large $n$. I tried variants, such as decreasing the error percentage of the "fast" mode at the expense of speed by assigning a fraction of the $n$ symbols as "ignore this symbol" markers instead of letting all of them encode actual bits. But it looks like that always results in worse overall performance.

Answer 5 (score 0):

Interesting problem.

A quick bound comes from considering, for $n$ even, that half of the values represent a binary $X=0$ input, the other half $X=1$. This gives a BSC with crossover error probability given by the event that the two numbers I must choose from are both in the "wrong" half:

$$ p_e = \frac{\binom{n/2}{2}}{\binom{n}{2}}=\frac14\, \frac{n-2}{n-1}\approx \frac14$$

This gives (in the limit of large $n$) a capacity of $C=1-h(1/4)=0.18872 \text{ bits}$, which agrees with Henning Makholm's answer.
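Numerically (a small Python sketch of this bound, with $h$ the binary entropy):

```python
import math

def h(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_bound(n):
    """Rate of the 'two halves' scheme: p_e = P(both numbers in the wrong half)."""
    p_e = (n - 2) / (4 * (n - 1))
    return 1 - h(p_e)

for n in (4, 10, 100, 10**6):
    print(n, bsc_bound(n))  # decreases towards 1 - h(1/4) = 0.18872...
```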

Some caveats

  • We are wasting bits (when both numbers are "good", the freedom of choosing one of the two is not used). Hence the capacity should be larger.

  • This result (and this encoding scheme) is in the context of the classic Shannon capacity, where information is sent with vanishingly small probability of error - but not necessarily with zero probability of error.


A better, probably optimal (and even simpler) scheme:

Let our "channel" simply pick the maximum of the pair when $X_n=1$, the minimum otherwise. This is channel with two inputs and $n$ outputs. The transition matrix is given by the probability of the extrema:

$$P(Y = y \mid X = 1) = \frac{2 (y-1)}{n (n-1)} =g(y;n)$$

while for $P(Y = y \mid X = 0)$ we have the symmetric distribution.

Then the capacity of this (weakly symmetric) channel is

$$C=\log(n) - H(g(y;n))$$

where $H(g(y;n))=h_T(n-1)$ is the entropy of a triangular distribution with $n-1$ positive values. Calling $s=\frac{2}{n(n-1)}$

$$h_T(n-1)= -s \left( \log s \sum_{k=1}^{n-1} k + \sum_{k=1}^{n-1} k \log k\right)\\ =- \log s - s \log({\mathcal H}(n-1))$$

where ${\mathcal H}$ is the hyperfactorial.

The capacity decreases asymptotically (and very slowly) to $C_{\infty}=1-\frac{\log_2 e}{2}\approx 0.2786$ bits (this agrees with enthdegree's value).
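A quick consistency check of this closed form against a direct entropy computation (Python sketch; logs base $2$):

```python
import math

def capacity(n):
    """C = log2(n) - h_T(n-1) for the min/max channel, via the triangular entropy."""
    s = 2 / (n * (n - 1))
    h_T = -sum(s * k * math.log2(s * k) for k in range(1, n))
    return math.log2(n) - h_T

print(capacity(3))      # 2/3: the binary erasure channel mentioned below
print(capacity(10**5))  # ~0.2787: approaching C_inf = 1 - log2(e)/2
```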

[Figure: capacity $C$ of this scheme as a function of $n$, decreasing towards $C_\infty\approx 0.2786$ bits.]

BTW: The $n=3$ case is, in this scheme, essentially a binary erasure channel, with erasure probability $p_e=1/3$ and $C=2/3$ bits. A zero-error code with this rate (but unbounded length) can be built (rather trivially) if (and only if) we have feedback. But in this case the "feedback" is inside this virtual channel, so it would be feasible (as noted by Henning Makholm).