expected error when flipping bits of binary string

958 Views Asked by At

Assume we have a binary string with $n$ bits. Now we flip $k$ (different) bits and get a new binary string. For any Gray code the two binary strings represent integers $x$ and $x'$.

What is the expected error $|x-x'|$?

Any idea how to calculate this? I'am interested in an analytical answer depending on $n$ and $k$. I'am not interested in approximation or a correct answer for small values by write force.

Update: So I thought a bit more about this problem and I think the following is true. Let $M = \{0,\dots , 2^n-1\}$ and $\pi\colon M \rightarrow M$ a bijection. Then $$\sum_{x,x'\in M, d(x,x') = k} |\pi(x) - \pi(x')| = \sum_{x,x'\in M, d(x,x') = k} |x - x'|.$$ where $d(x,x')$ denotes the hamming distance. I haven't proved this, but letting $\pi$ be the above mentioned Gray code, this seems to be numerically correct. Computing the expected error when flipping $k$ bits is not so difficult anymore. I think it should be $$\frac{k}{n\cdot{n-1 \choose k-1}} \sum_{i=0}^{n-1}2^i {i\choose k-1}.$$

2

There are 2 best solutions below

7
On BEST ANSWER

Your ideas proposed in the update are correct. First, we can show

$$\displaystyle\mathbb{E}_{x, x' \in M, d(x, x') = k} |{\pi(x) - \pi(x')}| = \mathbb{E}_{x, x' \in M, d(x, x') = k} |x - x'|$$

where $M = \{0, \dots, 2^n - 1\}$ and $\pi$ is the inverse Gray code i.e. decoding function (actually, both work, but I think your claim flips codeword bits, not plaintext bits). I did it by induction, but it would be nice if there was a slick counting argument:

$$\displaystyle\mathbb{E}_{x, x' \in M, d(x, x') = k} |{\pi(x) - \pi(x')}| = p_{00}\mathbb{E}_{x, x' \in M, d(x, x') = k, x_1 = x'_1 = 0} |\pi(x) - \pi(x')| + p_{01}\mathbb{E}_{x, x' \in M, d(x, x') = k, x_1 \neq x'_1} |\pi(x) - \pi(x')| + p_{11}\mathbb{E}_{x, x' \in M, d(x, x') = k, x_1 = x'_1 = 1} |\pi(x) - \pi(x')|$$

where $p_{00}, p_{01}, p_{11}$ are some constants and $x_1, x'_1$ denote the first bits of $x, x'$. A critical property of the Gray code is that $\pi$ restricts to the Gray code on $\{0, \dots, 2^{n-1} - 1\}$. Thus the induction hypothesis covers the $p_{00}$ term (and similarly the $p_{11}$ term).

For the $p_{01}$ case, WLOG we've picked $x$ from $\{0, \dots, 2^{n-1} - 1\}$ and $x'$ from $\{2^{n-1}, \dots, 2^{n} - 1\}$. This guarantees that $\pi(x')$ is greater than $\pi(x)$, and so conditioning on this case

$$\mathbb{E}|\pi(x') - \pi(x)| = \mathbb{E}[\pi(x')] - \mathbb{E}[\pi(x)] = 2^{n-1} = \mathbb{E}|x' - x|$$

This completes the induction. Now we can straightforwardly compute the expectation

$$\mathbb{E}_{x, x' \in M, d(x, x') = k} |x - x'| = \sum_{i=0}^{n-1}2^i \frac{i \choose k-1}{n \choose k} $$

The probabilistic experiment that samples $|x- x'|$ is equivalent to the following: select $k$ (distinct) indices out of $\{0, 1, \dots, n-1\}$, and call them $i_1 < i_2 < \cdots < i_k$. These are the bits on which $x$ and $x'$ differ. Pick a random sign $s_j \in \{\pm 1\}$ for each, and then output $|s_12^{i_1} + s_22^{i_2} + \cdots + s_k 2^{i_k}|$.

The key observation here is that the sign of $s_12^{i_1} + s_22^{i_2} + \cdots + s_k 2^{i_k}$ is determined by $2^{i_k}$, because the other terms cannot overpower it. So
$$|s_12^{i_1} + s_22^{i_2} + \cdots + s_k 2^{i_k}| = 2^{i_k} + s_12^{i_1} + s_22^{i_2} + \cdots + s_{k-1} 2^{i_{k-1}}$$ Now when we take the expectation, we can use linearity. All terms except the first disappear, and we're left with $$\mathbb{E}_{x, x' \in M, d(x, x') = k} |x - x'| = \mathbb{E}[2^{i_k}]$$ Finally, the probability that $i$ is the maximum of $k$ choices from $\{0, \dots, n-1\}$ is $\binom{i}{k-1} \left/\binom{n}{k}\right.$, as we must pick $k-1$ smaller elements from $\{0, 1, \dots, i-1\}$, a set of size $i$. This shows $$\mathbb{E}_{x, x' \in M, d(x, x') = k} |x - x'| = \sum_{i=0}^{n-1}2^i \frac{i \choose k-1}{n \choose k} $$

1
On

My first thought when I was presented this question was one of well-definedness. You only give me n and k, but shouldn't the result depend on $x$, the binary string, as well? Nope - it doesn't. I haven't tried to prove this, but it seems true given the case $n=3, k=1$ $x \in \{000,001,010,011,100,101,110,111\}$. If we examine $x=000$, $x=010$ and $x=101$.

Case 1: $x = 000$

Possible new strings, $x' \in \{001,010,100\}$

$<|x-x'|>= \frac{|001-000|+|010-000|+|100-000|}{3}$ = $\frac{100+010+001}{3} = 1*\Sigma_{i=0}^{n-1}2^i/\binom{n}{k} = \frac{7}{3}$

Case 2: $x = 010$

Possible new strings, $x' \in \{011,000,110\}$

$<|x-x'|>= \frac{|011-010|+|000-010|+|110-010|}{3}$ = $\frac{100+010+001}{3} = 1*\Sigma_{i=0}^{n-1}2^i/\binom{n}{k} = \frac{7}{3}$

Case 3: $x = 101$

$x' \in \{001,111,100\}$

$<|x-x'|>= \frac{|001-101|+|111-101|+|100-101|}{3}$ = $\frac{100+010+001}{3} = 1*\Sigma_{i=0}^{n-1}2^i/\binom{n}{k} = \frac{7}{3}$

This result makes sense as changing the $j^{th}$ digit of a binary number changes its value the same amount no matter what that number is. Also, now that we know the problem is well-defined, we can pick any $x$ (as long as it has $n$ binary digits) and the result will be the same as any other $x$. An easy $x$ to pick is all zeroes. This way, we reduce the problem to the expected value of an n-digit binary number with k ones (since $|x-x'|=x'$ when $x=0$). And now we can easily apply elementary counting to solve it.

If there are $k$ ones in the string (with all other digits zero), in all of the strings of size $n$, each digit will be flipped a total of $k$ times. Then the error caused by flipping the $i^{th}$ digit is $2^i$ (where the one's place is indexed as $0$). So the sum of error for all the strings will be $\Sigma_{i=0}^{n-1}k2^i = k\Sigma_{i=0}^{n-1}2^i$. And the total number of strings is just n choose k since you are choosing k of the n digits to flip. This leads me to the final result of $$k\Sigma_{i=0}^{n-1}2^i/\binom{n}{k} = \frac{k(2^{n}-1)}{\binom{n}{k}}$$ Note: this formula is why I included a $1*$ in some of the above formulas for the specific case $n=3$, $k=1$ - the $1$ was supposed to make you think of $k$.

Can anyone who wrote a script to calculate this confirm that this analytic result matches some of their numerical results?