Is there an error-correcting code where almost every word could be used as a codeword?


An error-correcting code for strings of length $n$ from a $K$ letter alphabet is a partition $\Pi$ of $K^n$ together with a choice function $\pi$ on $\Pi$. Let $A_i$ for $i<M$ enumerate $\Pi$, and imagine that there are $M$ different possible messages you want to be able to transmit across a noisy channel (one that can change symbols in the string). You use the error-correcting code by encoding message $i$ as $\pi(A_i)$ for transmission and decoding a received message as the index of the element of $\Pi$ it belongs to. Clearly a good code will have the properties that $\Pi$ is large, and for any $A\in \Pi$, $\pi(A)$ differs at many locations from any string not in $A$. But what can we show about how rare it can be for a randomly picked string to differ at few locations from a string in a different partition element?

This question comes from one I asked on Crypto Stack Exchange: https://crypto.stackexchange.com/questions/19340/is-deniable-error-correction-possible


I'm still not sure that I understood the question. Therefore I post this toy example calculation as an answer, mostly to get the ball rolling.

Assume that $K=2$, $n=23$, and we use the perfect 3-error-correcting binary Golay code $G$. Let the partition $\Pi$ consist of the bounded-distance decoding regions of $G$. In other words, if $\vec{x}_i$ is the $i$th word of $G$, $i=1,2,\ldots,4096$, then $$ A_i=\{\vec{y}\in\{0,1\}^{23}\mid d(\vec{y},\vec{x}_i)\le3\}. $$ Because the Golay code is perfect, these regions form a seamless partition of the entire space. We see that $A_i$ contains the vector $\vec{x}_i$ itself, the $23$ vectors at Hamming distance $1$ from it, the ${23\choose2}=253$ vectors at distance $2$ from it, and finally the ${23\choose3}=1771$ vectors at distance $3$ from it. Altogether that makes $1+23+253+1771=2048=2^{11}$ vectors; in accordance with perfectness, the $2^{12}$ such partition regions together contain all $2^{23}$ possible strings.
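The counting above can be checked mechanically. A minimal Python sketch (no actual Golay code implementation is needed, only the ball volumes):

```python
from math import comb

n = 23       # length of the binary Golay code
t = 3        # number of errors it corrects
M = 2**12    # 4096 codewords

# Volume of a Hamming ball of radius 3 in {0,1}^23.
ball = sum(comb(n, r) for r in range(t + 1))
print(ball)               # 1 + 23 + 253 + 1771 = 2048 = 2^11

# Perfectness: the 4096 disjoint balls tile the whole space exactly.
print(M * ball == 2**n)   # True
```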

If I understood the question correctly, we are interested in the probability of a randomly chosen vector from the set $A_i$ being close to a vector from another partition element $A_j$, $j\neq i$. Let us consider the case where "close" means at Hamming distance one. Here's what happens in our case. If $\vec{y}\in A_i$ is at Hamming distance three from $\vec{x}_i$, then it is at Hamming distance one from $23-3=20$ vectors $\vec{z}$ with the property $d(\vec{z},\vec{x}_i)=4$. This is easy to understand: we got to $\vec{y}$ from $\vec{x}_i$ by toggling three out of the 23 bits, so toggling any of the 20 remaining bits moves us further away from $\vec{x}_i$. None of those 20 vectors $\vec{z}$ belong to $A_i$, so they belong to some other partition element.

What this means is that if we select a random vector from the set $A_i$, then it will be within Hamming distance one of a vector from some other partition element $A_j$, $j\neq i$, with probability $1771/2048\approx 0.86475$ (the fraction of $A_i$ at exact distance three from $\vec{x}_i$).
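The toggling argument and the resulting probability can be verified directly. The sketch below assumes, without loss of generality (the Golay code is linear, so we may take $\vec{x}_i$ to be the all-zero codeword), a representative $\vec{y}$ at distance three:

```python
from math import comb

n, t = 23, 3

# Take x_i = 0 (all-zero codeword, WLOG by linearity) and a
# representative y in A_i at exact Hamming distance 3 from it.
y = [0] * n
y[0] = y[1] = y[2] = 1

# Toggle each coordinate of y in turn; count the neighbors that land
# at distance 4 from x_i, i.e. outside the ball A_i.
outside = 0
for pos in range(n):
    z = y.copy()
    z[pos] ^= 1
    if sum(z) == 4:     # Hamming weight = distance from the zero word
        outside += 1
print(outside)          # 20 = 23 - 3

# Probability that a random vector of A_i lies at exact distance 3,
# and hence within distance one of another decoding region:
ball = sum(comb(n, r) for r in range(t + 1))
print(comb(n, t) / ball)    # 1771/2048 ≈ 0.86475
```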

I don't know if this probability is good news or bad news for you. I want to make two more remarks.

  • I used a perfect code here to make the description of the partition as well as the calculation easy. I need to warn you that in general things become very messy, because there are darn few perfect codes (apart from the two Golay codes and the repetition codes of odd length $n$ and rate $1/n$, they can only correct a single error). For other codes the partition elements won't have this "spherical" shape: there will be regions that are somewhat distant from all the centers. I guess this would tend to increase the above probability, but my intuition is somewhat underdeveloped to make such a claim with any kind of confidence.
  • Anyway, what we witnessed above is an example of the following maxim:

    In a high dimensional space the bulk of the volume of an approximately spherical region is close to its boundary.

If we spend too much time staring at 2-dimensional pictures, such as tiling the plane with squares, triangles or hexagons, we may develop the intuitive feeling that most of the area is relatively far away from the boundaries of those regions. This is a low-dimensional fallacy.
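To put a number on the maxim: the sketch below tracks the fraction of a radius-3 Hamming ball that sits on its outermost shell as the dimension $n$ grows (the choice of radius 3 matches the Golay example; any fixed radius behaves the same way):

```python
from math import comb

t = 3  # fixed ball radius, as in the Golay example

# Fraction of the ball's volume on the outer shell of radius t,
# for increasing dimension n.
for n in (23, 100, 400, 1600):
    ball = sum(comb(n, r) for r in range(t + 1))
    shell = comb(n, t)
    print(n, shell / ball)
```

The fraction starts at $1771/2048\approx 0.865$ for $n=23$ and climbs toward $1$ as $n$ grows, which is exactly the boundary-concentration phenomenon described above.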