I have a channel where the probability of sending a $0$ and receiving a $1$ is $P_e$, sending a $0$ and receiving a $0$ is $1-P_e$, sending a $1$ and receiving a $0$ is $P_e$, and sending a $1$ and receiving a $1$ is $1-P_e$. Assume $P_e=0.25$.
Thing is: I have $3000$ bits to send (already compressed) and, after the encoder (COD), these bits can be coded with up to $16000$ bits. This doesn't mean that I have to take the $3000$ bits at once; I can split them and code them in "parts" (for example, $3$ bits $\to 16$ bits). The goal is, of course, to minimize the probability of error.
Which kind of linear code (COD+DECOD) would you use? I just want a hint of which one and why, not a complete simulation demonstrating that the code is nearly optimal.
There is a Hadamard code which uses 32 bits to transmit 6, which is the ratio you want. It is the dual of an extended Hamming code. It has a minimum distance of 16, i.e. it can certainly decode a block correctly if it has 7 errors or fewer, and it can certainly detect an ambiguity if the block has 8 errors. It is not a perfect code, meaning that it can sometimes correctly decode a message that has more than 7 errors. This code is not really good enough for a bit-error rate of 0.25, but it is fairly simple, so let's look at it anyway.
Let `count_bits` be a function that counts the number of ones in the binary representation of its integer argument. In Python:

```python
def count_bits(n):
    """Count the set bits of a non-negative integer."""
    ans = 0
    while n != 0:
        ans += 1
        n &= n - 1  # clear the lowest set bit
    return ans
```

We can then define the code as follows:

```python
valid_messages = [[count_bits(i & n) & 1 for i in range(32, 64)]
                  for n in range(64)]
```

It is a linear code: XORing two message indices XORs the corresponding codewords, so its six basis vectors are the codewords for $n = 1, 2, 4, 8, 16, 32$.
We can easily verify the distances between the codewords of the code:

```python
def distance(message1, message2):
    return sum(x ^ y for x, y in zip(message1, message2))

for m in range(64):
    for n in range(m):
        d = distance(valid_messages[m], valid_messages[n])
        print("Distance from %d to %d is %d" % (m, n, d))
        assert d >= 16
```

The code is small enough that we can decode it by dumb exhaustive search:
```python
def decode(message):
    """Return the indices of all nearest codewords."""
    best = []
    best_count = 1000000
    for n, candidate in enumerate(valid_messages):
        count = sum(x != y for x, y in zip(message, candidate))
        if count < best_count:
            best = []
            best_count = count
        if count == best_count:
            best.append(n)
    return best
```

Note that this function returns a list of possible answers; the list can contain more than one answer if the decoding is ambiguous.
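As a quick sanity check of the decoder (the definitions are restated so the snippet runs on its own): since the minimum distance is 16, any pattern of 7 or fewer errors leaves the true codeword strictly closer than every other one, so decoding must be unique and correct. The seven flipped positions below are arbitrary.

```python
def count_bits(n):
    ans = 0
    while n:
        ans += 1
        n &= n - 1
    return ans

valid_messages = [[count_bits(i & n) & 1 for i in range(32, 64)]
                  for n in range(64)]

def decode(message):
    best, best_count = [], 1000000
    for n, candidate in enumerate(valid_messages):
        count = sum(x != y for x, y in zip(message, candidate))
        if count < best_count:
            best, best_count = [], count
        if count == best_count:
            best.append(n)
    return best

# Flip 7 of the 32 bits of codeword 5: distance 7 to the true codeword,
# at least 16 - 7 = 9 to every other one, so decoding is unique.
received = list(valid_messages[5])
for pos in (0, 3, 7, 12, 20, 25, 31):
    received[pos] ^= 1
assert decode(received) == [5]
```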
We can now measure the performance of the code experimentally. It does not matter what message you are trying to transmit, so let's transmit the zero message. We will corrupt it at random, and then try to decode it. We will count the number of successes, the number of detected ambiguities, and the weight of the worst error that was successfully corrected.
```python
from random import random

p_error = 0.25
NUM_TRIALS = 100000
num_successes = 0
num_ambiguous = 0
best_weight = 0
best_example = valid_messages[0]
for _ in range(NUM_TRIALS):
    message = [int(random() < p_error) for _ in range(32)]
    decodings = decode(message)
    if len(decodings) > 1:
        num_ambiguous += 1
    elif decodings == [0]:
        num_successes += 1
        weight = sum(message)
        if weight > best_weight:
            best_weight = weight
            best_example = message
```

The results are as follows:
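To reproduce figures like these, here is a self-contained version of the experiment (same logic, definitions restated, and a smaller trial count so it finishes quickly); the exact numbers naturally vary from run to run.

```python
from random import random

def count_bits(n):
    ans = 0
    while n:
        ans += 1
        n &= n - 1
    return ans

valid_messages = [[count_bits(i & n) & 1 for i in range(32, 64)]
                  for n in range(64)]

def decode(message):
    best, best_count = [], 1000000
    for n, candidate in enumerate(valid_messages):
        count = sum(x != y for x, y in zip(message, candidate))
        if count < best_count:
            best, best_count = [], count
        if count == best_count:
            best.append(n)
    return best

p_error, NUM_TRIALS = 0.25, 2000
num_successes = num_ambiguous = best_weight = 0
for _ in range(NUM_TRIALS):
    # Transmit the zero codeword through the BSC and try to decode it.
    message = [int(random() < p_error) for _ in range(32)]
    decodings = decode(message)
    if len(decodings) > 1:
        num_ambiguous += 1
    elif decodings == [0]:
        num_successes += 1
        best_weight = max(best_weight, sum(message))

print("successes: %d / %d" % (num_successes, NUM_TRIALS))
print("ambiguous: %d / %d" % (num_ambiguous, NUM_TRIALS))
print("worst error corrected: weight %d" % best_weight)
```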
For comparison, another code that meets your bound on the rate is the repetition code of order 5. That code correctly decodes a bit with probability 0.897 (which is better) and incorrectly decodes it with probability 0.103 (which is worse), but it reports no ambiguities (which is likely to be undesirable).
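The per-bit figures for the repetition code follow directly from the binomial distribution: a bit is decoded correctly exactly when at most 2 of its 5 copies are flipped. A quick check of the arithmetic:

```python
from math import comb

p = 0.25
# P(at most 2 of the 5 copies flipped) = per-bit success probability
p_correct = sum(comb(5, k) * p**k * (1 - p)**(5 - k) for k in range(3))
print(p_correct)  # 0.896484375, i.e. roughly 0.897 correct / 0.103 incorrect
```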
The theoretical rate limit for a bit error rate of 0.25 (the capacity of this channel) is 0.1887..., which is only just above your required rate of $3000/16000 = 0.1875$. It will be difficult to construct a code that meets your specification. It will certainly require a block length much longer than 32 bits; even if you use all 16000 bits, the probability of an incorrect decoding will be greater than $10^{-6}$. There are $2^{3000}$ possible messages, so it will probably be rather difficult to decode. The suggestions in the comments look sound to me.
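The limit quoted above is the capacity of a binary symmetric channel, $C = 1 - H(p)$ with $H$ the binary entropy function; a sketch of the arithmetic:

```python
from math import log2

def binary_entropy(p):
    return -p * log2(p) - (1 - p) * log2(1 - p)

capacity = 1 - binary_entropy(0.25)  # about 0.18872 bits per channel use
rate = 3000 / 16000                  # 0.1875, just below capacity
print(capacity, rate)
```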
Update: I have done a more accurate calculation for codes with a block length of 16000.
We want at least $2^{3000}$ distinct codewords. There are only $2^{16000}$ possible received messages, so at most $2^{13000}$ distinct received messages can decode to each codeword.
A calculation indicates that there are more than $2^{13000}$ binary strings of length 16000 that contain 4017 ones or fewer; that is, the Hamming ball of radius 4017 around a codeword already contains more strings than a decoding region can hold. Therefore, we cannot hope to make a code that corrects more than 4016 errors.
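This is a sphere-packing calculation, and it can be checked exactly with Python's big integers (each binomial coefficient is built incrementally from the previous one to keep the loop fast):

```python
n, radius = 16000, 4017
ball = 0
c = 1  # C(n, 0)
for k in range(radius + 1):
    ball += c                    # ball = sum of C(n, 0..k)
    c = c * (n - k) // (k + 1)   # advance to C(n, k+1)

# The radius-4017 ball exceeds 2**13000 strings, so decoding regions of
# at most 2**13000 strings each cannot cover it: 4017 errors are too many.
print(ball > 2**13000)  # True
```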
Another calculation indicates that the probability of 4016 errors or fewer is about 0.61891365. Therefore, the probability of an incorrect or ambiguous decoding is at least about 0.38108635. This applies to any error-correcting code and there is no way around it.
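The probability of at most 4016 flips in 16000 transmitted bits is a binomial tail. Naive floating point underflows here ($0.75^{16000}$ is far below the smallest double), but with $p = 1/4$ the tail can be evaluated exactly in integer arithmetic and converted to a float only at the end; a sketch:

```python
n, r = 16000, 4016
# P(X <= r) for X ~ Binomial(n, 1/4):
#   sum over k <= r of C(n, k) * 3**(n-k) / 4**n, in exact integers.
numerator = 0
c = 1          # C(n, 0)
pow3 = 3**n    # 3**(n-0)
for k in range(r + 1):
    numerator += c * pow3
    c = c * (n - k) // (k + 1)  # advance to C(n, k+1)
    pow3 //= 3                  # advance to 3**(n-k-1)
prob = numerator / 4**n  # big-int true division rounds correctly to float
print(prob)  # approximately 0.6189
```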
This illustrates how close to impossibility your specification is. In practice, we will not be able to approach this theoretical bound.