Consider a code $C$ that uniquely maps each $k$-digit binary number $P=p_1 p_2 \cdots p_k$ to some $k$-digit binary number $Q=q_1 q_2 \cdots q_k$. I.e., the code is $C(P) = Q$ and $C^{-1}(Q) = P$.
There are $(2^k)!$ such codes.
Now suppose that code words are transmitted over a noisy channel that flips exactly one bit in each code word $Q$, yielding another code word $Q'$. This will be decoded to $C^{-1}(Q') = P' \neq P$.
I am interested in the code that minimizes $|P' - P|$ for all $P$. How would I go about constructing it? I am grateful for any hints.
(My intuition is that the identity code, which maps each $P$ to itself, i.e., $\forall P: C(P)=P$, is optimal, but I don't know how to prove it. I hope my question is clear -- I'm not a mathematician, and I'm happy to elaborate. To clarify in advance: I don't want to correct the error or detect which bit flipped, just minimize the error.)
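To make the setup concrete, here is the model in Python (a code is just a lookup table over $\{0, \dots, 2^k-1\}$; the variable names are mine):

```python
k = 4
C = list(range(1 << k))            # a code as a lookup table; here the identity
C_inv = {q: p for p, q in enumerate(C)}   # C^{-1}

P = 0b0110                         # message
Q = C[P]                           # transmitted code word
Q_err = Q ^ (1 << 3)               # channel flips exactly one bit (here bit 3)
P_prime = C_inv[Q_err]             # decoded message P'
print(abs(P_prime - P))            # the error |P' - P|; here 8
```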
If you are measuring $|P-P'|$ as an integer, then something like a modified Gray code may be better than the identity function.
For example, for $k=4$, compare David's tweaked Gray-like encoding with the identity coding: the worst-case $|P-P'|$ for the tweaked Gray-like encoding is 7, which is better than the worst-case $|P-P'|$ of 8 for the identity mapping (under the identity, flipping the most significant bit changes the decoded value by $2^{k-1} = 8$).
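The worst-case figure for any candidate encoding table can be checked by brute force; a minimal sketch (the function name is mine):

```python
def worst_case(code, k):
    """Worst |P' - P| over all messages P and all single-bit flips."""
    decode = {q: p for p, q in enumerate(code)}   # C^{-1}
    return max(abs(decode[code[p] ^ (1 << i)] - p)
               for p in range(1 << k)             # every message
               for i in range(k))                 # every flipped bit

k = 4
print(worst_case(list(range(1 << k)), k))   # identity mapping: 8
```

Plug any bijective 16-entry table into `worst_case` to compare it against the identity.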
Therefore, $C(P)=P$ is not optimal for $k=4$ (and, I suspect, not for any larger $k$).
If you are more interested in the average or the median error $|P-P'|$ over every code word and over every possible single-bit error -- or, as Jyrki Lahtonen hints, the "circular" error ($|P-P'| \bmod 2^k$) -- then perhaps a standard "reflected" Gray code may be better.
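For the circular metric, the identity and the standard reflected Gray code can be compared by brute force. A sketch, where I take the circular distance between $a$ and $b$ to be $\min(d, 2^k - d)$ with $d = |a - b|$:

```python
k = 4
N = 1 << k

def circ(a, b):
    """Circular distance on the cycle 0, 1, ..., N-1, 0."""
    d = abs(a - b) % N
    return min(d, N - d)

def worst_circular(code):
    """Worst circular error over all messages and all single-bit flips."""
    decode = {q: p for p, q in enumerate(code)}
    return max(circ(decode[code[p] ^ (1 << i)], p)
               for p in range(N) for i in range(k))

identity = list(range(N))
gray = [p ^ (p >> 1) for p in range(N)]     # standard reflected Gray code
print(worst_circular(identity), worst_circular(gray))  # 8 7
```

Under this metric the reflected Gray code does beat the identity for $k=4$ (worst case 7 versus 8), even though its worst-case *plain* error is large: flipping the top Gray bit sends $P$ to $2^k - 1 - P$.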
I'm not sure why you don't want to make $Q$ a few bits longer than $P$ -- then you can apply a Hamming code and correct any single-bit error, giving zero error.
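For completeness, a sketch of that route using the classic Hamming(7,4) code: four data bits are expanded to seven, and the syndrome of the received word identifies any single flipped bit, so the decoded $P'$ always equals $P$ (function names are mine):

```python
def hamming_encode(nibble):
    """Encode 4 data bits as a 7-bit Hamming(7,4) word (positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    w = [0] * 8                                      # index 0 unused
    w[3], w[5], w[6], w[7] = d[0], d[1], d[2], d[3]  # data positions
    w[1] = w[3] ^ w[5] ^ w[7]   # parity over positions with bit 0 set (1,3,5,7)
    w[2] = w[3] ^ w[6] ^ w[7]   # parity over positions with bit 1 set (2,3,6,7)
    w[4] = w[5] ^ w[6] ^ w[7]   # parity over positions with bit 2 set (4,5,6,7)
    return w[1:]

def hamming_decode(bits):
    """Correct at most one flipped bit via the syndrome, then extract the data."""
    w = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):     # XOR of the positions holding a 1
        if w[pos]:
            syndrome ^= pos
    if syndrome:                # nonzero syndrome = position of the flipped bit
        w[syndrome] ^= 1
    return w[3] | (w[5] << 1) | (w[6] << 2) | (w[7] << 3)

# Flip any single bit: the decoder still recovers P exactly.
P = 0b1011
code_word = hamming_encode(P)
for i in range(7):
    corrupted = list(code_word)
    corrupted[i] ^= 1
    assert hamming_decode(corrupted) == P
print("zero error")
```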