Say we have a discrete signal $X\in \mathcal{X}$, and a noisy observation $Y\in\mathcal{Y}$. We wish to encode $Y$ into some encoding $U$ at rate $R>0$; that is, $H(U)\leq R$. And we want $U$ to have as much information about the signal $X$ as possible. What procedure can be taken to generate the encoding $U$ so that $I(U; X)$ is maximized?
We want to generate a random variable $U$ with the following properties:
- The amount of bits needed to describe $U$ is no greater than $R$. That is, $H(U)\leq R.$
- Only $Y$ is needed to generate $U$. That is, $X\rightarrow Y \rightarrow U.$
- $I(U;X)$ is maximized subject to the previous two constraints.
This is equivalent to computing the function:
$$F(R) = \displaystyle \max_{\substack{P(U): \\ H(U)\leq R, \\ X\rightarrow Y \rightarrow U}} I(X;U).$$
If necessary we can use block codes so that instead we generate a vector $U^n$ with rate $nR$ from a vector of $n$ successive i.i.d. observations, $Y^n.$
What is a general procedure for generating such a $U$?
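One concrete (if naive) approach is brute force over deterministic encoders: for small alphabets, enumerate every map $f:\mathcal{Y}\to\mathcal{U}$, keep those with $H(U)\leq R$, and take the best $I(X;U)$. This only gives a lower bound on $F(R)$, since randomized encoders are also allowed. A minimal sketch, where the joint distribution `P_xy` (a uniform binary $X$ observed through an erasure-like channel) is my own illustrative choice:

```python
import itertools
import math

def H(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical example joint distribution P(x, y):
# X uniform on {0, 1}; y == x with prob 0.8, y == 2 ("erasure") with prob 0.2.
P_xy = {(x, y): 0.0 for x in range(2) for y in range(3)}
for x in range(2):
    P_xy[(x, x)] = 0.5 * 0.8
    P_xy[(x, 2)] = 0.5 * 0.2

def best_deterministic_encoder(P_xy, R):
    """Brute-force lower bound on F(R): search deterministic maps f: Y -> U."""
    xs = sorted({x for x, _ in P_xy})
    ys = sorted({y for _, y in P_xy})
    P_x = {x: sum(P_xy[(x, y)] for y in ys) for x in xs}
    best = 0.0
    for f in itertools.product(range(len(ys)), repeat=len(ys)):
        # Joint distribution of (X, U) under U = f(Y).
        P_xu, P_u = {}, {}
        for (x, y), p in P_xy.items():
            u = f[ys.index(y)]
            P_xu[(x, u)] = P_xu.get((x, u), 0.0) + p
            P_u[u] = P_u.get(u, 0.0) + p
        if H(P_u.values()) > R:
            continue  # violates the rate constraint H(U) <= R
        # I(X;U) = H(X) + H(U) - H(X,U)
        I = H(P_x.values()) + H(P_u.values()) - H(P_xu.values())
        best = max(best, I)
    return best

print(best_deterministic_encoder(P_xy, R=1.0))
```

For this example, the winning encoder at $R=1$ merges the erasure symbol with one of the clean symbols (keeping $Y$ intact would cost $H(Y)\approx 1.52$ bits, violating the constraint).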
Some things are immediately clear:
- If $R=0$ then $F(R)=0$, because we can't put any information into 0 bits.
- If $R\geq H(Y),$ then $F(R)=I(X;Y)$: at that point we can just set $U=Y$, and by the data processing inequality on the Markov chain $X\rightarrow Y\rightarrow U$, $F$ can't get any bigger than this.
- In between those two, we can time-share between the $R=0$ and $R=H(Y)$ schemes (spending a fraction $R/H(Y)$ of the block at full rate), so if $R\in [0,H(Y)]$ then $F(R) \geq \frac{I(X;Y)}{H(Y)} R .$
- $F(R) \leq R,$ since $I(X;U)\leq H(U)\leq R$: we can't convey more than $R$ bits of information through $U$.
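These bounds sandwich $F$ without pinning it down. As a quick numerical sanity check, here is a sketch assuming a binary symmetric channel example of my own choosing ($X\sim\mathrm{Bern}(1/2)$, crossover `eps`), for which $H(Y)=1$ and $I(X;Y)=1-h_2(\varepsilon)$:

```python
import math

def h2(p):
    """Binary entropy function, in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Hypothetical example: X ~ Bernoulli(1/2), Y = X through a BSC(eps).
eps = 0.1
H_Y = 1.0                # Y is also uniform binary
I_XY = 1.0 - h2(eps)     # I(X;Y) = H(Y) - H(Y|X)

# Endpoints: F(0) = 0, and for R >= H(Y) setting U = Y achieves I(X;Y),
# which the data processing inequality says is the ceiling.
assert I_XY <= H_Y

# Time-sharing lower bound vs. the two upper bounds, on a grid of rates:
for R in [0.0, 0.25, 0.5, 0.75, 1.0]:
    lower = I_XY / H_Y * R      # time-share between R=0 and R=H(Y)
    upper = min(R, I_XY)        # F(R) <= R and F(R) <= I(X;Y)
    assert lower <= upper + 1e-12
    print(f"R={R:.2f}: {lower:.4f} <= F(R) <= {upper:.4f}")
```

At $R=H(Y)$ the lower and upper bounds meet at $I(X;Y)$, as they should; for intermediate rates the true curve lies somewhere in the printed interval.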