Best way to compress a noisy observation


Say we have a discrete signal $X\in \mathcal{X}$ and a noisy observation $Y\in\mathcal{Y}$. We wish to encode $Y$ into a description $U$ at rate at most $R>0$, i.e., $H(U)\leq R$, such that $U$ retains as much information about the signal $X$ as possible. What procedure can be used to generate the encoding $U$ so that $I(U; X)$ is maximized?

We want to generate a random variable $U$ with the following properties:

  • The number of bits needed to describe $U$ is no greater than $R$. That is, $H(U)\leq R.$
  • Only $Y$ is used to generate $U$. That is, $X\rightarrow Y \rightarrow U$ forms a Markov chain.
  • $I(U;X)$ is maximized subject to the two constraints above.

This is equivalent to finding this function:

$$F(R) = \displaystyle \max_{\substack{P(U|Y): \\ H(U)\leq R, \\ X\rightarrow Y \rightarrow U}} I(X;U).$$

If necessary, we can use block codes: instead of a single symbol, generate a vector $U^n$ with $H(U^n)\leq nR$ from a vector of $n$ successive i.i.d. observations, $Y^n.$

What is a general procedure for generating such a $U$?
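One way to get a feel for the problem numerically (this is a sketch of mine, not a known-optimal procedure): for small alphabets, brute-force over deterministic quantizers $U=f(Y)$ satisfying $H(U)\leq R$ and keep the best $I(X;U)$. Since the optimal $U$ may be stochastic, this only yields a lower bound on $F(R)$. The function names (`F_lower_bound`, `mutual_information`) and the binary-symmetric-channel example are my own choices for illustration.

```python
# Lower-bound F(R) by exhaustive search over deterministic maps U = f(Y).
# Assumption: small finite alphabets, joint pmf given as {(x, y): prob}.
from itertools import product
from math import log2

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(q * log2(q) for q in p if q > 0)

def mutual_information(pxy):
    """I(X;Y) in bits from a joint pmf dict {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return sum(p * log2(p / (px[x] * py[y])) for (x, y), p in pxy.items() if p > 0)

def F_lower_bound(pxy, R, U_size):
    """Max of I(X;U) over deterministic f: Y -> {0..U_size-1} with H(U) <= R."""
    ys = sorted({y for (_, y) in pxy})
    best = 0.0
    for f in product(range(U_size), repeat=len(ys)):  # all maps Y -> U
        fmap = dict(zip(ys, f))
        pxu = {}  # joint pmf of (X, U) induced by U = f(Y)
        for (x, y), p in pxy.items():
            key = (x, fmap[y])
            pxu[key] = pxu.get(key, 0) + p
        pu = {}
        for (_, u), p in pxu.items():
            pu[u] = pu.get(u, 0) + p
        if entropy(list(pu.values())) <= R + 1e-12:  # enforce H(U) <= R
            best = max(best, mutual_information(pxu))
    return best

# Toy example: X uniform bit, Y = X through a BSC with crossover eps = 0.1.
eps = 0.1
pxy = {(0, 0): 0.5 * (1 - eps), (0, 1): 0.5 * eps,
       (1, 0): 0.5 * eps, (1, 1): 0.5 * (1 - eps)}
print(F_lower_bound(pxy, R=1.0, U_size=2))  # ≈ 0.531, recovering I(X;Y)
```

At $R=1=H(Y)$ the identity map $U=Y$ is feasible, so the search recovers $I(X;Y)=1-h(0.1)\approx 0.531$ bits, consistent with the second bullet below.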


CLEARLY:

  • If $R=0$ then $F(R)=0$, because we can't put any information into 0 bits.
  • If $R\geq H(Y),$ then $F(R)=I(X;Y)$: at that point we can simply set $U=Y$, and by the data processing inequality on the Markov chain, $F$ can't get any bigger than this.
  • In between those two extremes, we can time-share between them, so if $R\in [0,H(Y)]$ then $F(R) \geq \frac{I(X;Y)}{H(Y)} R .$
  • $F(R) \leq R,$ since $I(X;U)\leq H(U)\leq R$: we can't convey more than $R$ bits of information through $U$.
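These bounds can be checked numerically on a toy setup (my choice, not from the question): $X\sim\mathrm{Bern}(1/2)$ observed through a binary symmetric channel with crossover $\epsilon$. The script prints the bracket that $F(R)$ must lie in for a few rates; the helper name `h2` is hypothetical.

```python
# Numeric sanity check of the bounds on F(R) for a toy BSC example.
from math import log2

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0, 1) else -p * log2(p) - (1 - p) * log2(1 - p)

eps = 0.1
H_Y = 1.0              # Y is a uniform bit, so H(Y) = 1 bit
I_XY = 1.0 - h2(eps)   # I(X;Y) for a BSC with uniform input

for R in (0.0, 0.25, 0.5, 0.75, 1.0):
    lower = I_XY / H_Y * R if R <= H_Y else I_XY  # time-sharing bound
    upper = min(R, I_XY)                          # F(R) <= R and F(R) <= I(X;Y)
    print(f"R={R:.2f}: {lower:.3f} <= F(R) <= {upper:.3f}")
```

For every rate the time-sharing line stays below $\min(R, I(X;Y))$, and the two meet at $R=0$ and $R=H(Y)$, as the bullets above predict.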