Entropy of noisy continuous channel


I've recently been learning about Shannon entropy. The discrete case seems easy to understand, and I'm trying to apply it to the continuous case.

Suppose a channel has discrete inputs $X$ and outputs $Y$ per unit time, with a joint probability function $p(x,y): X\times Y \to [0,1]$, where $p(x,y)$ is the probability that $x\in X$ is sent and $y\in Y$ is received. The formula for the channel's bandwidth is $H(x)+H(y)-H(x,y)$, where $H(x)=\sum_{x\in X} p(x,*)\log p(x,*)$, $H(y)=\sum_{y\in Y} p(*,y)\log p(*,y)$ and $H(x,y)=\sum_{x\in X}\sum_{y\in Y} p(x,y)\log p(x,y)$, with $p(x,*)$ and $p(*,y)$ denoting the marginal probabilities.
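For concreteness, here is how the discrete formula (with the conventional minus sign in each entropy) works out on a small made-up joint probability table for a noisy binary channel:

```python
import numpy as np

# A 2x2 joint table p[x, y]: rows index the sent symbol x,
# columns the received symbol y. The values are assumed for illustration.
p = np.array([[0.4, 0.1],
              [0.1, 0.4]])

px = p.sum(axis=1)                  # marginal p(x, *)
py = p.sum(axis=0)                  # marginal p(*, y)
H = lambda q: -np.sum(q * np.log(q))  # entropy in nats

I = H(px) + H(py) - H(p)            # mutual information H(x)+H(y)-H(x,y)
print(I)
```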

Now we apply it to the continuous case by assuming that both the input and the output are real numbers. If the channel passes the real number through verbatim, the bandwidth is infinite, since a real number has infinite precision. And if the input has infinite range, just taking its integer part carries infinite information as well. Therefore we have to make some assumptions about the input and add some noise:

  • Input $x\sim N(0,1)$ is drawn from standard normal distribution
  • A noise $\delta\sim N(0,\sigma)$ is sampled with variance $\sigma$
  • The output is input plus noise $x+\delta$ with variance $1+\sigma$

In this way, even though we transmit the real number with infinite precision, it does not convey an infinite amount of information, since the least significant bits are fully random and carry no information. As the precision approaches infinity, the way we quantize the transmitted number does not matter either, so by quantizing to bins of width $dx$ and $dy$, it suffices to compute the bandwidth via integration, using the same formula as in the discrete case:

$$ p(x,\sigma)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{x^2}{2\sigma^2}} $$
$$ \begin{aligned} C&=H(x)+H(y)-H(x,y) \\&=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}p(x,1)\,p(y,1+\sigma)\left[\log p(x,1)+\log p(y,1+\sigma)-\log\big(p(x,1)\,p(y-x,\sigma)\big)\right] \mathrm{d}y\,\mathrm{d}x \\&=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}p(x,1)\,p(y,1+\sigma)\log\frac{p(y,1+\sigma)}{p(y-x,\sigma)}\, \mathrm{d}y\,\mathrm{d}x \\&=\frac{1}{\sigma}-\frac{1}{2}\log\left(1+\frac{1}{\sigma}\right) \end{aligned} $$

Given noise variance $\sigma=1/64$, the bandwidth turns out to be nearly $64$ natural units per real number. This is surprising to me: $\sigma=1/64$ implies roughly 6 bits of precision and means $y$ can be approximated very well by something like fp16, so I expected the entropy per transmitted number to be proportional to the logarithm of $\sigma$.

Did I make a mistake in my calculation, or is there anything that I had missed?

On BEST ANSWER

Two things seem to be incorrect:

  1. You say $\sigma$ is the variance, but then the probability function should be $p(x,\sigma)=\frac{1}{\sqrt{2\pi\sigma}} \, e^{-x^2/(2\sigma)}$, which is not what you have. It is more usual to call $\sigma^2$ the variance, but then the variance of $y$ should have been $1+\sigma^2$; either way it is not correct.

  2. In the integral you cannot combine all three terms $H(x), H(y)$ and $H(x,y)$ with shared factor $p(y,1+\sigma)$, because $H(x,y)$ does not have that factor, it has instead $p(y-x,\sigma)$.
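Point 1 can be checked numerically (a quick sketch; the value $s=0.5$ is arbitrary): the density written in the question, $p(x,s)=\frac{1}{\sqrt{2\pi}\,s}e^{-x^2/(2s^2)}$, is correctly normalized, but its variance comes out as $s^2$, not $s$.

```python
import numpy as np

# Riemann-sum check on a fine grid: the question's density
# p(x, s) = exp(-x^2 / (2 s^2)) / (sqrt(2 pi) s) is the N(0, s^2)
# density, so its variance is s^2, not s. s = 0.5 is an arbitrary choice.
s = 0.5
x = np.linspace(-10, 10, 200001)
dx = x[1] - x[0]
p = np.exp(-x**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)

total = np.sum(p) * dx          # should be ~1.0 (normalized)
var = np.sum(x**2 * p) * dx     # should be ~0.25 = s**2, not 0.5
print(total, var)
```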

If, to avoid confusion, we call the variance $\nu$ and also add the missing minus sign in front, we get: $$p(x,\nu)=\frac{1}{\sqrt{2\pi\nu}} \, e^{-x^2/(2\nu)}$$ $$H(x)=-\int_{-\infty}^{\infty} \ p(x,1) \log p(x,1) \ dx $$ $$H(y)=-\int_{-\infty}^{\infty} \ p(y,1+\nu) \log p(y,1+\nu)\ dy$$

$$H(x,y)=-\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \ p(x,1)\, p(y-x,\nu) \log\big(\, p(x,1)\, p(y-x,\nu)\, \big) \ dx\ dy$$ and the total channel capacity $H(x) + H(y) - H(x,y)$ becomes, when simplified: $$C= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p(x,1)\,\Big\{ -p(y,1+\nu)\log\big(\, p(x,1)\, p(y,1+\nu)\, \big) + p(y-x,\nu) \log\big(\, p(x,1)\, p(y-x,\nu)\, \big) \Big\}\ dx\ dy $$ which evaluates nicely to: $$ C = \tfrac{1}{2} \log(1+1/\nu) = \tfrac{1}{2} \log(1+1/\sigma^2) $$ For $\sigma=1/64$ that gives $C=4.16$ nats, which is about 6 bits.
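As a cross-check of the closed form, one can follow the question's own quantization argument and compute the mutual information of a brute-force discretized channel (a sketch; here the noise variance is taken to be $\nu = 1/64$ so the grid stays coarse enough to be cheap):

```python
import numpy as np

# Brute-force check of C = 0.5*log(1 + 1/nu): bin the channel
# X ~ N(0,1), Y = X + noise (noise variance nu) on a grid of width dx,
# then compute the discrete mutual information of the binned joint law.
def quantized_mi(nu, dx, lim=6.0):
    x = np.arange(-lim, lim, dx)
    y = np.arange(-lim, lim, dx)
    px = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi) * dx             # P(X in bin)
    pyx = (np.exp(-(y[None, :] - x[:, None])**2 / (2 * nu))
           / np.sqrt(2 * np.pi * nu) * dx)                       # P(Y in bin | x)
    pxy = px[:, None] * pyx                                      # joint bin probabilities
    py = pxy.sum(axis=0)
    mask = pxy > 1e-300                                          # avoid log(0)
    return np.sum(pxy[mask] * np.log(pxy[mask] / (px[:, None] * py[None, :])[mask]))

nu = 1 / 64
print(quantized_mi(nu, 0.02), 0.5 * np.log(1 + 1 / nu))  # the two agree closely
```

Halving the bin width leaves the result essentially unchanged, consistent with the question's claim that the quantization scheme stops mattering as the precision grows.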

PS: I don't really see an advantage in combining the three terms; the three separate integrals for $H(x)$, $H(y)$ and $H(x,y)$ are all convergent by themselves and are clearer to read!
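To that point, the three terms can even be evaluated separately in closed form, using the standard differential entropy of a Gaussian with variance $v$, namely $\tfrac{1}{2}\log(2\pi e\, v)$ (a sketch, taking $\nu = \sigma^2 = 1/64^2$ as in the question):

```python
import numpy as np

# Evaluate H(x), H(y), H(x,y) separately via the Gaussian differential
# entropy h(v) = 0.5*log(2*pi*e*v), then recover C = H(x)+H(y)-H(x,y).
nu = (1 / 64) ** 2           # noise variance for sigma = 1/64
h = lambda v: 0.5 * np.log(2 * np.pi * np.e * v)

Hx = h(1.0)                  # X ~ N(0, 1)
Hy = h(1.0 + nu)             # Y ~ N(0, 1 + nu)
Hxy = h(1.0) + h(nu)         # joint: h(X) + h(noise), since Y - X is independent of X

C = Hx + Hy - Hxy
print(C, C / np.log(2))      # nats, then bits
```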