Channel Capacity: 0 or $\infty$?

I have stumbled across the following weird situation:

Consider a noiseless channel $Y = X$, where $X$ and $Y$ denote the input and output, respectively. The only restriction placed on $X$ is that it lie in the interval $[0, 1]$. One is asked to determine the capacity of this channel in bits per channel use.

An obvious idea is the following: any message, encoded as a natural number $m$, may be sent over this channel by transmitting its binary representation as $X(m) = 0.m_{\text{binary}}$. Since $Y = X$, the receiver recovers $m$ exactly, so arbitrarily many bits fit into a single channel use and we can reach $C = \infty$.
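As a sanity check, this binary-expansion scheme can be sketched in a few lines of Python (the `encode`/`decode` helpers are illustrative names, not from any standard library; exact rational arithmetic via `Fraction` stands in for the noiseless real-valued channel):

```python
from fractions import Fraction

def encode(m: int, bits: int) -> Fraction:
    """Map the natural number m to X = 0.m_binary in [0, 1]."""
    digits = format(m, f"0{bits}b")  # fixed-width binary string, e.g. 5 -> "0101"
    return sum(Fraction(int(d), 2 ** (i + 1)) for i, d in enumerate(digits))

def decode(x: Fraction, bits: int) -> int:
    """Recover m from Y = X: read the binary digits back off one by one."""
    m = 0
    for _ in range(bits):
        x *= 2
        bit = int(x)        # leading binary digit of the expansion
        m = (m << 1) | bit
        x -= bit
    return m

# Noiseless channel Y = X: any number of bits survives a single use.
for m in (0, 5, 1023, 2 ** 20 + 7):
    b = max(m.bit_length(), 1)
    assert decode(encode(m, b), b) == m
```

Since nothing limits the precision of $X$, the block length `bits` can be made arbitrarily large, which is exactly the informal argument for $C = \infty$.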

On the other hand, one may apply $C = \sup_X I(X;Y)$. In this case it simplifies to $C = \sup_X h(X)$, where $h(X)$ is the differential entropy. Under the constraint $X \in [0, 1]$, $h(X)$ is maximized by the uniform distribution on this interval, whose differential entropy is $\log(\text{interval length}) = \log(1) = 0$.
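For concreteness, the differential entropy $h(X) = -\int f(x) \log_2 f(x)\,dx$ can be checked numerically (a sketch; `diff_entropy` is a hypothetical helper using midpoint-rule integration):

```python
from math import log2

def diff_entropy(pdf, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint-rule estimate of h(X) = -integral of f(x) log2 f(x) dx, in bits."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        f = pdf(x)
        if f > 0:
            total -= f * log2(f) * dx
    return total

print(diff_entropy(lambda x: 1.0, 0.0, 1.0))   # ~ 0.0 bits: uniform on [0, 1]
print(diff_entropy(lambda x: 2.0, 0.0, 0.5))   # ~ -1.0 bits: can be negative!
```

The uniform distribution on $[0,1]$ indeed gives $0$ bits, and shrinking the interval already pushes the value negative, a first hint that $h$ is not an "amount of information" in the discrete sense.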

Clearly the two "results" do not agree and the second one seems to be complete garbage. Where have I gone wrong with my reasoning?

Thank you in advance,

R.G.


BEST ANSWER

The error is in stating $I(X;Y) = h(X)$: in fact $I(X;Y)$ isn't defined when $X$ is a continuous random variable and $Y = X$, because the random vector $(X,Y)$ isn't continuous (it doesn't have a joint pdf).

The joint cdf $F(x,y) = P(X \leq x, Y \leq y) = P(X \leq \min(x,y))$ has a "ridge" along $x=y$ which prevents it from being differentiable. Formally the "density" would be $f_X(x)\,\delta(x-y)$, where $\delta$ is the Dirac delta (a generalized function), but it makes no sense to plug this into the integral formula for the mutual information of continuous variables, since neither the log of a generalized function nor the product of generalized functions with non-disjoint support is defined.

What you can do is consider $X$ with a discrete distribution taking $2^n$ equiprobable values in $[0,1]$; then $I(X;Y) = H(X) = n$. The supremum over those distributions is $\infty$, and thus $C = \infty$.
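A quick numerical check of this claim, computing mutual information straight from the definition $I(X;Y) = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)p(y)}$ (the `mutual_information` helper is an illustrative name):

```python
import numpy as np

def mutual_information(p_xy: np.ndarray) -> float:
    """I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) ), in bits."""
    px = p_xy.sum(axis=1, keepdims=True)   # marginal of X (column vector)
    py = p_xy.sum(axis=0, keepdims=True)   # marginal of Y (row vector)
    mask = p_xy > 0
    return float((p_xy[mask] * np.log2(p_xy[mask] / (px * py)[mask])).sum())

n = 8
k = 2 ** n
# X uniform on 2**n points in [0, 1]; the noiseless channel Y = X
# makes the joint pmf diagonal, so I(X;Y) = H(X) = n bits.
p_xy = np.eye(k) / k
print(mutual_information(p_xy))   # 8.0
```

Increasing `n` pushes $I(X;Y)$ up without bound, which is the supremum argument in code form.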

You can relate differential entropy to classical entropy by considering a quantization of the continuous random variable; see my answer here: Differential Entropy.

This interpretation says that, in the fine-quantization limit, the entropy of the quantized variable is approximately the differential entropy plus $n$ bits (the entropy of a uniform variable with $2^n$ values), when quantization is done uniformly with intervals of length $\Delta = 2^{-n}$. When $(X,Y)$ is continuous, this means that in $I(X;Y) = h(X) - h(X|Y)$ both terms carry an additional $-\log \Delta$ relative to the discrete entropies of the quantized variables, and these terms cancel in the difference. This motivates using continuous mutual information as a fine-quantization limit, i.e. a quantization-independent measure.
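The relation $H(X_\Delta) \approx h(X) - \log_2 \Delta = h(X) + n$ can be verified numerically, e.g. for the triangular density $f(x) = 2x$ on $[0,1]$, whose differential entropy has the closed form $h(X) = \frac{1}{2\ln 2} - 1 \approx -0.279$ bits (a sketch; `quantized_entropy` is an illustrative helper using midpoint-rule bin probabilities):

```python
import numpy as np

def quantized_entropy(pdf, n_bits: int, lo: float = 0.0, hi: float = 1.0) -> float:
    """Entropy (bits) of X quantized into bins of width Delta = 2**-n_bits."""
    delta = 2.0 ** -n_bits
    mids = np.arange(lo, hi, delta) + delta / 2
    p = pdf(mids) * delta        # midpoint-rule bin probabilities
    p = p[p > 0] / p.sum()       # renormalize away the integration error
    return float(-(p * np.log2(p)).sum())

pdf = lambda x: 2.0 * x                    # triangular density on [0, 1]
h = 1.0 / (2.0 * np.log(2.0)) - 1.0       # closed form: ~ -0.279 bits

for n in (4, 8, 12):
    print(n, quantized_entropy(pdf, n), h + n)   # H(X_Delta) ~ h(X) + n
```

The quantized entropy grows by one bit per halving of $\Delta$, with the differential entropy appearing as the constant offset.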

Edit:

That $I(X;Y)$ isn't defined when $(X,Y)$ doesn't have a pdf might be a bit of a cop-out. You may also consider the channel $Y = X$ as the limit of channels $Y_\sigma = X + Z_\sigma$, where $Z_\sigma \sim N(0,\sigma^2)$ and $X$ and $Z_\sigma$ are independent, taking the limit $\sigma \to 0$.

Then $I(X;Y_\sigma) = h(Y_\sigma) - h(Y_\sigma|X)$. The pdf of $Y_\sigma$ is the convolution of the pdfs of $X$ and $Z_\sigma$, so if $X$ is uniform on $[0,1]$ it is a mollified (smoothed) version of the uniform density on $[0,1]$ and tends to it as $\sigma \to 0$; hence $h(Y_\sigma) \to 0$. The distribution of $Y_\sigma$ conditioned on $X=x$ is $N(x,\sigma^2)$ and thus has differential entropy $\log(\sigma \sqrt{2\pi e})$, which is independent of $x$; integrating it against the pdf of $X$ gives $h(Y_\sigma | X) = \log(\sigma \sqrt{2\pi e}) \to -\infty$, and thus $C_\sigma = \sup_X \left[ h(Y_\sigma) - h(Y_\sigma|X) \right] \to \infty$ as $\sigma \to 0$.
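This limiting argument can be reproduced numerically for $X$ uniform on $[0,1]$: $h(Y_\sigma)$ via a Riemann sum over the smoothed density $f_{Y_\sigma}(y) = \Phi(y/\sigma) - \Phi((y-1)/\sigma)$, and $h(Y_\sigma|X)$ in closed form (a sketch with illustrative names):

```python
import numpy as np
from math import erf, sqrt, pi, e, log2

def I_sigma(sigma: float, grid: int = 20001, span: float = 8.0) -> float:
    """Numeric I(X; Y_sigma) in bits for X ~ Uniform(0,1), Y_sigma = X + N(0, sigma^2)."""
    Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal cdf
    y = np.linspace(-span * sigma, 1.0 + span * sigma, grid)
    # f_Y(y) = Phi(y/sigma) - Phi((y-1)/sigma): uniform density smoothed by the noise
    fy = np.array([Phi(v / sigma) - Phi((v - 1.0) / sigma) for v in y])
    dy = y[1] - y[0]
    fy = fy[fy > 0]
    h_y = float(-(fy * np.log2(fy)).sum() * dy)       # h(Y_sigma), Riemann sum
    h_y_given_x = log2(sigma * sqrt(2.0 * pi * e))    # h(Y_sigma | X), closed form
    return h_y - h_y_given_x

for s in (0.1, 0.01, 0.001):
    print(s, I_sigma(s))   # grows roughly like log2(1/sigma) as sigma -> 0
```

As $\sigma$ shrinks, $h(Y_\sigma)$ flattens toward $0$ while $-h(Y_\sigma|X)$ grows like $\log_2(1/\sigma)$, so the mutual information diverges.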

This indicates that $h(X|X)$ might be interpreted as $-\infty$, which certainly isn't $0$. This is consistent with $h(Z_\sigma) \to -\infty$ as $\sigma \to 0$, i.e. a constant random variable approximated by Gaussians. The same happens if you consider the uniform distribution $U_n$ on $[0,2^{-n}]$, where $h(U_n) = -n \to -\infty$, etc.

However, beware of overinterpreting this. You may also consider an approximation $D_{m,n}$ of a discrete variable with $2^n$ equiprobable values in $[0,1]$ by placing intervals of width $2^{-m}$ (where $m\geq n$) covering each of the $2^n$ values. Then $h(D_{m,n}) = n-m \to -\infty$ as $m\to \infty$. So making the continuous variable approach a discrete distribution makes its differential entropy tend to $-\infty$.
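The value $h(D_{m,n}) = n - m$ follows directly from the density being the constant $2^{m-n}$ on a support of total length $2^{n-m}$; a tiny check (illustrative helper name):

```python
from math import log2

def h_spikes(n: int, m: int) -> float:
    """Differential entropy (bits) of D_{m,n}: uniform over 2**n intervals
    of width 2**-m (m >= n); the density is 2**(m-n) on its support."""
    support = 2 ** n * 2.0 ** -m       # total length of the support
    height = 1.0 / support             # constant density value there
    # h = -integral of f log2 f = -support * height * log2(height) = n - m
    return -support * height * log2(height)

print(h_spikes(3, 3))    # ~ 0: m = n recovers the uniform distribution on [0, 1]
print(h_spikes(3, 10))   # -7.0 = n - m: concentrating toward a discrete variable
```

Fixing $n$ and letting $m \to \infty$ sends the differential entropy to $-\infty$ even though the discrete entropy of the limiting variable is the finite value $n$.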

So, just because differential entropy "is" $-\infty$ doesn't mean the random variable is constant. This also illustrates another aspect of the difference between classical entropy and differential entropy.

ANOTHER ANSWER

The problem is that the differential entropy $h(X)$ is not a true entropy $H(X)$. For example, the differential entropy of a uniform continuous variable on some real interval is finite (positive, zero, or even negative!), but the amount of information it provides (its "true entropy") is infinite.

In particular, it's true that $I(X;Y) = h(X) - h(X|Y)$ (that is, the left-hand side is a true mutual information, even though the terms on the right-hand side are not true entropies).

But it's false that $I(X;X) = h(X) - h(X|X) = h(X)$, because it's false that $h(X|X) = 0$ (the principle "zero entropy = zero uncertainty" does not apply to differential entropies).

See, for example, here.

On the other hand, it's true that $C = \sup I(X;X) = H(X) = +\infty$.