Calculate mutual information for joint distributions without PDF and PMF


Consider a random variable $X\sim U[l,h]$ and another random variable $S=DX+(1-D)Y$, where $Y\sim U[l,h]$ is independent of $X$, $Pr[D=1]=q$, and $Pr[D=0]=1-q$. We can show that the cumulative distribution function is $F_{X,S}(x,s)=\min\{\frac{s-l}{h-l},\frac{x-l}{h-l}\}\cdot q+\frac{s-l}{h-l}\cdot\frac{x-l}{h-l}\cdot (1-q)$.
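As a sanity check, the CDF above can be verified numerically. The sketch below compares the closed form against a Monte Carlo estimate; the parameter choices $l=0$, $h=1$, $q=0.3$ and the function names are arbitrary, chosen just for this illustration:

```python
import random

def joint_cdf(x, s, l, h, q):
    # F_{X,S}(x,s) = q*min(F(s),F(x)) + (1-q)*F(s)*F(x), with F the U[l,h] CDF
    Fx = (x - l) / (h - l)
    Fs = (s - l) / (h - l)
    return q * min(Fs, Fx) + (1 - q) * Fs * Fx

def mc_cdf(x, s, l, h, q, n=200_000, seed=0):
    # Monte Carlo estimate of P[X <= x, S <= s] by direct simulation
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        X = rng.uniform(l, h)
        Y = rng.uniform(l, h)
        D = 1 if rng.random() < q else 0
        S = D * X + (1 - D) * Y
        if X <= x and S <= s:
            hits += 1
    return hits / n

l, h, q = 0.0, 1.0, 0.3
print(joint_cdf(0.4, 0.7, l, h, q))  # exact: 0.3*0.4 + 0.7*(0.7*0.4) = 0.316
print(mc_cdf(0.4, 0.7, l, h, q))     # Monte Carlo estimate, close to 0.316
```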

I am interested in the mutual information of $X$ and $S$. By definition, we need to compute $\int_l^h\int_l^h f_{X,S}(x,s)\ln\left[\frac{f_{X,S}(x,s)}{f_X(x)f_S(s)}\right]dx\,ds$, where $f_{X,S}$ denotes the joint probability density function and $f_X(x),f_S(s)$ denote the marginal densities.

An issue arises: there is no probability density function or probability mass function on the diagonal $\{(x,s)\mid x=s\in(l,h)\}$, yet this region carries positive probability mass $q$. It seems that this diagonal should carry infinite weight (relative to the rest of the square) in the mutual information integral, but I'm not sure how to calculate this.

Also, the mutual information here has a desirable interpretation: $S$ can be viewed as a noisy signal of $X$. $S$ reveals the value of $X$ with probability $q$ and reveals pure noise with complementary probability. The mutual information is the cost of obtaining this noisy signal $S$.

How do we calculate the mutual information/cost of $S$ in this weird situation?

Let's attack the discrete case ($X$ and $Y$ discrete uniform over the same support of size $n$; assume WLOG that the support contains $0$).

We want $$I(X;S)=H(S)-H(S|X) \tag1$$

Since $S$ has the same (uniform) distribution as $X$ and $Y$, we have $H(S)=\log n$.

And, by the symmetry of the problem, $H(S|X)=H(S|X=0)$.

Now, the pmf of $S$ given $X=0$ is a mixture of a point mass at $s=0$ (with weight $p_0 = q + (1-q)/n$) and a uniform distribution over the remaining $n-1$ values (with total weight $1-p_0$).

Hence $$H(S|X=0)=h_b(p_0) + (1-p_0) \log(n-1) \tag2$$

where $h_b(p)=-p\log p-(1-p)\log(1-p)$ is the binary entropy function.

For large $n$, $p_0\to q$ and this gives $$I(X;S) = q \log n - h_b(q) + o(1) \approx q \log n $$
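As a concrete check of $(1)$–$(2)$, a short script can compute $I(X;S)$ directly from the joint pmf and compare it with the entropy decomposition and the large-$n$ approximation (an illustrative sketch; the function names and parameter values are arbitrary):

```python
import math

def mi_exact(n, q):
    # Direct computation from the joint pmf:
    # p(i,j) = (1/n) * (q*1{i=j} + (1-q)/n); both marginals are uniform 1/n.
    I = 0.0
    for i in range(n):
        for j in range(n):
            p = (1 / n) * (q * (i == j) + (1 - q) / n)
            I += p * math.log(p * n * n)  # log[ p(i,j) / (p_X(i) p_S(j)) ]
    return I

def mi_formula(n, q):
    # I = H(S) - H(S|X) = log n - [ h_b(p0) + (1-p0) log(n-1) ]
    p0 = q + (1 - q) / n
    hb = -p0 * math.log(p0) - (1 - p0) * math.log(1 - p0)
    return math.log(n) - hb - (1 - p0) * math.log(n - 1)

n, q = 1000, 0.3
print(mi_exact(n, q), mi_formula(n, q))  # the two agree
hb_q = -q * math.log(q) - (1 - q) * math.log(1 - q)
print(q * math.log(n) - hb_q)            # large-n approximation q*log n - h_b(q)
```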

This suggests that the mutual information in the continuous case is infinite.

This fact can also be inferred from a channel-coding interpretation. If we take $X$ as the channel input and $S$ as the output, it's easy to devise a coding scheme that transmits an infinite amount of information in a finite expected number of channel uses. For example (using feedback for simplicity), to send a real number $X \in [l,h]$, just send it through the channel twice; if the receiver reads two different values, at least one of them was $Y$, so it signals the transmitter to retry; repeat until both values coincide (which, with probability $1$, implies that the receiver has read $X$).
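The retransmission scheme can be sketched in a few lines; the helper `transmit` and its parameters are made up for this illustration, and since two independent continuous draws coincide with probability $0$, a matched pair is (almost surely) the true $X$:

```python
import random

def transmit(x, l, h, q, rng):
    # One channel use: outputs x with prob q, fresh noise Y ~ U[l,h] otherwise.
    def use():
        return x if rng.random() < q else rng.uniform(l, h)
    # Send x in pairs until the two received values agree (feedback-driven retry).
    uses = 2
    a, b = use(), use()
    while a != b:  # two independent noise values coincide with probability 0
        uses += 2
        a, b = use(), use()
    return a, uses

rng = random.Random(1)
x = 0.625
decoded, uses = transmit(x, 0.0, 1.0, 0.5, rng)
print(decoded == x)  # True: the receiver recovers x exactly
print(uses)          # even number of channel uses; expectation is 2/q**2
```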