Entropy of a variable after adding noise


Suppose I have a random variable $X$ which is normally distributed, $\mathcal{N}(0,1)$. The distribution has entropy $\frac{1}{2}(1 + \log(2\pi))$, about 1.42 nats or roughly 2 bits of information. So that means when I observe it, I "learn" about 2 bits of information on average (right?).

Suppose then that we have a noise variable $Y$ distributed as $\mathcal{N}(0, \sigma^2)$. I can't observe $X$ directly; I can only observe $X+Y$. But I am interested in $X$ and don't care about $Y$. How much information can I learn about $X$ after observing $X+Y$, in entropy terms? I'm struggling to even pose the problem formally: how do I quantify how much is learned about $X$ while ignoring $Y$?

I'm looking for an answer to this question, and also hoping to refine my intuition for thinking about information theory problems.

There are two answers below.

BEST ANSWER

The concept of mutual information seems to capture exactly what you are looking for:

[Mutual information] quantifies the "amount of information" (in units such as shannons (bits), nats or hartleys) obtained about one random variable by observing the other random variable.

Specifically, you would be looking at $I(X; X+Y)$.


Here I am reading your question

How much information can I learn about $X$ after observing $X+Y$, in entropy terms

as "how much information does observing $X+Y$ give me about $X$". If you meant "how much information remains to be learnt about $X$" (i.e., how much entropy remains), then you may want to look at the conditional entropy instead (the two are closely related).
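For the Gaussian case in the question, both quantities have closed forms, so the relationship between them is easy to check numerically. A minimal sketch in nats (the function names and the example noise variances are my own):

```python
import math

def mi_gaussian(sigma2):
    """I(X; X+Y) for X ~ N(0,1) and independent Y ~ N(0, sigma2), in nats.

    Uses the closed form I = h(X+Y) - h(Y) = 0.5*ln(1 + 1/sigma2).
    """
    return 0.5 * math.log(1 + 1 / sigma2)

def cond_entropy(sigma2):
    """h(X | X+Y): the entropy of X that remains after observing X+Y."""
    h_x = 0.5 * math.log(2 * math.pi * math.e)  # h(X) for X ~ N(0,1)
    return h_x - mi_gaussian(sigma2)            # h(X|Z) = h(X) - I(X;Z)

for s2 in (0.01, 1.0, 100.0):
    # Small noise: large mutual information, little entropy left in X.
    # Large noise: mutual information near zero, h(X|Z) near h(X).
    print(s2, mi_gaussian(s2), cond_entropy(s2))
```

Note that as $\sigma^2 \to 0$ the mutual information diverges, consistent with the Shannon entropy of a continuous variable being infinite.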

ANSWER

The distribution has ... about 1.42 nats or 2 bits of information. So that means when I observe it, I "learn" about 2 bits of information on average (right?).

No, that's wrong. You cannot encode a Gaussian variable using 2 bits. Also, imagine you define a new variable by multiplying the original value by two, $W = 2X$. Then the entropy of $W$ is bigger than that of $X$, yet both carry the same information. What's more, if you define $V = X/1000$, the entropy of $V$ would be negative!

Explanation: what you've computed is the differential entropy, which is a different thing from the "true" (Shannon) entropy. They bear some similarities, but the differential entropy cannot be interpreted as a number of bits of information. Strictly speaking, the (Shannon) entropy of a Gaussian is infinite.
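A quick numerical illustration of the scaling behaviour above, using the closed form $h\big(\mathcal{N}(0,\sigma^2)\big) = \frac{1}{2}\ln(2\pi e \sigma^2)$ (a sketch; the function name is mine, entropies in nats):

```python
import math

def diff_entropy_gaussian(sigma):
    """Differential entropy of N(0, sigma^2), in nats: 0.5*ln(2*pi*e*sigma^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

h_x = diff_entropy_gaussian(1)         # ~1.419 nats
h_w = diff_entropy_gaussian(2)         # h(2X) = h(X) + ln 2: larger, same information
h_v = diff_entropy_gaussian(1 / 1000)  # h(X/1000) = h(X) - ln 1000: negative!
print(h_x, h_w, h_v)
```

A mere change of units shifts the differential entropy, which is why it cannot be read as an absolute amount of information.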

The mutual information, however, has the same interpretation both for discrete and continuous variables.

In your case, calling $Z = X+Y$ and assuming $X, Y$ are independent:

$$I(X;Z) = h(Z) - h(Z \mid X) = h(X+Y) - h(Y)$$

since, given $X$, $Z$ is just $Y$ shifted by a constant, so $h(Z \mid X) = h(Y)$. Here, $h(\cdot)$ denotes the differential entropy.
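Plugging in the Gaussian entropies gives the closed form $I(X;Z) = \frac{1}{2}\log(1 + 1/\sigma^2)$, which can be sanity-checked against samples. A sketch (the noise level and sample size are arbitrary choices of mine; the variance estimate exploits that both means are zero by construction):

```python
import math
import random

random.seed(0)
sigma = 0.7       # noise standard deviation (example value)
n = 200_000       # sample size for the Monte Carlo estimate

x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, sigma) for _ in range(n)]
z = [a + b for a, b in zip(x, y)]

# For a zero-mean Gaussian, h(N(0, v)) = 0.5*ln(2*pi*e*v);
# estimate Var(Z) from the samples and plug it in.
var_z = sum(v * v for v in z) / n
h = lambda v: 0.5 * math.log(2 * math.pi * math.e * v)

mi_est = h(var_z) - h(sigma ** 2)              # I(X;Z) = h(X+Y) - h(Y)
mi_exact = 0.5 * math.log(1 + 1 / sigma ** 2)  # closed form
print(mi_est, mi_exact)  # the two should agree closely
```

The estimate and the closed form agree because $Z = X + Y$ is itself Gaussian with variance $1 + \sigma^2$, so estimating its variance is all that is needed.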