Suppose I have a random variable $X$ which is normally distributed as $\mathcal{N}(0,1)$. This distribution has differential entropy $\frac{1}{2}(1 + \log(2\pi))$, about 1.42 nats or 2.05 bits. So does that mean that when I observe it, I "learn" about 2 bits of information on average (right?).
Suppose then that we have a noise variable $Y$ distributed as $\mathcal{N}(0, \sigma^2)$, independent of $X$. I can't observe $X$ directly; I can only observe $X+Y$. But I am interested in $X$ and don't care about $Y$. How much information can I learn about $X$ after observing $X+Y$, in entropy terms? I'm struggling to even pose the problem formally: how do I quantify how much is learned about $X$ while ignoring $Y$?
I'm looking both for the answer to this question and for a way to refine my intuition for thinking about information theory problems.
The concept of mutual information seems to capture exactly what you are looking for:
Specifically, you would be looking at $I(X; X+Y)$.
Here I am reading your question as "how much information does observing $X+Y$ give me about $X$?". If you instead meant "how much information remains to be learned about $X$" (i.e., how much entropy remains), then you may want to look at the conditional entropy $h(X \mid X+Y)$ instead; the two are closely related, since $h(X \mid X+Y) = h(X) - I(X; X+Y)$.
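In this Gaussian case the mutual information has a closed form. Since $X \sim \mathcal{N}(0,1)$ and $Y \sim \mathcal{N}(0,\sigma^2)$ are independent, $X+Y \sim \mathcal{N}(0, 1+\sigma^2)$, so

$$I(X;\, X+Y) = h(X+Y) - h(X+Y \mid X) = \tfrac{1}{2}\log\!\big(2\pi e (1+\sigma^2)\big) - \tfrac{1}{2}\log\!\big(2\pi e\, \sigma^2\big) = \tfrac{1}{2}\log\!\Big(1 + \tfrac{1}{\sigma^2}\Big),$$

the familiar additive-Gaussian-noise formula with signal-to-noise ratio $1/\sigma^2$. Note the sensible limits: as $\sigma \to \infty$ the observation tells you nothing ($I \to 0$), and as $\sigma \to 0$ it diverges, because observing a continuous variable exactly would convey infinitely many bits. As a sanity check, here is a minimal sketch (my own illustration, not from the answer) that estimates $I(X; X+Y)$ by Monte Carlo using the identity $I(X;Z) = \mathbb{E}[\log p(z\mid x) - \log p(z)]$ and compares it with the closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5
n = 200_000

x = rng.normal(0.0, 1.0, size=n)            # X ~ N(0, 1)
z = x + rng.normal(0.0, sigma, size=n)      # Z = X + Y, with Y ~ N(0, sigma^2)

def log_normal_pdf(v, var):
    """Log density of N(0, var) evaluated at v."""
    return -0.5 * (np.log(2 * np.pi * var) + v**2 / var)

# I(X; Z) = E[ log p(z | x) - log p(z) ], where
# p(z | x) = N(x, sigma^2) and p(z) = N(0, 1 + sigma^2).
mi_mc = np.mean(log_normal_pdf(z - x, sigma**2)
                - log_normal_pdf(z, 1.0 + sigma**2))

mi_closed = 0.5 * np.log(1.0 + 1.0 / sigma**2)
print(mi_mc, mi_closed)  # both close to 0.80 nats for sigma = 0.5
```

For $\sigma = 0.5$ the closed form gives $\tfrac{1}{2}\log 5 \approx 0.80$ nats, and the Monte Carlo average converges to the same value.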