Minimizing Differential Entropy of a Gaussian Random Variable Conditioned on the Sum of Gaussian and Non-Gaussian Random Variables
Let $A$ be a normally distributed random variable with variance $\sigma_A^2$, and let $B$ be a continuous random variable with variance $\sigma_B^2$, independent of $A$. Does choosing a Gaussian distribution for $B$ minimize the conditional differential entropy $h(A|A+B)$?
First notice that $I(A;A+B) = h(A) - h(A|A+B)$. Here $h(A)$ is fixed (since the law of $A$ is fixed), so minimising the conditional entropy in the question is equivalent to maximising this mutual information. In other words, you're asking a sort of dual question to the channel coding problem: "given that I'm going to feed an additive channel a Gaussian input, which noise distribution is the most benign?" The answer is not a Gaussian $B$: in a rough sense, the Gaussian is something like a worst-case additive noise law, owing to its entropy-maximising property, and more concentrated noise laws should therefore yield better performance.
Concretely, first think of a noise distribution that is discrete, say supported on $\pm \beta$ where (for simplicity) $\beta$ has a finite binary expansion. In this case, the noise can only corrupt the first few bits of (the binary expansion of) a real-valued input $a$, so we can transmit an arbitrary amount of information in the tail of its binary expansion. This qualitative fact basically remains true even if we smear the discreteness out over a tiny set in order to satisfy your continuity requirement. Thus, under such noise, we should attain very high mutual information.
Below I'll formalise this intuition.
For simplicity, just consider the case $\sigma_A^2 = \sigma_B^2 = 1.$ If $B$ is a Gaussian, then it's a simple matter of computation that the mutual information is $\frac12 \log(2)$.
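For completeness, the computation is the standard Gaussian-channel one, using $h = \frac12\log(2\pi e\,\sigma^2)$ for a Gaussian with variance $\sigma^2$: $$ I(A;A+B) = h(A+B) - h(A+B|A) = h(A+B) - h(B) = \frac12\log\bigl(2\pi e(\sigma_A^2+\sigma_B^2)\bigr) - \frac12\log\bigl(2\pi e\,\sigma_B^2\bigr) = \frac12\log\Bigl(1 + \frac{\sigma_A^2}{\sigma_B^2}\Bigr) = \frac12\log 2.$$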
Now, for $\beta, \delta \in [0,1]$ consider $p_B$ of the form $$ p_B(b;\beta,\delta) := \frac1{4\delta} ( \mathbf{1}\{ |b-\beta|\le \delta\} + \mathbf{1}\{|b+\beta| \le \delta\}).$$ This puts mass uniformly in a window of width $2\delta$ about both $+\beta$ and $-\beta$. Equivalently, you can think of $B = Z+N$, where $Z$ is uniform on $\pm \beta,$ and $N$ is uniform on $[-\delta, \delta]$. This pair satisfies the variance condition if $\beta^2 + \delta^2/3 = 1.$
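To see where the variance condition comes from: $Z$ and $N$ are independent, a symmetric two-point law on $\pm\beta$ has variance $\beta^2$, and a uniform law on an interval of length $2\delta$ has variance $(2\delta)^2/12$, so $$ \operatorname{Var}(B) = \operatorname{Var}(Z) + \operatorname{Var}(N) = \beta^2 + \frac{(2\delta)^2}{12} = \beta^2 + \frac{\delta^2}{3}. $$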
Now, $$I(A;A+B) = h(A+B) - h(A+B|A) = h(A+B) - h(B) \ge h(A) - h(B),$$ where we have used the independence of $A$ and $B$, and the final inequality uses the fact that $0 \le I(B;A+B) = h(A+B) - h(A),$ again using independence.
This means that under the above noise distribution, we have $$ I(A; A+B) \ge \frac{1}{2} \log (2\pi e) - h(B).$$ It suffices to argue that we can choose $\beta, \delta$ such that $$ \frac12 \log (2\pi e) - h(B) \ge \frac12 \log 2 \iff h(B) \le \frac12 \log(\pi e).$$
But the differential entropy of $B$ is driven entirely by $\delta.$ Indeed, we have $$ h(B) = -\frac1{4\delta} \int_{-\beta - \delta}^{-\beta + \delta} \log \frac1{4\delta}\,\mathrm{d}b - \frac1{4\delta} \int_{\beta - \delta}^{\beta + \delta} \log \frac1{4\delta}\,\mathrm{d}b = \log (4\delta).$$ So, as long as $\delta$ is small, say $\le 1/4,$ the mutual information $I(A;A+B)$ exceeds $\frac12 \log 2$, the value achievable via a Gaussian $B$.
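If you want to sanity-check this numerically, here is a small sketch (my own illustration, not part of the argument above) that computes $I(A;A+B) = h(A+B) - h(B)$ for this noise by numerical integration, using $h(B) = \log(4\delta)$ and the fact that the density of $A+B$ is the standard normal density averaged over the two windows; the result should come out well above $\frac12\log 2 \approx 0.3466$ nats.

```python
# Sketch: numerically check I(A;A+B) > (1/2) log 2 for the two-window noise.
# All entropies are in nats; A ~ N(0,1), B = Z + N with Z uniform on {+beta, -beta}
# and N uniform on [-delta, delta], independent of A.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def mutual_information(beta, delta):
    # Density of A + B: the N(0,1) density averaged over the two windows
    # [beta - delta, beta + delta] and [-beta - delta, -beta + delta].
    def p_sum(x):
        return (norm.cdf(x - beta + delta) - norm.cdf(x - beta - delta)
                + norm.cdf(x + beta + delta) - norm.cdf(x + beta - delta)) / (4 * delta)

    def neg_p_log_p(x):
        p = p_sum(x)
        return -p * np.log(p) if p > 0 else 0.0

    h_sum, _ = quad(neg_p_log_p, -12, 12, limit=200)  # h(A+B)
    h_noise = np.log(4 * delta)                        # h(B) = log(4*delta)
    return h_sum - h_noise                             # I(A;A+B) = h(A+B) - h(B)

delta = 0.25                              # "small", as in the argument above
beta = np.sqrt(1 - delta**2 / 3)          # keep Var(B) = beta^2 + delta^2/3 = 1
print("I(A;A+B), two-window noise:", mutual_information(beta, delta))
print("I(A;A+B), Gaussian noise:  ", 0.5 * np.log(2))
```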
Note that the above style of example (a two-point discrete law smeared by a skinny continuous one) continues to work no matter what distribution you pick for $A$: since $h(B) \to -\infty$ as $\delta \to 0,$ no matter what $h(A)$ is, we can pump the mutual information $I(A;A+B)$ arbitrarily high by using a noise distribution like the above. However, if the noise distribution were Gaussian, then the capacity-achieving input distribution would be Gaussian, so the maximum mutual information with a Gaussian $B$ remains bounded. I think the more natural conjecture might be that a Gaussian $B$ minimises $I(A;A+B)$, but I don't know how difficult this is to show.
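Spelling that last claim out with the same bound as before (which used only the independence of $A$ and $B$, so it holds for any input law with $h(A) > -\infty$): $$ I(A;A+B) \ge h(A) - h(B) = h(A) - \log(4\delta) \longrightarrow \infty \quad \text{as } \delta \to 0,$$ with $\beta$ adjusted along the way so that $\beta^2 + \delta^2/3 = \sigma_B^2$ stays fixed.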