Consider a probability distribution, which I'll call the "prior distribution", and some functional of that distribution.
Also consider that same functional, but applied to the probability distribution after conditioning on the value of a random variable. Since what we're conditioning on is random, this functional is itself random. I'll call the conditioned distribution the "posterior distribution."
When this functional measures the concentration or dispersion of the distribution, we get theorems like these:
- The expected value of the entropy of the posterior distribution is less than or equal to the entropy of the prior distribution.
- The expected value of the variance of the posterior distribution is less than or equal to the variance of the prior distribution.
- The expected value of the Euclidean norm of the posterior distribution is greater than or equal to the Euclidean norm of the prior distribution.
(If it's not clear that the Euclidean norm measures the concentration of a distribution, consider that $||p||^2 = \sum_i p_i^2$, which is the probability of drawing the same element twice in two independent draws.)
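All three inequalities can be checked numerically. Here's a sketch using a hypothetical toy joint distribution over a three-valued quantity $X$ and a two-valued piece of evidence $E$ (the matrix `joint` and all variable names are made up for illustration):

```python
import numpy as np

# Hypothetical toy joint: X in {0, 1, 2} (rows), evidence E in {0, 1} (columns).
joint = np.array([[0.10, 0.20],
                  [0.15, 0.05],
                  [0.30, 0.20]])

p = joint.sum(axis=1)   # prior over X
m = joint.sum(axis=0)   # marginal over the evidence
Q = joint / m           # columns are the posteriors P(X | E=e)

x = np.arange(3)        # support of X, needed for the variance example

entropy = lambda d: -(d * np.log(d)).sum()
var = lambda d: (d * x**2).sum() - (d * x).sum()**2
norm = np.linalg.norm

# Expected posterior values, weighted by the evidence marginal
E_entropy = sum(m[e] * entropy(Q[:, e]) for e in range(2))
E_var     = sum(m[e] * var(Q[:, e])     for e in range(2))
E_norm    = sum(m[e] * norm(Q[:, e])    for e in range(2))

assert E_entropy <= entropy(p)  # conditioning can't raise expected entropy
assert E_var     <= var(p)      # law of total variance
assert E_norm    >= norm(p)     # norm goes the other way
```

Any valid joint distribution with strictly positive entries would do here; the three assertions hold regardless of the numbers chosen.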
Is there one theorem that has all three of these facts as special cases?
@stochasticboy321 basically answered this in the comments, so I'll just expand on it a bit.
I'll assume our distributions are discrete, and can thus be represented as vectors. Let $\vec{Q}$ be the posterior distribution. It's capitalized because it's a random variable: it depends on the evidence we will receive, which is itself random. Let $\vec{p}$ be the prior distribution. In all the cases above we have:
$$E_{\vec{p}}[C(\vec{Q})] \ge C(\vec{p})$$
where $C$ is some measure of concentration. In my three examples, it would be negative entropy, negative variance, and Euclidean norm, respectively. $E_{\vec{p}}$ denotes expectation over the random evidence, whose distribution is the one induced by the prior $\vec{p}$ (together with the likelihood).
By the law of total probability,
$$\vec{p} = E_{\vec{p}}[\vec{Q}]$$
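This identity is easy to verify numerically. Using a hypothetical toy joint distribution (the matrix `joint` and the variable names are made up for illustration), averaging the posteriors over the evidence marginal recovers the prior exactly:

```python
import numpy as np

# Hypothetical toy joint: rows index the quantity of interest, columns the evidence.
joint = np.array([[0.10, 0.20],
                  [0.15, 0.05],
                  [0.30, 0.20]])

p = joint.sum(axis=1)   # prior
m = joint.sum(axis=0)   # evidence marginal
Q = joint / m           # posteriors, one per column

# Law of total probability: the evidence-weighted average of the
# posterior columns is the prior.
assert np.allclose(Q @ m, p)
```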
Substituting into the first formula,
$$E_{\vec{p}}[C(\vec{Q})] \ge C(E_{\vec{p}}[\vec{Q}])$$
This is now just Jensen's inequality, so it must hold whenever $C$ is convex. Convexity of $C$ is easily verified in each of my examples. For instance, in my third example, the case of the Euclidean norm, convexity follows from the triangle inequality together with the homogeneity of the norm:
$$||t\vec{p}_1 + (1-t)\vec{p}_2|| \le ||t \vec{p}_1|| + ||(1-t)\vec{p}_2|| = t||\vec{p}_1|| + (1-t)||\vec{p}_2||$$
It's interesting that I started out by phrasing this as "conditional distributions tend to be more concentrated," when really it would be more accurate to say that conditional distributions tend to have higher expected values of any convex functional. And I'm not ready to say that a functional measures concentration if and only if it's convex.