Let $X \sim P_\theta$ for some distribution $P_\theta$ parametrized by $\theta \in \Theta \subset \mathbb R$ and $Y \sim Q(\cdot | X)$ for some distribution $Q$. Assume that $P_\theta$ has a density $p_\theta$ with respect to some ground measure $\mu$, and that $Q(\cdot|X)$ has a density $q(\cdot | X)$ with respect to the same $\mu$ for every $X$. Show that the Fisher information of $X$ is at least as large as that of $Y$:
$$\mathbb E\left [ \left( \frac{\int q(Y | X=x)\, \dot p_\theta(x)\,dx}{\int q(Y | X=x)\, p_\theta(x)\,dx} \right)^2 \right ] \leq \mathbb E\left[\left(\frac{\dot p_\theta(X)}{p_\theta(X)}\right)^2\right]$$
where $\dot p_\theta(x)$ is the first derivative of the density $p_\theta$ with respect to $\theta$.
I'm looking for hints for solving this problem (preferably vague hints that will allow me to still solve the problem myself).
What I've tried
The inequality makes some sense to me intuitively but I'm having trouble understanding why it's true. I would usually first try to show a pointwise inequality, but that won't work here since the expectations are over different variables. I also tried expressing the expectations as integrals and switching the order of integration, but that didn't give me any new insights.
My instinct is to apply Cauchy-Schwarz or Jensen's inequality, but I haven't figured out the right way to apply them here.
This is my solution. Hope it works.
Theorem. Let $X$ be a random variable whose distribution depends on a parameter $\theta$, and let $Y$ be such that the conditional density $p(y|x)$ does not depend on $\theta$. Then:
$$I(X)\ge I(Y)$$
where $I$ is the Fisher information.
In the following I drop the $\theta$ subscripts to simplify notation. For the moment I leave out some details, which I may add later if I find no errors or if a reader asks.
LEMMA 1:
$$ I(X)=\operatorname{Var}\left(\frac{\partial}{\partial \theta} \log p(X)\right)$$
The Fisher information is by definition $E\left[\left(\frac{\partial}{\partial \theta} \log p(X)\right)^2\right]$, and under the usual regularity conditions the score has mean zero, so this second moment equals the variance. q.e.d.
LEMMA 2:
$$\operatorname{Var}(E(Z|Y))\le \operatorname{Var}(Z)$$
for any random variables $Z, Y$ (I use $Z$ here to avoid a clash with the $X$ of the theorem). This follows from the law of total variance, $\operatorname{Var}(Z)=E[\operatorname{Var}(Z|Y)]+\operatorname{Var}(E(Z|Y))$, since the first term is nonnegative. q.e.d.
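As a quick sanity check (not part of the proof), here is a Monte Carlo illustration of this variance inequality in a toy Gaussian model of my own choosing, where the conditional mean is known in closed form:

```python
import random

# Toy model (my own example): Y ~ N(0,1) and Z = Y + N(0,1), so that
# E(Z|Y) = Y, Var(E(Z|Y)) = 1 and Var(Z) = 2, and the inequality
# Var(E(Z|Y)) <= Var(Z) is visible numerically.
random.seed(0)
n = 100_000
ys = [random.gauss(0.0, 1.0) for _ in range(n)]
zs = [y + random.gauss(0.0, 1.0) for y in ys]
cond_means = ys  # E(Z|Y) = Y in this model

def var(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)

print(var(cond_means), var(zs))  # roughly 1 vs roughly 2
```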
LEMMA 3:
$$E\left[\frac{\partial}{\partial \theta} \log p(X)\,\Big|\,Y\right]= \frac{\partial}{\partial \theta} \log p(Y)$$
Let's evaluate $E\left[\frac{\partial}{\partial \theta} \log p(X)\,\big|\,Y=y\right]$ for a fixed $y$. This equals:
$$\int \frac{\partial}{\partial \theta} \log p(x)\; p(x|y)\,dx.$$
Now we apply Bayes' rule, $p(x|y)=p(y|x)\,p(x)/p(y)$, expand the log derivative and simplify:
$$\int \frac{\dot p(x)}{p(x)}\, p(y|x)\, \frac{p(x)}{p(y)}\,dx=\frac{1}{p(y)} \int \dot p(x)\,p(y|x)\,dx=\frac{\dot p(y)}{p(y)}=\frac{\partial}{\partial \theta} \log p(y),$$
where the last step uses that $p(y|x)$ does not depend on $\theta$, so differentiating $p(y)=\int p(y|x)\,p(x)\,dx$ under the integral sign gives $\dot p(y)=\int \dot p(x)\,p(y|x)\,dx$.
q.e.d.
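Lemma 3 can also be checked numerically in a hypothetical Gaussian model (my own example, not from the problem): take $X \sim N(\theta, 1)$ and $Y = X + N(0, \sigma^2)$. Then the score of $X$ is $X - \theta$, the score of $Y$ is $(Y-\theta)/(1+\sigma^2)$, and the lemma predicts $E[X-\theta \mid Y] = (Y-\theta)/(1+\sigma^2)$, which matches the standard Gaussian conditional mean formula:

```python
import random

# Hypothetical model: X ~ N(theta, 1), Y = X + N(0, sigma2).
# Lemma 3 predicts E[score of X | Y = y] = score of Y at y, i.e.
# E[X - theta | Y = y] = (y - theta) / (1 + sigma2).
random.seed(1)
theta, sigma2 = 0.7, 2.0
n = 200_000
samples = []
for _ in range(n):
    x = random.gauss(theta, 1.0)
    y = x + random.gauss(0.0, sigma2 ** 0.5)
    samples.append((x, y))

# Bin on Y near a fixed value y0 and compare the empirical conditional
# mean of the score of X with the score of Y at y0.
y0, width = 1.5, 0.05
scores = [x - theta for x, y in samples if abs(y - y0) < width]
empirical = sum(scores) / len(scores)
predicted = (y0 - theta) / (1 + sigma2)
print(empirical, predicted)  # both close to 0.267
```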
Now combining Lemmas 1, 2 and 3 (with Lemma 2 applied to $Z=\frac{\partial}{\partial \theta} \log p(X)$) we get our theorem:
$$I(Y)=\operatorname{Var}\left(\frac{\partial}{\partial \theta} \log p(Y)\right) = \operatorname{Var}\left(E\left[\frac{\partial}{\partial \theta} \log p(X)\,\Big|\,Y\right]\right)\le \operatorname{Var}\left(\frac{\partial}{\partial \theta} \log p(X)\right)=I(X)$$
The key step is Lemma 3.
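Finally, the theorem itself can be checked by Monte Carlo in the same hypothetical Gaussian model, where both informations are known in closed form: for $X \sim N(\theta,1)$ and $Y = X + N(0,\sigma^2)$ we have $I(X)=1$ and $I(Y)=1/(1+\sigma^2)$. The sketch below estimates both as the mean of the squared score:

```python
import random

# Hypothetical model: X ~ N(theta, 1), Y = X + N(0, sigma2).
# Closed forms: I(X) = 1, I(Y) = 1 / (1 + sigma2) = 0.4 here.
random.seed(2)
theta, sigma2 = 0.0, 1.5
n = 100_000
sx = sy = 0.0
for _ in range(n):
    x = random.gauss(theta, 1.0)
    y = x + random.gauss(0.0, sigma2 ** 0.5)
    sx += (x - theta) ** 2                   # squared score of X
    sy += ((y - theta) / (1 + sigma2)) ** 2  # squared score of Y
i_x, i_y = sx / n, sy / n
print(i_x, i_y)  # close to 1 and 0.4, with i_y < i_x as the theorem says
```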