Validity of a proof of the Fisher Information Data Processing inequality, $I(f(X)) \le I(X)$.


I'm trying to prove that taking a function of a random variable never gives a better estimator (in terms of Fisher information) than using the original random variable directly.

I have a proof (below), but I'm worried that I'm abusing the Lebesgue integral notation too much. I'm worried about operations like introducing an extra $\frac{dP}{dP}$ and breaking these up as if they were fractions. I'm also unsure if I should write $\frac{d \log P(x)}{d\rho}$ or $\frac{\log dP(x)}{d\rho}$ or even $\frac{d}{d\rho}\log dP(x)$...

I would really appreciate it if anybody could check the proof (which is quite simple, using only a single application of Cauchy–Schwarz) and let me know how to make it formal. (Or whether it already is.)

Fisher Information Data Processing Inequality:

Let $X$ be a random variable with distribution $dP$ (depending on a parameter $\rho$), and let $f$ be some channel with distribution $f(x) \sim dQ(q|x)$. Then $$ I(f(X)) \le I(X), $$ where $I(X) = E_X\left[\left(\frac{\log dP}{d\rho}\right)^2\right]$ is the Fisher information of $X$.

Proof:

Let $dQ$ be the marginal distribution of $f(X)$. Then \begin{align} I(f(X)) &= E_Q\left[\left(\frac{\log dQ}{d\rho}\right)^2\right] \\&= E_Q\left[\left(\frac{1}{dQ}\frac{dQ}{d\rho}\right)^2\right] \\&= \int_q\frac{1}{dQ(q)}\left(\frac{d Q(q)}{d\rho}\right)^2 \\&= \int_q\frac{1}{dQ(q)}\left(\frac{d}{d\rho}\int_x dQ(q|x) dP(x)\right)^2 \\&= \int_q\frac{1}{dQ(q)}\left(\int_x dQ(q|x) \frac{dP(x)}{d\rho}\right)^2 \\&= \int_q\frac{1}{dQ(q)}\left(\int_x dQ(q|x) \frac{dP(x)}{d\rho} \frac{dP(x)}{dP(x)}\right)^2 \\&= \int_q\frac{1}{dQ(q)}\left(\int_x dQ(q|x) \frac{\log dP(x)}{d\rho} dP(x)\right)^2 \\&\le \int_q\frac{1}{dQ(q)}\int_x dQ(q|x) dP(x) \int_x dQ(q|x) \left(\frac{\log dP(x)}{d\rho}\right)^2 dP(x) \\&= \int_q\frac{dQ(q)}{dQ(q)} \int_x dQ(q|x) \left(\frac{\log dP(x)}{d\rho}\right)^2 dP(x) \\&= \int_q \int_x dQ(q|x) \left(\frac{\log dP(x)}{d\rho}\right)^2 dP(x) \\&= \int_x \left(\int_q dQ(q|x)\right) \left(\frac{\log dP(x)}{d\rho}\right)^2 dP(x) \\&= \int_x \left(\frac{\log dP(x)}{d\rho}\right)^2 dP(x) \\&= E_X\left[\left(\frac{\log dP}{d\rho}\right)^2\right] \\&= I(X) \end{align}

Best answer:

As pointed out by gandalf, there were indeed some issues with confusing densities, differentials and derivatives.

I believe the version below fixes the issues.

Let $Q$ be the distribution of $Y=f(X)$ and $q=\frac{dQ}{dy}$ its density. Then $$ \begin{align} I(Y) &= E_Y\left[\left(\frac{d}{d\rho}\log q(Y)\right)^2\right] \\&= E_Y\left[\left(\frac{1}{q(Y)}\frac{dq(Y)}{d\rho}\right)^2\right] \\&= \int_y\left(\frac{1}{q(y)}\frac{dq(y)}{d\rho}\right)^2 dQ(y) \\&= \int_y\frac{dQ(y)}{q(y)^2}\left(\frac{dq(y)}{d\rho}\right)^2 \\&= \int_y\frac{dy}{q(y)}\left(\frac{d}{d\rho}\int_x q(y|x)p(x)dx\right)^2 \\&= \int_y\frac{dy}{q(y)}\left(\int_x q(y|x) \frac{dp(x)}{d\rho} dx\right)^2 \\&= \int_y\frac{dy}{q(y)}\left(\int_x q(y|x) \frac{dp(x)}{d\rho} \frac{p(x)}{p(x)}dx\right)^2 \\&= \int_y\frac{dy}{q(y)}\left(\int_x q(y|x) \frac{d \log p(x)}{d\rho} dP(x)\right)^2 \\&\le \int_y\frac{dy}{q(y)}\left(\int_x q(y|x) dP(x)\right)\left( \int_x q(y|x) \left(\frac{d \log p(x)}{d\rho}\right)^2 dP(x)\right) \\&= \int_y\frac{dy}{q(y)} q(y) \int_x q(y|x) \left(\frac{d \log p(x)}{d\rho}\right)^2 dP(x) \\&= \int_y dy \int_x q(y|x) \left(\frac{d \log p(x)}{d\rho}\right)^2 dP(x) \\&= \int_x \left(\int_y q(y|x)dy\right) \left(\frac{d \log p(x)}{d\rho}\right)^2 dP(x) \\&= \int_x \left(\frac{d \log p(x)}{d\rho}\right)^2 dP(x) \\&= E_X\left[\left(\frac{d \log p(X)}{d\rho}\right)^2\right] \\&= I(X) \end{align} $$

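Here we used that $q(y) = \int_x q(y|x) dP(x)$ and $\int_y q(y|x)dy = 1$, and the $\le$ step is Cauchy–Schwarz with respect to the measure $q(y|x)\,dP(x)$. The argument also assumes that the channel $q(y|x)$ does not depend on $\rho$ and that differentiating under the integral sign is justified.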
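As a sanity check on the inequality (not a substitute for the proof), here is a minimal numerical sketch for the discrete analogue, where integrals become finite sums. Everything in it is an assumption chosen for illustration: $X \sim \mathrm{Bernoulli}(\rho)$, the channel is a binary symmetric channel with flip probability `eps`, and the helper names `fisher_info_discrete` and `push_through_channel` are hypothetical.

```python
def fisher_info_discrete(pmf, rho, h=1e-6):
    """Fisher information sum_k (dp_k/drho)^2 / p_k of a finite pmf family.

    The derivative is approximated with central differences, which is
    exact (up to rounding) here because the pmfs below are linear in rho.
    """
    p = pmf(rho)
    p_plus = pmf(rho + h)
    p_minus = pmf(rho - h)
    total = 0.0
    for p0, pp, pm in zip(p, p_plus, p_minus):
        if p0 > 0:
            dp = (pp - pm) / (2 * h)
            total += dp * dp / p0
    return total


def bernoulli(rho):
    """pmf of X as [P(X=0), P(X=1)]."""
    return [1 - rho, rho]


def push_through_channel(pmf, channel):
    """Marginal pmf of Y, q(y) = sum_x q(y|x) p(x).

    channel[x][y] = q(y|x); rows sum to 1 and do not depend on rho.
    """
    def out(rho):
        p = pmf(rho)
        n_out = len(channel[0])
        return [sum(p[x] * channel[x][y] for x in range(len(p)))
                for y in range(n_out)]
    return out


eps = 0.1                                  # assumed flip probability
bsc = [[1 - eps, eps], [eps, 1 - eps]]     # binary symmetric channel
rho = 0.3                                  # assumed parameter value

i_x = fisher_info_discrete(bernoulli, rho)
i_y = fisher_info_discrete(push_through_channel(bernoulli, bsc), rho)
print("I(X) =", i_x)
print("I(Y) =", i_y)
print("I(Y) <= I(X):", i_y <= i_x)
```

In this example the closed forms are $I(X) = \frac{1}{\rho(1-\rho)}$ and $I(Y) = \frac{(1-2\epsilon)^2}{s(1-s)}$ with $s = \epsilon + \rho(1-2\epsilon)$, so the numerical values can be compared against them directly.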