Bounded variance for Lipschitz function of random variable

In Priors for Infinite Networks (Neal, 1996), part of the proof is that $\tanh(X)$ for Gaussian RV $X$ has finite variance, which is later used for the Central Limit Theorem.

For an arbitrary activation function $\sigma$, is it enough for $\sigma$ to be Lipschitz to conclude that $\sigma(X)$ has finite variance? Intuitively I think so: assuming for simplicity that $\sigma(0) = 0$, the Lipschitz condition gives $|\sigma(x)| \le L |x|$, so $\sigma(x)^2 \le (L x)^2$ for all $x$, and thus $\operatorname{E}[\sigma(X)^2] \le L^2 \operatorname{E}[X^2]$. Is this correct?

There are 2 best solutions below

One general conclusion: if the second moment $\mathrm{E}[X^2]$ is finite, then $\mathrm{Var}[f(X)]$ is bounded for any $L$-Lipschitz function $f$. Specifically, $$ \mathrm{Var}[f(X)] \le 2L^2 \mathrm{E}[X^2], $$ which is sharp up to the constant factor (take $f(x) = Lx$ with $\mathrm{E}X = 0$).

Usually, to prove this we use symmetrization. Let $X'$ be an independent copy of $X$. Then $$ \begin{aligned} \mathrm{Var}[f(X)] &=\mathrm{Var}[f(X)-\mathrm{E}_{X'}f(X')] \\ &\le \mathrm{E}_{X}\big[(f(X)-\mathrm{E}_{X'}f(X'))^2\big] \\ &= \mathrm{E}_{X}\big[(\mathrm{E}_{X'}[f(X)-f(X')])^2\big] \quad (X' \text{ is independent of } X)\\ &\le \mathrm{E}_{X}\mathrm{E}_{X'} \big[(f(X)-f(X'))^2\big] \quad (\text{Jensen's inequality})\\ &\le L^2\mathrm{E}\big[(X-X')^2\big] \quad (L\text{-Lipschitz})\\ &= L^2\mathrm{E}\big[(X-\mathrm{E}X-(X'-\mathrm{E}X'))^2\big] \quad (\mathrm{E}X = \mathrm{E}X')\\ &= L^2[\mathrm{Var}(X)+\mathrm{Var}(X')] \quad (\text{the cross term vanishes by independence})\\ &= 2L^2\mathrm{Var}(X) \quad (X' \text{ has the same distribution as } X)\\ &\le 2L^2\mathrm{E}[X^2]. \end{aligned} $$ I hope this gives you more insight.

You can use reasoning similar to that of @Nanayajitzuki to improve their suggested bound by a factor of 2. Specifically, take $X, Y$ i.i.d. Since $X$ and $Y$ are independent, they are uncorrelated, i.e., $\operatorname{Cov}(X,Y) = 0$, and therefore $\operatorname{Var}(X-Y) = \operatorname{Var}(X) + \operatorname{Var}(-Y) = 2\operatorname{Var}(X)$. Moreover, $f(X)$ and $f(Y)$ are also independent (as functions of independent RVs) and hence uncorrelated, i.e., $\operatorname{Cov}(f(X), f(Y)) = 0$, so $2\operatorname{Var}(f(X)) = \operatorname{Var}(f(X) - f(Y))$.

Now, $$ \begin{aligned} 2\operatorname{Var}(f(X)) &= \operatorname{Var}(f(X) - f(Y)) \\ &= \operatorname{E}[(f(X) - f(Y))^2] - (\operatorname{E}[f(X) - f(Y)])^2 \\ &= \operatorname{E}[(f(X) - f(Y))^2] \qquad\qquad (f(X), f(Y) \text{ are identically distributed}) \\ &\leq L^2 \operatorname{E}[(X-Y)^2] \qquad\qquad (f \text{ is } L\text{-Lipschitz}) \\ &= 2 L^2 \operatorname{Var}(X) \qquad\qquad (\operatorname{E}[X-Y] = 0 \text{ and } \operatorname{Var}(X-Y) = 2\operatorname{Var}(X)) \end{aligned} $$

Therefore, $\operatorname{Var}(f(X)) \leq L^2 \operatorname{Var}(X)$ for an $L$-Lipschitz function $f$.
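As a rough numerical sanity check of this bound (not part of the proof), here is a minimal Monte Carlo sketch using only the Python standard library. The helper `variance_ratio`, the Gaussian input with standard deviation 2, and the three 1-Lipschitz test functions are illustrative assumptions, not anything taken from the answers.

```python
import math
import random


def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_ratio(f, lipschitz_const, n_samples=50_000, sigma=2.0, seed=0):
    """Monte Carlo estimate of Var[f(X)] / (L^2 Var[X]) for X ~ N(0, sigma^2).

    The bound Var[f(X)] <= L^2 Var[X] says this ratio is at most 1.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
    fs = [f(x) for x in xs]
    return variance(fs) / (lipschitz_const ** 2 * variance(xs))


# Three 1-Lipschitz functions, chosen here purely for illustration.
activations = {
    "tanh": math.tanh,
    "relu": lambda x: max(x, 0.0),
    "identity": lambda x: x,  # f(x) = x attains the bound with equality
}

for name, f in activations.items():
    ratio = variance_ratio(f, 1.0)
    print(f"{name:>8}: Var[f(X)] / (L^2 Var[X]) ~ {ratio:.3f}")
```

The identity map gives a ratio of exactly 1, which shows that the constant in $\operatorname{Var}(f(X)) \leq L^2 \operatorname{Var}(X)$ cannot be improved in general.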

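As for bounding the (non-central) second moment: note that $f$ is $L$-Lipschitz iff $f + c$ is $L$-Lipschitz for every constant $c$. Therefore, $\operatorname{E}[f^2(X)]$ can be arbitrarily large for the same $\operatorname{E}[X^2]$, and a bound on it in terms of $L$ and $\operatorname{E}[X^2]$ alone requires a normalization such as $f(0) = 0$.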
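The effect of the constant shift can also be seen numerically. The sketch below is illustrative only: it assumes $X \sim N(0,1)$ and $f(x) = \tanh(x) + c$, and the helper name `moments` is hypothetical. It estimates both the variance and the second moment for a few values of $c$.

```python
import math
import random


def moments(f, n_samples=50_000, seed=0):
    """Monte Carlo estimates (Var[f(X)], E[f(X)^2]) for X ~ N(0, 1)."""
    rng = random.Random(seed)
    values = [f(rng.gauss(0.0, 1.0)) for _ in range(n_samples)]
    mean = sum(values) / n_samples
    second_moment = sum(v * v for v in values) / n_samples
    return second_moment - mean * mean, second_moment


# Shifting a 1-Lipschitz function by a constant keeps it 1-Lipschitz.
for c in (0.0, 10.0, 100.0):
    var, second = moments(lambda x, c=c: math.tanh(x) + c)
    print(f"c = {c:6.1f}: Var ~ {var:.3f}, E[f(X)^2] ~ {second:.1f}")
```

Because the same seed is reused, every shift sees the same samples: the variance estimate stays the same up to rounding, while the second moment grows roughly like $c^2$.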