In Priors for Infinite Networks (Neal, 1996), part of the proof is that $\tanh(X)$ for Gaussian RV $X$ has finite variance, which is later used for the Central Limit Theorem.
For an arbitrary activation function $\sigma$, is it enough for $\sigma$ to be Lipschitz to conclude that $\sigma(X)$ has finite variance? Intuitively I think so: for simplicity, if $\sigma(0) = 0$, then the Lipschitz condition gives the bound $|\sigma(x)| \le L|x|$, so $\sigma(x)^2 \le L^2 x^2$ for all $x$, and thus $\operatorname{E}[\sigma(X)^2] \le L^2 \operatorname{E}[X^2]$. Is this correct?
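As a quick numerical sanity check (not part of the original question), a Monte Carlo estimate with $\sigma = \tanh$, which is $1$-Lipschitz and satisfies $\tanh(0) = 0$, confirms the bound $\operatorname{E}[\sigma(X)^2] \le L^2 \operatorname{E}[X^2]$; the scale of $X$ here is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)
# X ~ N(0, 2^2); the scale is arbitrary, chosen for illustration.
x = rng.normal(loc=0.0, scale=2.0, size=1_000_000)

L = 1.0  # tanh is 1-Lipschitz and tanh(0) = 0
lhs = np.mean(np.tanh(x) ** 2)   # Monte Carlo estimate of E[sigma(X)^2]
rhs = L**2 * np.mean(x ** 2)     # Monte Carlo estimate of L^2 E[X^2]
print(lhs <= rhs)
```

Here the bound is loose, since $\tanh$ is bounded by $1$ while $\operatorname{E}[X^2] = 4$, but it illustrates the argument.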
One general conclusion: if the second moment $\mathrm{E}[X^2]$ is finite, then $\mathrm{Var}[f(X)]$ is finite for any $L$-Lipschitz function $f$. Indeed, $$ \mathrm{Var}[f(X)] \le 2L^2 \mathrm{E}[X^2], $$ and this bound is tight up to the constant factor (taking $f(x) = x$ already gives $\mathrm{Var}[f(X)] = \mathrm{Var}(X)$).
The standard tool for proving this is symmetrization. Let $X'$ be an i.i.d. copy of $X$. Then $$ \begin{aligned} \mathrm{Var}[f(X)] &= \mathrm{Var}[f(X)-\mathrm{E}_{X'}f(X')] \quad (\text{shifting by a constant})\\ &\le \mathrm{E}_{X}[f(X)-\mathrm{E}_{X'}f(X')]^2 \quad (\text{variance} \le \text{second moment})\\ &= \mathrm{E}_{X}[\mathrm{E}_{X'}[f(X)-f(X')]]^2 \quad (\text{i.i.d. copy})\\ &\le \mathrm{E}_{X}\mathrm{E}_{X'} [f(X)-f(X')]^2 \quad (\text{Jensen's inequality})\\ &\le L^2\,\mathrm{E}[X-X']^2 \quad (L\text{-Lipschitz})\\ &= L^2\,\mathrm{E}[(X-\mathrm{E}X)-(X'-\mathrm{E}X')]^2 \quad (\mathrm{E}X=\mathrm{E}X')\\ &= L^2[\mathrm{Var}(X)+\mathrm{Var}(X')] \quad (\text{cross term vanishes by independence})\\ &= 2L^2\,\mathrm{Var}(X) \quad (\text{i.i.d. copy})\\ &\le 2L^2\,\mathrm{E}[X^2]. \end{aligned} $$ I hope this gives you more insight.