Expected value and variance of Sigmoid and SiLU on a normally distributed random variable for variational approximation


I am trying to apply Assumed Density Filtering (ADF) as described in the paper Lightweight Probabilistic Deep Networks to my own model, and I need to implement the variational approximation layers for the Sigmoid and SiLU functions.

I looked for the equations for the Sigmoid layer. In the paper Variational Learning in Nonlinear Gaussian Belief Networks, the authors state that they have a closed-form solution for the expected value of a Sigmoid layer:

$$ M(\mu,\sigma) = \Phi\left(\frac{\mu}{\sqrt{1+\sigma^2}}\right) $$
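As a sanity check (not part of either paper), the identity $E[\Phi(X)] = \Phi(\mu/\sqrt{1+\sigma^2})$ for $X \sim N(\mu,\sigma^2)$ can be verified numerically, and the same check shows that it is only approximate when $\Phi$ is replaced by the logistic sigmoid. The values of `mu` and `sigma` below are arbitrary illustrative choices:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, sigma = 0.7, 1.3  # arbitrary test values
x = rng.normal(mu, sigma, 2_000_000)

# Closed form: E[Phi(X)] = Phi(mu / sqrt(1 + sigma^2)) for X ~ N(mu, sigma^2)
closed = norm.cdf(mu / np.sqrt(1 + sigma**2))

# Monte Carlo estimates: exact (up to sampling error) for the probit,
# only approximately equal for the logistic sigmoid
mc_probit = norm.cdf(x).mean()
mc_sigmoid = (1.0 / (1.0 + np.exp(-x))).mean()

print(closed, mc_probit, mc_sigmoid)
```

The probit estimate matches the closed form to Monte Carlo precision, while the logistic-sigmoid estimate is close but visibly different, which is consistent with the answer below: the closed form is exact for $\Phi$, and a Sigmoid layer inherits it only through the approximation $\sigma(x) \approx \Phi(x)$.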

However, according to this question, only an approximate solution exists. Did I miss some assumption from the paper, or have I misunderstood one of them?

For SiLU, I have been unable to find any resources so far. I would appreciate it if anyone could provide some guidance or point me to relevant resources.

Best Answer

This is a long comment rather than a full answer.

$\def\qty#1{\left( #1 \right)}$ In the Frey & Hinton paper they use the approximation $\sigma(x)\approx \Phi(x)$. Then,
$$ M(\mu,\sigma) \approx \frac{1}{\sigma} \int_{-\infty}^\infty \Phi(x)\, \phi\qty{\frac{x-\mu}{\sigma}}\,dx. $$
Substituting $x = \sigma z + \mu$, this can be rewritten as
$$ M(\mu,\sigma) \approx \int_{-\infty}^\infty \Phi(\sigma z + \mu)\, \phi\qty{z}\,dz, $$
which evaluates to
$$ M(\mu,\sigma) \approx \Phi \qty{\frac{\mu}{\sqrt{1+\sigma^2}}}. $$

For the SiLU activation function, using $x\,\sigma(x) \approx x\,\Phi(x)$, the corresponding approximation is
$$ M(\mu,\sigma) \approx \frac{1}{\sigma} \int_{-\infty}^\infty x\,\Phi(x)\, \phi\qty{\frac{x-\mu}{\sigma}}\,dx, $$
which can be rewritten as
$$ M(\mu,\sigma) \approx \int_{-\infty}^\infty (\sigma z + \mu)\,\Phi(\sigma z + \mu)\, \phi\qty{z}\,dz. $$
First,
$$ \int_{-\infty}^\infty z\, \Phi(\sigma z + \mu)\, \phi\qty{z}\,dz $$
evaluates to
$$ \frac{\sigma}{\sqrt{1+\sigma^2}}\, \phi\qty{\frac{\mu}{\sqrt{1+\sigma^2}}}. $$
Next,
$$ \int_{-\infty}^\infty \Phi(\sigma z + \mu)\, \phi\qty{z}\,dz = \Phi \qty{\frac{\mu}{\sqrt{1+\sigma^2}}}. $$
So for the SiLU activation function,
$$ M(\mu,\sigma) \approx \frac{\sigma^2}{\sqrt{1+\sigma^2}}\, \phi\qty{\frac{\mu}{\sqrt{1+\sigma^2}}} + \mu\, \Phi \qty{\frac{\mu}{\sqrt{1+\sigma^2}}}. $$
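As a numerical check of these two closed forms (my own sketch, not from either paper), one can compare them against direct quadrature of the surrogate activations $\Phi(x)$ and $x\,\Phi(x)$ under $N(\mu,\sigma^2)$; the values of `mu` and `sigma` are arbitrary test choices:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma = 0.5, 0.8  # arbitrary test values
s = np.sqrt(1 + sigma**2)

# Closed-form means derived above, under the sigma(x) ~ Phi(x) surrogate
m_sig = norm.cdf(mu / s)
m_silu = sigma**2 / s * norm.pdf(mu / s) + mu * norm.cdf(mu / s)

# Numerical check: integrate the surrogate activations against N(mu, sigma^2)
num_sig = quad(lambda x: norm.cdf(x) * norm.pdf(x, mu, sigma), -20, 20)[0]
num_silu = quad(lambda x: x * norm.cdf(x) * norm.pdf(x, mu, sigma), -20, 20)[0]

print(m_sig, num_sig)
print(m_silu, num_silu)
```

Both pairs agree to quadrature precision, confirming that the two closed-form expressions are exact for the $\Phi$-based surrogates; the remaining error in an ADF layer comes only from the $\sigma(x)\approx\Phi(x)$ step.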