Numerically stable method for sampling from truncated normal distribution


I am implementing a truncated normal (gaussian) distribution N(x | μ, σ, a, b) in Python that is both differentiable with respect to its parameters and numerically stable when μ is outside the interval [a, b].

I have been able to implement numerically stable methods for calculating the mean, variance and log-probability. Next, I need to implement a method for sampling from the distribution. This Stack Exchange post proposes the following method, where $\alpha = (a-\mu)/\sigma$ and $\beta = (b-\mu)/\sigma$ are the standardized bounds: $$ x = \Phi^{-1}\big(\Phi(\alpha) + U \cdot (\Phi(\beta)-\Phi(\alpha))\big)\,\sigma + \mu $$

Thanks to the reparameterization trick, this expression is differentiable with respect to the distribution parameters. Nonetheless, it is numerically unstable when the expression inside $\Phi^{-1}$ gets very close to either 0 or 1.
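For concreteness, here is a minimal sketch of that sampler (function name is my own) using SciPy's `ndtr`/`ndtri` for $\Phi$ and $\Phi^{-1}$, together with the regime where it breaks:

```python
import numpy as np
from scipy.special import ndtr, ndtri  # Phi and Phi^{-1}

def sample_truncnorm_naive(mu, sigma, a, b, u):
    """Reparameterized sample from N(mu, sigma) truncated to [a, b]; u ~ Uniform(0, 1)."""
    alpha = (a - mu) / sigma
    beta = (b - mu) / sigma
    # Argument of Phi^{-1}; in float64 ndtr saturates to exactly 1.0 once
    # alpha is roughly 8 or larger, which is where stability is lost.
    p = ndtr(alpha) + u * (ndtr(beta) - ndtr(alpha))
    return ndtri(p) * sigma + mu

# Mild truncation works fine:
x = sample_truncnorm_naive(0.0, 1.0, 1.0, 2.0, 0.5)   # a finite value in [1, 2]
# Far-tail truncation fails: ndtr(10.0) == 1.0 in float64, so ndtri returns inf.
x_bad = sample_truncnorm_naive(0.0, 1.0, 10.0, 11.0, 0.5)
```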

Therefore, my question is the following: Is there a method for sampling from a truncated normal distribution that is both stable and differentiable?


Let me propose a quick and dirty method:

The probability of seeing $x\in[\alpha,\beta]$ is $P(x)={1\over Z}e^{-x^2/2}$, where $Z$ is a normalization factor: $$Z=\int_\alpha^\beta e^{-x^2/2}\,dx.$$ In your case $\alpha,\beta\gg 1$, so we may write $x=\alpha+y$ with $y\in[0,\beta-\alpha]$: $$P(y)={1\over Z}e^{-(\alpha+y)^2/2}={1\over Z}e^{(-\alpha^2-2\alpha y-y^2)/2}.$$ As long as $\alpha\gg\beta-\alpha$, we can ignore the $y^2$ term in the exponent, since the $2\alpha y$ term is always much bigger. Thus we have $$P(y)={1\over W}e^{-\alpha y},$$ where I wrote $W$ to absorb the various normalizations, but we can compute it explicitly:

Since $1=\int_0^{\beta-\alpha} P(y)\,dy$, we must have $W=\int_0^{\beta-\alpha} e^{-\alpha y}\,dy={1-e^{-(\beta-\alpha)\alpha}\over \alpha}\approx {1\over \alpha}$, since $(\beta-\alpha)\alpha\gg 1$ makes the exponential term negligible.
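A quick numerical check of that approximation (the values of $\alpha$ and $\beta$ are arbitrary illustrations, not from the question):

```python
import math

alpha, beta = 6.0, 8.0
# Exact normalization of the truncated exponential vs. the 1/alpha approximation.
W_exact = (1.0 - math.exp(-(beta - alpha) * alpha)) / alpha
W_approx = 1.0 / alpha
# (beta - alpha) * alpha = 12 here, so exp(-12) ~ 6e-6 and the two agree
# to about 1e-5 relative error.
print(W_exact, W_approx)
```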

We are left with generating exponentially distributed random numbers. Generating exponential random variates via reparameterization (the inverse-CDF transform) is a classical technique.
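A sketch of that step (helper name is my own): inverting the CDF of the exponential truncated to $[0,\beta-\alpha]$, written with `log1p`/`expm1` so it stays stable even for large $\alpha$:

```python
import numpy as np

def sample_trunc_exponential(alpha, beta, u):
    """Reparameterized draw of y in [0, beta - alpha] with density ~ exp(-alpha*y);
    u ~ Uniform(0, 1)."""
    L = beta - alpha
    # CDF: F(y) = (1 - exp(-alpha*y)) / (1 - exp(-alpha*L)).
    # Inverse: y = -log(1 - u*(1 - exp(-alpha*L))) / alpha, rewritten via
    # log1p/expm1 to avoid catastrophic cancellation.
    return -np.log1p(u * np.expm1(-alpha * L)) / alpha
```

A sample of the truncated normal is then recovered as $x = \mu + \sigma(\alpha + y)$, and the whole map stays differentiable in $u$, $\alpha$, $\beta$.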

To conclude:

  1. When $\alpha,\beta\gg1$, the Gaussian looks like an exponential in the vicinity of $\alpha$.
  2. The exponential decays so fast that the chance of ever selecting a number near $\beta$ is negligible. If you still need it, you can probably get away with correcting the normalization factor, with no need to restore the $y^2$ term in the exponential.
  3. Exponential random variates can be generated via reparameterization.
  4. The crossover regime, where one can use either the full-blown $\Phi^{-1}$ reparameterization or the exponential approximation, may need some smoothing to interpolate between the two.
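Putting the two regimes together might look as follows. This is only a sketch: the switch threshold is my guess, and the hard switch is exactly the crossover that point 4 says may need smoothing.

```python
import numpy as np
from scipy.special import ndtr, ndtri

def sample_truncated_standard_normal(alpha, beta, u, switch=5.0):
    """Reparameterized sample from a standard normal truncated to [alpha, beta];
    u ~ Uniform(0, 1). Threshold `switch` is a placeholder, not tuned."""
    if alpha < switch:
        # Mild truncation: exact inverse-CDF reparameterization.
        p = ndtr(alpha) + u * (ndtr(beta) - ndtr(alpha))
        return ndtri(p)
    # Far tail: Gaussian ~ exponential with rate alpha near alpha.
    y = -np.log1p(u * np.expm1(-alpha * (beta - alpha))) / alpha
    return alpha + y
```

For a general $N(\mu,\sigma)$ one would apply this to the standardized bounds and return $\mu + \sigma x$.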