In machine learning, we often want to learn an unknown conditional distribution from data. For example, suppose we have a dataset $D$ of features $x_i \in \mathbb{R}^n$ and labels $y_i \in \mathbb{R}$ generated according to some distribution $p(y|x)$. We can represent the conditional distribution $p(y|x)$ using a large parameterized function $f_\theta$ (i.e., a neural network) which maps features $x$ onto the parameters of simple probability densities (e.g., a mixture of Gaussians). Call the parameters of our network $\theta$ and its outputs $\psi$, so that $f_\theta(x) = \psi$. We then choose $\theta = \underset{\theta}{\arg \min} \sum_{(x,y)\in D}{-\log p_\psi(y)}$ where $\psi = f_\theta(x)$.
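To make the setup concrete, here is a minimal sketch of the negative log-likelihood for a one-dimensional Gaussian mixture; the lists `weights`, `means`, `stds` stand in for the network output $\psi$, and the function name and interface are illustrative assumptions rather than part of the question:

```python
import math

def gaussian_mix_nll(y, weights, means, stds):
    """Negative log-likelihood of a scalar y under a 1-D Gaussian mixture.

    weights/means/stds play the role of the network output psi = f_theta(x);
    here they are plain lists rather than the output of a real network.
    """
    density = sum(
        w * math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )
    return -math.log(density)
```

Training then amounts to minimizing the sum of this quantity over the dataset with respect to $\theta$.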
When we choose $p_\psi$ to be a mixture of Gaussians, the likelihood is easy to compute and differentiate. However, a Gaussian density used this way leads to poor results when the dataset has outliers: pairs $(x, y)$ whose $y$ differs markedly from the $y$ values of pairs with similar $x$. To make training more robust to these outliers, we could use a heavier-tailed distribution, such as a mixture of Laplace distributions. Laplace distributions would make training more robust, but would decrease the overall precision of representing $p(y|x)$ with $p_\psi$.
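The difference in outlier sensitivity can be read off the per-residual negative log-likelihoods at unit scale (up to additive normalization constants); this is only an illustrative sketch:

```python
def gaussian_nll(r):
    """Per-residual NLL of a unit Gaussian, up to an additive constant."""
    return 0.5 * r ** 2

def laplace_nll(r):
    """Per-residual NLL of a unit Laplace, up to an additive constant."""
    return abs(r)

# A single outlier with residual 10 contributes 50 to the Gaussian
# objective but only 10 to the Laplace one, so it dominates the
# gradients far less under the heavier-tailed model.
```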
We often use a Huber loss to get the advantages of both the squared loss (Gaussian) and the absolute loss (Laplace). However, the normalization factor for the density corresponding to the Huber loss involves several evaluations of $\operatorname{erf}$, which is undesirable.
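For reference, here is a sketch of that normalizer, assuming the common parameterization of the Huber loss $L_\delta(x) = x^2/2$ for $|x| \le \delta$ and $\delta(|x| - \delta/2)$ otherwise; the closed form follows from splitting the integral at $\pm\delta$:

```python
import math

def huber_normalizer(b, delta):
    """Z(b) = integral of exp(-b * L_delta(x)) dx over the real line, where
    L_delta(x) = x^2/2 for |x| <= delta and delta*(|x| - delta/2) otherwise.

    The quadratic core needs erf; the exponential tails are elementary."""
    core = math.sqrt(2 * math.pi / b) * math.erf(delta * math.sqrt(b / 2))
    tail = 2 * math.exp(-b * delta ** 2 / 2) / (b * delta)
    return core + tail

def huber_normalizer_numeric(b, delta, hi=50.0, n=200000):
    """Crude Riemann-sum check of the closed form (truncated at |x| = hi)."""
    def L(x):
        ax = abs(x)
        return 0.5 * x * x if ax <= delta else delta * (ax - delta / 2)
    step = 2 * hi / n
    return step * sum(math.exp(-b * L(-hi + i * step)) for i in range(n))
```

The $\operatorname{erf}$ evaluation in the quadratic core is exactly the cost the question would like to avoid.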
The integrand in this question gives a density similar to the Huber density. My hope is that it can be computed using only a few elementary functions.
For the same reasons, I would be just as happy to learn $g(b) = \int_{-\infty}^{\infty}{\exp\left(-b \sqrt{1 + x^2}\right)\,dx},$ or any function $h(b) = \int_{-\infty}^{\infty}{\exp\left(-b L(x)\right)\,dx}$ with $L(x) \overset{x\rightarrow 0}{\approx} x^2$ and $L(x) \overset{x\rightarrow \infty}{\approx} x$, as long as $L(x)$ and $h(b)$ can be computed efficiently.
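As a sanity check on the candidate $L(x) = \sqrt{1+x^2}$, which satisfies the stated asymptotics up to a constant offset (since $\sqrt{1+x^2} - 1 \approx x^2/2$ near $0$), one can tabulate $h(b)$ by truncated numerical quadrature; the truncation bound and step count below are arbitrary assumptions:

```python
import math

def L(x):
    """Candidate loss: quadratic near 0 (up to offset and scale), linear at infinity."""
    return math.sqrt(1 + x * x)

def h_numeric(b, hi=60.0, n=200000):
    """Riemann-sum approximation of the integral of exp(-b * L(x)) over [-hi, hi].

    The tail beyond |x| = hi contributes on the order of exp(-b*hi)/b,
    which is negligible for the values used here."""
    step = 2 * hi / n
    return step * sum(math.exp(-b * L(-hi + i * step)) for i in range(n))
```

An efficient closed or near-closed form for this $h(b)$, avoiding quadrature entirely, is exactly what the question asks for.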