In Appendix C of the paper "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor", the authors state that applying $\tanh$ to a Gaussian sample yields a density for actions bounded in the range $(-1, 1)$:
we apply an invertible squashing function ($\tanh$) to the Gaussian samples, and employ the change of variables formula to compute the likelihoods of the bounded actions. In other words, let $u \in \mathbb{R}^D$ be a random variable and $\mu(u|s)$ the corresponding density with infinite support. Then $a = \tanh(u)$, where $\tanh$ is applied elementwise, is a random variable with support in $(-1, 1)$ with a density given by $$ \pi(a|s)=\mu(u|s)\left|\det \left({da\over du}\right)\right|^{-1} $$
How does this work out?
Thanks for any help.
The density $f_X$ of a random variable $X$ satisfies
\begin{equation} P(x_1 \lt X \lt x_2) = \int_{x_1}^{x_2} f_X(x) \ dx. \end{equation}
If we take $Y = g(X)$, we seek a density $f_Y$ satisfying
\begin{equation} P(y_1 \lt Y \lt y_2) = \int_{y_1}^{y_2} f_Y(y) \ dy. \end{equation}
Denoting $h = g^{-1}$, integrating from $y_1$ to $y_2$ in the new distribution is equivalent to integrating from $x_1 = h(y_1)$ to $x_2 = h(y_2)$ in the original distribution:
\begin{equation} \int_{y_1}^{y_2} f_Y(y) \ dy = \int_{x_1 = h(y_1)}^{x_2 = h(y_2)} f_X(x) \ dx \end{equation}
Applying the univariate change of variables formula [1] to the right-hand side, we obtain
\begin{equation} \int_{x_1=h(y_1)}^{x_2=h(y_2)} f_X(x) \ dx = \int_{y_1}^{y_2} f_X(h(y))\ h'(y)\ dy. \end{equation}
Tentatively, we might conclude that
\begin{equation} f_Y(y) = f_X(h(y))\ h'(y). \end{equation}
However, we have implicitly assumed that $g$ is an increasing function. If $g$ is decreasing, then $h'(y) < 0$ and the integration limits swap, so to handle both cases we take the absolute value of the derivative:
\begin{equation} f_Y(y) = f_X(h(y))\ |h'(y)| \end{equation}
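As a numerical sanity check (my own sketch, not from the paper), the formula can be tested with a *decreasing* transform, where the absolute value actually matters. Here $X$ is standard normal and $g(x) = e^{-x}$, so $h(y) = -\ln y$ and $h'(y) = -1/y$; the Monte Carlo probability of an interval should match the integral of the candidate density $f_X(h(y))\,|h'(y)|$:

```python
import numpy as np

rng = np.random.default_rng(0)

# X ~ N(0, 1); Y = g(X) = exp(-X) is a decreasing transform,
# so h(y) = -ln(y) and h'(y) = -1/y is negative: the |.| matters.
def f_X(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def f_Y(y):
    h = -np.log(y)                      # h = g^{-1}
    return f_X(h) * np.abs(-1.0 / y)    # f_X(h(y)) |h'(y)|

# Monte Carlo estimate of P(y1 < Y < y2) ...
x = rng.standard_normal(1_000_000)
y = np.exp(-x)
y1, y2 = 0.5, 2.0
mc = np.mean((y > y1) & (y < y2))

# ... versus a midpoint-rule integral of the candidate density.
grid = np.linspace(y1, y2, 10_001)
mids = (grid[:-1] + grid[1:]) / 2
quad = np.sum(f_Y(mids)) * (grid[1] - grid[0])

print(mc, quad)  # both approximately 0.51
```

Both numbers agree with the exact value $P(-\ln 2 < X < \ln 2) \approx 0.512$; dropping the absolute value would flip the sign of the integrand and break the match.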
With $h(y) = g^{-1}(y)$, we know [2] that
\begin{equation} h'(y) = \frac{1}{g'(h(y))}. \end{equation}
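For the case at hand this identity is easy to check directly (a quick sketch of mine): with $g = \tanh$ we have $g'(x) = 1 - \tanh^2(x)$, and the inverse $h = \operatorname{artanh}$ has the closed-form derivative $h'(y) = 1/(1 - y^2)$, which is exactly $1/g'(h(y))$:

```python
import math

# g = tanh, so g'(x) = 1 - tanh(x)^2.
def g_prime(x):
    return 1.0 - math.tanh(x) ** 2

for y in (-0.9, -0.3, 0.0, 0.5, 0.8):
    h = math.atanh(y)                   # h = g^{-1} = artanh
    lhs = 1.0 / (1.0 - y ** 2)          # h'(y), closed form
    rhs = 1.0 / g_prime(h)              # 1 / g'(h(y))
    assert math.isclose(lhs, rhs, rel_tol=1e-12)
```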
Therefore
\begin{equation} f_Y(y) = f_X(h(y))\ \big|\big[g'(h(y))\big]^{-1}\big|, \end{equation}
or equivalently, since $h(y) = x$,
\begin{equation} f_Y(y) = f_X(x)\ \big|\big[g'(x)\big]^{-1}\big|. \end{equation}
In the multivariate case, $g'(x)$ is replaced by the Jacobian determinant $\det(\mathbf{J_g}(x))$. Since $\tanh$ is applied elementwise, its Jacobian $da/du$ is diagonal with entries $1 - \tanh^2(u_i)$, which gives the result stated in the paper.
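Putting the pieces together for the SAC case, here is a sketch (my own sanity check; the Gaussian parameters `mu_, sigma` are illustrative) of the squashed density in one dimension, $\log\pi(a|s) = \log\mu(u|s) - \log(1 - \tanh^2(u))$, verified two ways: it integrates to 1 over $(-1, 1)$, and it matches a histogram of $\tanh$-squashed Gaussian samples:

```python
import numpy as np

rng = np.random.default_rng(1)
mu_, sigma = 0.3, 0.7  # illustrative parameters of the pre-squash Gaussian

def log_mu(u):  # log-density of u ~ N(mu_, sigma^2)
    return -0.5 * ((u - mu_) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def log_pi(a):  # log-density of a = tanh(u), via the change of variables
    u = np.arctanh(a)
    # |da/du| = 1 - tanh(u)^2 = 1 - a^2, so subtract its log:
    return log_mu(u) - np.log(1.0 - a ** 2)

# Check 1: pi integrates to 1 over (-1, 1) (midpoint rule).
grid = np.linspace(-1 + 1e-6, 1 - 1e-6, 20_001)
mids = (grid[:-1] + grid[1:]) / 2
total = np.sum(np.exp(log_pi(mids))) * (grid[1] - grid[0])

# Check 2: a histogram of tanh(u) samples matches pi pointwise.
a = np.tanh(mu_ + sigma * rng.standard_normal(1_000_000))
hist, edges = np.histogram(a, bins=50, range=(-0.999, 0.999), density=True)
centers = (edges[:-1] + edges[1:]) / 2
max_err = np.max(np.abs(hist - np.exp(log_pi(centers))))

print(total, max_err)  # total should be approximately 1
```

In $D$ dimensions the Jacobian is diagonal, so the correction term becomes the sum $\sum_{i=1}^{D}\log(1-\tanh^2(u_i))$, which is how SAC implementations typically compute the action log-likelihood.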
[1] https://en.wikipedia.org/wiki/Integration_by_substitution
[2] https://en.wikipedia.org/wiki/Inverse_functions_and_differentiation