How to push back stochastic term in computational graph?


Assume the following (variational autoencoder) model.

$$\begin{align} h_i=&\;g_{\lambda}(x_i)\\ z_i \sim &\; N(h_i,I_L)\\ \tilde x_i =&\; f_\psi(z_i)\\ \mathcal{L}=&\;||x_i - \tilde x_i||_2^2 \end{align}$$
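To make the setup concrete, here is a minimal numpy sketch of one forward pass through this model. The functions `g_lambda` and `f_psi` are hypothetical stand-ins (in practice they would be neural networks), and the latent dimension and data point are chosen arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the encoder g_lambda and decoder f_psi;
# in a real VAE these would be parameterized neural networks.
def g_lambda(x):
    return 0.5 * x          # encoder: x_i -> h_i

def f_psi(z):
    return 2.0 * z          # decoder: z_i -> x_tilde_i

x_i = np.array([1.0, -2.0, 0.5])       # one data point, L = 3
h_i = g_lambda(x_i)                    # h_i = g_lambda(x_i)
z_i = rng.normal(loc=h_i, scale=1.0)   # z_i ~ N(h_i, I_L)
x_tilde_i = f_psi(z_i)                 # x_tilde_i = f_psi(z_i)
loss = np.sum((x_i - x_tilde_i) ** 2)  # ||x_i - x_tilde_i||_2^2
print(loss)
```

Note that the sampling step `rng.normal(loc=h_i, ...)` sits between the parameters and the loss, which is exactly where differentiation breaks down.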

If we wanted to optimize the parameters of $f$ and $g$ with SGD, we would compute the gradient of the loss $\mathcal{L}$ w.r.t. $\lambda$ and $\psi$. However, thinking of backpropagation in terms of a computational graph, we would have to "push back" the stochastic term $z_i$, since we can't compute the gradient through such a node, i.e., we want $z_i$ to be deterministic once the noise is known: $z_i' = h_i + \epsilon_i$ where $\epsilon_i \sim N(0_L, I_L)$. This formulation is distributionally identical to my previous model definition, but the stochastic term no longer depends on the parameters. For a normal distribution the transformation is clear: we can simply shift by the mean and scale by the standard deviation. My question is, how would we go about this if we had an exponential distribution, e.g., $z_i \sim \mathrm{Exp}(h_i)$? Is there a way to perform a similar transformation? How about a uniform, etc.? Is there a general recipe?
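The normal-case transformation described above can be checked empirically. The sketch below (toy values for $h_i$, not from the original post) samples $z_i$ both directly and via the pushed-back form $z_i' = h_i + \epsilon_i$, and confirms the two constructions have matching moments while only the second keeps $h_i$ outside the random draw.

```python
import numpy as np

rng = np.random.default_rng(0)

L = 3
h = np.array([0.5, -1.0, 2.0])  # stands in for h_i = g_lambda(x_i)
n = 200_000

# Direct sampling: z_i ~ N(h_i, I_L).  Here h sits inside the random
# node, so gradients cannot flow through the draw.
z_direct = rng.normal(loc=h, scale=1.0, size=(n, L))

# Reparameterized sampling: z_i' = h_i + eps_i, eps_i ~ N(0_L, I_L).
# All randomness lives in eps; h enters via a deterministic addition.
eps = rng.normal(size=(n, L))
z_reparam = h + eps

# Both constructions yield the same distribution.
print(np.allclose(z_direct.mean(axis=0), z_reparam.mean(axis=0), atol=0.02))
print(np.allclose(z_direct.std(axis=0), z_reparam.std(axis=0), atol=0.02))
```

Once $z_i'$ is written this way, $\partial z_i' / \partial h_i = I_L$, so the chain rule applies cleanly from $\mathcal{L}$ back to $\lambda$.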

(I couldn't come up with a good question title, so if someone can think of something more appropriate, please edit. Also, I wasn't sure whether to ask this on stats or math, but I figured math would be better, since I think I am simply looking for a transformation between distributions.)