My questions are about the paper: Semi-supervised Learning with Deep Generative Models (Kingma, D.P. et al, 2014).
Suppose I have a generative network with observed variables $x$ and latent variables $z$, where $z$ is used to generate $x$.
Now suppose my prior distribution on the hidden units is a spherical Gaussian:
$p(z) = N(z \mid 0, I)$
And I want to obtain an approximate posterior distribution $q_\phi (z \mid x) = N(z \mid \mu_\phi(x), {\rm{diag}}(\sigma_{\phi}^2 (x)))$.
I set up a multi-layer perceptron (MLP) with parameters $\phi$ and use it to output $\mu_{\phi}(x)$ and $\sigma_{\phi}(x)$.
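To make my setup concrete, here is a minimal numpy sketch of what I mean by "an MLP parameterizing $\mu_\phi(x)$ and $\sigma_\phi(x)$" (the layer sizes, the tanh activation, and the choice to have the network output $\log \sigma^2$ are my own assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
x_dim, h_dim, z_dim = 4, 8, 2

# phi = all weights and biases of the MLP; these jointly determine mu(x) and sigma(x)
phi = {
    "W1": rng.normal(0, 0.1, (h_dim, x_dim)),       "b1": np.zeros(h_dim),
    "W_mu": rng.normal(0, 0.1, (z_dim, h_dim)),     "b_mu": np.zeros(z_dim),
    "W_logvar": rng.normal(0, 0.1, (z_dim, h_dim)), "b_logvar": np.zeros(z_dim),
}

def encoder(x, phi):
    """Map x to the parameters (mu, sigma^2) of q_phi(z | x)."""
    h = np.tanh(phi["W1"] @ x + phi["b1"])           # shared hidden layer
    mu = phi["W_mu"] @ h + phi["b_mu"]               # posterior mean
    log_var = phi["W_logvar"] @ h + phi["b_logvar"]  # log sigma^2, so variance stays positive
    return mu, np.exp(log_var)

x = rng.normal(size=x_dim)
mu, var = encoder(x, phi)

# Sample z ~ q_phi(z | x) via z = mu + sigma * eps, with eps ~ N(0, I)
eps = rng.standard_normal(z_dim)
z = mu + np.sqrt(var) * eps
```

So as I understand it, a single forward pass through this network gives me a different mean and variance for every input $x$, and training adjusts $\phi$ so those outputs approximate the true posterior. Is that the right picture?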
My questions are:
1) How do the parameters $\phi$ of the MLP determine the mean and variance?
2) How does that work mathematically? How am I getting the posterior mean and variance from this MLP?
3) Why do I need an MLP at all?