Computation of the estimator of the ELBO in Variational Auto-encoder

I am reading the paper below by Kingma et al.

https://arxiv.org/pdf/1906.02691.pdf

My question concerns Section 2.4.4, titled "Computation of $\log q_{\phi}(z|x)$".

In Eqn. (2.33) the authors explain a relation between the densities of $\epsilon$ and $z$ as follows:

\begin{equation} \log q_{\phi}(z|x) = \log p(\epsilon) - \log d_{\phi}(x, \epsilon) \end{equation}

How is the second term of the RHS derived?

Let me explain my question with more details.

Applying Bayes' rule and the definition of the joint density to $\log q_{\phi}(z|x)$ gives

$\log \frac{q_{\phi}(z, x)}{q_{\phi}(x)} = \log \frac{q_{\phi}(x|z) q_{\phi}(z)}{q_{\phi}(x)}$

Expanding the logarithm yields

$\log q_{\phi}(z|x) = \log q_{\phi}(x|z) + \log q_{\phi}(z) - \log q_{\phi}(x)$

Replacing $\log q_{\phi}(z)$ with $\log p(\epsilon)$ (because of the reparameterization) and rearranging leads to

$\log q_{\phi}(z|x) = \log p(\epsilon) + \log q_{\phi}(x|z) - \log q_{\phi}(x)$

However, Eqn. 2.33 in the paper is:

$\log q_{\phi}(z|x) = \log p(\epsilon) - \log d_{\phi}(x, \epsilon)$.

I know this is a change of variables, but I have two questions:

1) Am I right with what I explained above?

2) I can't understand the sentence that follows it: "where the second term is the log of the absolute value of the determinant of the Jacobian matrix $(\partial z/\partial \epsilon)$".

There is 1 best solution below

I was also confused by this, but after reading through the tutorial I linked in the comments, I think I understand it. Here are my notes.

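The Bayes-rule expansion is algebraically valid, but it cannot lead to the desired form: Eqn. (2.33) comes from a change of variables between $\epsilon$ and $z$, not from Bayes' rule. In particular, the step that replaces $\log q_{\phi}(z)$ with $\log p(\epsilon)$ is not justified, because the densities of $z$ and $\epsilon$ differ by a Jacobian factor.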

Eqn. (2.33) follows from the change-of-variables formula for densities. For an invertible, differentiable map $y = f(x)$ of a scalar random variable, $$ \tag{1} \newcommand{\d}{\mathrm{d}} p(y) = p(f^{-1}(y)) \cdot \left| \frac{\d f^{-1}(y)}{\d y} \right| $$ where $p$ on each side denotes the density of the respective variable. In the multivariate case this becomes (the second equality follows from the inverse function theorem): $$ \tag{2} p(y) = p(f^{-1}(y)) \cdot \left| \det \frac{\partial f^{-1}(y)}{\partial y} \right| = p(f^{-1}(y)) \cdot \left| \det \frac{\partial y}{\partial f^{-1}(y)} \right|^{-1} $$

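To see how this applies, treat $x$ and $\phi$ as fixed, so that $z = f(\epsilon; \phi, x)$ is an invertible function of $\epsilon$ alone and $q_{\phi}(z|x)$ is the density of $z$. Applying equation (2) with $y = z$ and $f^{-1}(z; \phi, x) = \epsilon$ gives $$ q_{\phi}(z|x) = p(\epsilon) \cdot \left| \det \frac{\partial z}{\partial \epsilon} \right|^{-1} $$ and taking logs yields the desired formula $$ \log q_{\phi}(z|x) = \log p(\epsilon) - \log d_{\phi}(x, \epsilon) = \log p(\epsilon) - \log \left| \det \frac{\partial z}{\partial \epsilon} \right| = \log p(\epsilon) + \log \left| \det \frac{\partial \epsilon}{\partial z} \right| $$ This is exactly what the sentence after Eqn. (2.33) says: $\log d_{\phi}(x, \epsilon)$ is the log of the absolute value of the determinant of the Jacobian $\partial z / \partial \epsilon$.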
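As a sanity check, here is a minimal Python sketch for the factorized Gaussian case, assuming $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, so that $\partial z / \partial \epsilon = \mathrm{diag}(\sigma)$ and $\log d_{\phi}(x, \epsilon) = \sum_i \log \sigma_i$. The values of `mu` and `sigma` are made up for illustration (in a VAE they would come from the encoder), and the function names are my own, not from the paper. It compares the change-of-variables expression with the closed-form Gaussian log-density evaluated at $z$:

```python
import math
import random


def log_std_normal(eps):
    """Log density of a standard normal with independent components."""
    return sum(-0.5 * e * e - 0.5 * math.log(2.0 * math.pi) for e in eps)


def log_q_via_change_of_variables(eps, sigma):
    """log q(z|x) = log p(eps) - log|det(dz/d eps)| for z = mu + sigma * eps.

    The Jacobian dz/d eps is diag(sigma), so its log-determinant
    is the sum of log(sigma_i).
    """
    log_det = sum(math.log(s) for s in sigma)
    return log_std_normal(eps) - log_det


def log_q_direct(z, mu, sigma):
    """Closed-form log density of N(z; mu, diag(sigma^2)), for comparison."""
    return sum(
        -0.5 * ((zi - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        for zi, m, s in zip(z, mu, sigma)
    )


# Hypothetical encoder outputs for one data point x (illustrative values only).
mu = [0.3, -1.2, 2.0]
sigma = [0.5, 1.7, 0.1]

rng = random.Random(0)
eps = [rng.gauss(0.0, 1.0) for _ in mu]          # eps ~ p(eps) = N(0, I)
z = [m + s * e for m, s, e in zip(mu, sigma, eps)]  # reparameterized sample

lhs = log_q_via_change_of_variables(eps, sigma)
rhs = log_q_direct(z, mu, sigma)
print(lhs, rhs)
```

The two printed numbers should agree up to floating-point error, so in this case $\log q_{\phi}(z|x)$ can be computed from $\epsilon$ and $\sigma$ alone.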

Now, to see why equation (1) holds, let $Y = f(X)$ with $f$ monotonic. If $f$ is increasing, $$ F_Y(y) = P(Y \le y) = P(f(X) \le y) = P(X \le f^{-1}(y)) = F_X(f^{-1}(y)) $$ while if $f$ is decreasing the inequality flips, so $F_Y(y) = P(X \ge f^{-1}(y)) = 1 - F_X(f^{-1}(y))$.

Thus $$ F_Y(y) = \left\{ \begin{array}{ll} \int_{-\infty}^{f^{-1}(y)} p(x) \, \mathrm{d} x & \text{if } f \text{ is increasing} \\ & \\ \int_{f^{-1}(y)}^{\infty} p(x) \, \mathrm{d} x & \text{if } f \text{ is decreasing} \end{array} \right. $$

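Differentiating $F_Y(y)$ with respect to $y$, using the Fundamental Theorem of Calculus and the chain rule, gives $p(y) = \pm \, p(f^{-1}(y)) \, \frac{\mathrm{d} f^{-1}(y)}{\mathrm{d} y}$, with the plus sign for increasing $f$ and the minus sign for decreasing $f$. Since the derivative of $f^{-1}$ is positive in the first case and negative in the second, both cases combine into the absolute value in equation (1).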
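If you want to convince yourself numerically, the following Python sketch checks equation (1) for a decreasing map. The choice $X \sim \mathcal{N}(0, 1)$, $Y = e^{-X}$ is just an assumed example of mine, and the function names are illustrative. It compares the density given by equation (1) with a central finite difference of the CDF $F_Y$:

```python
import math


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Assumed example: X ~ N(0, 1) and Y = f(X) = exp(-X), a decreasing map.
def f_inverse(y):
    return -math.log(y)


def f_inverse_derivative(y):
    return -1.0 / y


def cdf_y(y):
    # f is decreasing, so {Y <= y} = {X >= f^{-1}(y)}.
    return 1.0 - std_normal_cdf(f_inverse(y))


def pdf_y_formula(y):
    # Equation (1): p(y) = p(f^{-1}(y)) * |d f^{-1}(y) / dy|.
    return std_normal_pdf(f_inverse(y)) * abs(f_inverse_derivative(y))


def pdf_y_numeric(y, h=1e-5):
    # Central finite difference of the CDF, an approximation to dF_Y/dy.
    return (cdf_y(y + h) - cdf_y(y - h)) / (2.0 * h)


for y in (0.5, 1.0, 2.5):
    print(y, pdf_y_formula(y), pdf_y_numeric(y))
```

For each $y$ the two values should match to several decimal places. Dropping the `abs` would make the formula negative here, which is the reason the absolute value is needed in the decreasing case.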