Accuracy of importance sampling estimates


I'm reading Radford M. Neal's paper Annealed Importance Sampling.

In Section $3$, titled Accuracy of importance sampling estimates (page $7$), there is an equality I do not understand. Suppose $x$ is a random variable with density proportional to $f(x)$, and $a$ is a function of $x$. The importance sampling estimate $\bar{a}$ of $\mathbb{E}_f[a]$, based on points $x^{(1)}, \dots, x^{(N)}$ drawn independently from a different density proportional to $g(x)$, is given by $$ \bar{a} = \frac{\sum_{i=1}^{N} w^{(i)} a(x^{(i)})}{\sum_{i=1}^{N} w^{(i)}} = \frac{N^{-1} \sum_{i=1}^{N} w^{(i)} a(x^{(i)})}{N^{-1}\sum_{i=1}^{N} w^{(i)}}$$
where $w^{(i)} = \frac{f(x^{(i)})}{g(x^{(i)})}$ are the importance weights.
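
To make the estimator concrete, here is a minimal sketch of the self-normalized estimate $\bar{a}$ in plain Python. The specific choices are assumptions made only for illustration and do not come from the paper: $f$ is proportional to a standard normal density, $g$ is proportional to a normal density with standard deviation $2$, and $a(x) = x^2$, so the true value is $\mathbb{E}_f[a] = 1$. The function name `self_normalized_is` is hypothetical.

```python
import math
import random


def self_normalized_is(a, f, g, sample_g, n, rng):
    """Estimate E_f[a] from n draws of g; f and g may be unnormalized."""
    num = 0.0  # running sum of w^(i) * a(x^(i))
    den = 0.0  # running sum of w^(i)
    for _ in range(n):
        x = sample_g(rng)
        w = f(x) / g(x)  # importance weight
        num += w * a(x)
        den += w
    return num / den


rng = random.Random(0)  # fixed seed so the run is repeatable


def f(x):
    # Illustrative target: proportional to the N(0, 1) density.
    return math.exp(-x * x / 2.0)


def g(x):
    # Illustrative proposal: proportional to the N(0, 2^2) density.
    return math.exp(-x * x / 8.0)


estimate = self_normalized_is(
    a=lambda x: x * x,
    f=f,
    g=g,
    sample_g=lambda r: r.gauss(0.0, 2.0),
    n=100_000,
    rng=rng,
)
print(estimate)  # should be close to the true value E_f[x^2] = 1
```

Because the weights appear in both the numerator and the denominator, any normalizing constants of $f$ and $g$ cancel, which is why both densities only need to be known up to proportionality.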

In summary, the variance of this estimate is approximated (equation $15$) by $$ \text{Var}_g(\bar{a}) \approx N^{-1} \mathbb{E}_g\left[\left(w^{(i)}\left(a(x^{(i)})-\mathbb{E}_f(a)\right)\right)^2\right]/\mathbb{E}_g[w^{(i)}]^2$$ The author then writes

When $w^{(i)}$ and $a(x^{(i)})$ are independent under $g$, equation $15$ (the above) simplifies to $$ \text{Var}_g(\bar{a}) \approx N^{-1} \mathbb{E}_g[(w^{(i)})^{2}] \, \mathbb{E}_g[(a(x^{(i)})-\mathbb{E}_f(a))^2]/\mathbb{E}_g[w^{(i)}]^2$$ $$ = N^{-1} \left[1+ \text{Var}_g\left(w^{(i)}/\mathbb{E}_g[w^{(i)}]\right)\right]\text{Var}_f[a(x^{(i)})]$$

where the last step uses the following: $$\text{Var}_f[a(x^{(i)})] = \mathbb{E}_f[(a(x^{(i)})-\mathbb{E}_f(a))^2] = \mathbb{E}_g[w^{(i)}(a(x^{(i)})-\mathbb{E}_f(a))^2]/\mathbb{E}_g[w^{(i)}]$$ $$ = \mathbb{E}_g[(a(x^{(i)})-\mathbb{E}_f(a))^2]$$

This last equation is fine, but I'm confused by the $1+ \text{Var}_g\left(w^{(i)}/\mathbb{E}_g[w^{(i)}]\right)$ term, which must equal $\mathbb{E}_g[(w^{(i)})^2] /\mathbb{E}_g[w^{(i)}]^2$ (I think).

Examining the latter, we have

$$\mathbb E_g [(w^{(i)})^{2}] /\mathbb{E}_g[w^{(i)}]^2 = \frac{\sum_{x \in X} \frac{f(x)^2}{g(x)^2} g(x)}{(\sum_{x \in X}\frac{f(x)}{g(x)}g(x))^2}$$

It's not obvious to me how to proceed; any insights would be appreciated.

On BEST ANSWER

I'll write $w^{(i)}=w$ to make the notation easier to read, and $\mathbb{E}_g = \mathbb{E}$, then as you suggest the term is:

\begin{align*} \frac{\mathbb{E} (w^2)}{[\mathbb{E}(w)]^2} &= \frac{\mathbb{E} (w^2)-[\mathbb{E}(w)]^2 + [\mathbb{E}(w)]^2}{[\mathbb{E}(w)]^2}\\ &= \frac{\text{Var}(w) + [\mathbb{E}(w)]^2}{[\mathbb{E}(w)]^2}\\ &= \frac{\text{Var}(w)}{[\mathbb{E}(w)]^2}+1\\ &= \text{Var} \left ( \frac{w}{\mathbb{E}(w)}\right )+1 \end{align*} since $\mathbb{E}(w)$ is a constant.
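
As a quick numerical sanity check of the identity above, the sketch below compares both sides on an arbitrary set of positive weights. The exponential weights are an assumption chosen only for illustration; the identity holds for any distribution of $w$ with finite second moment, and it holds exactly for the empirical distribution when the variance is computed with divisor $n$.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
w = [rng.expovariate(1.0) for _ in range(10_000)]  # arbitrary positive weights
n = len(w)

mean_w = sum(w) / n

# Left-hand side: E(w^2) / [E(w)]^2
lhs = (sum(v * v for v in w) / n) / mean_w ** 2

# Right-hand side: 1 + Var(w / E(w)), using the population variance (divisor n)
normalized = [v / mean_w for v in w]
mean_norm = sum(normalized) / n  # equals 1 up to rounding
var_norm = sum((v - mean_norm) ** 2 for v in normalized) / n
rhs = 1.0 + var_norm

print(lhs, rhs)  # the two values agree up to floating-point rounding
```

The last step of the derivation is the scaling rule $\text{Var}(cw) = c^2\,\text{Var}(w)$ with $c = 1/\mathbb{E}(w)$, which is what dividing every weight by `mean_w` does here.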