Unbiased estimate of log-likelihood of Markov bridge


Note: Cross-post from CrossValidated.

I have the following problem I am trying to solve. I have a parametric family of "transition" distributions $p_\theta(x_{i+1}\mid x_i)$, and I am given a starting point $x_0$ and an end point $x_n$. I denote by $p_\theta(x_n\mid x_0)$ the probability of reaching $x_n$ from $x_0$ in $n$ steps, where the process is assumed to be Markovian. What I need is an unbiased estimate of $\partial_\theta\log p_\theta(x_n\mid x_0)$ (e.g., as the gradient signal in some SGD procedure).

My approach is as follows. First we have \begin{align} \partial_\theta\log p_\theta(x_n\mid x_0)=\frac{\partial_\theta p_\theta(x_n\mid x_0)}{p_\theta(x_n\mid x_0)}. \end{align} The numerator is given by \begin{align} \partial_\theta p_\theta(x_n\mid x_0)={}&\partial_\theta\int dx_1\cdots dx_{n-1}\prod_{i=1}^np_\theta(x_i\mid x_{i-1})\\ ={}&\int dx_1\cdots dx_{n-1}\sum_{j=1}^n\big(\partial_\theta p_\theta(x_j\mid x_{j-1})\big)\prod_{i\neq j}p_\theta(x_i\mid x_{i-1})\\ ={}&\int dx_1\cdots dx_{n-1}\left(\sum_{j=1}^n\partial_\theta \log p_\theta(x_j\mid x_{j-1})\right)\prod_{i=1}^np_\theta(x_i\mid x_{i-1}). \end{align} The denominator is $$p_\theta(x_n\mid x_0)=\int dx_1\cdots dx_{n-1}\prod_{i=1}^np_\theta(x_i\mid x_{i-1}),$$ which normalizes $\tau(x_{0:n}):=\prod_{i=1}^np_\theta(x_i\mid x_{i-1})$ to a probability density over "paths" from $x_0$ to $x_n$. Thus, I can rewrite $$\partial_\theta\log p_\theta(x_n\mid x_0)=\mathbb{E}_{x_{0:n}\sim\tau}\left[\sum_{j=1}^n\partial_\theta \log p_\theta(x_j\mid x_{j-1})\right].$$

Therefore, the only thing left to do is to draw one or more samples from $\tau$. This could be done by some kind of Metropolis–Hastings sampling, but I am hoping for something faster to compute. My idea would be to draw a certain number $L$ of sequences $x_1,\ldots,x_{n-1}$ from the forward kernel $p_\theta(x_i\mid x_{i-1})$ starting from $x_0$, and then weight them proportionally to $p_\theta(x_n\mid x_{n-1})$. Asymptotically, this should give the correct distribution (at least intuitively; I haven't checked all the details yet).
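As a sanity check on the path-expectation identity above (my own illustration, not part of the question), here is a small Python sketch on a toy binary chain that stays in its current state with probability $\theta$ and flips otherwise. All names and parameter values are assumptions for the example; the bridge probability is computed via matrix powers, every interior path is enumerated, and the identity is compared against a finite-difference derivative.

```python
import numpy as np
from itertools import product

# Toy binary chain (illustrative): p_theta(x'|x) = theta if x' == x, else 1 - theta.
theta, n, x0, xn = 0.7, 5, 0, 1

def log_p_bridge(th):
    # log p_theta(x_n | x_0) from the n-th power of the 2x2 transition matrix
    P = np.array([[th, 1 - th], [1 - th, th]])
    return np.log(np.linalg.matrix_power(P, n)[x0, xn])

# Enumerate all interior paths x_1..x_{n-1}; accumulate tau and tau * f.
Z, num = 0.0, 0.0
for mid in product([0, 1], repeat=n - 1):
    path = (x0, *mid, xn)
    tau, f = 1.0, 0.0
    for a, b in zip(path, path[1:]):
        tau *= theta if a == b else 1 - theta
        f += (1 / theta) if a == b else (-1 / (1 - theta))  # d/dtheta of the log-step
    Z += tau
    num += tau * f

grad_identity = num / Z  # E_tau[f], with Z = p_theta(x_n | x_0)
eps = 1e-6
grad_fd = (log_p_bridge(theta + eps) - log_p_bridge(theta - eps)) / (2 * eps)
print(grad_identity, grad_fd)
```

The two printed numbers should agree, confirming that the expectation of the path-wise score under the normalized bridge density equals the gradient of the log bridge probability.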

Does this approach give a good result (for small-ish $L$)? I expect it to be biased in some way (e.g., for $L=1$ we would not be drawing from the correct distribution); is that the case? Are there better approaches I could use to get the unbiased sample I need?

Yes, it is valid, provided that you divide by the correct normalising constant.

I'll write $x=(x_1,...,x_{n-1})$ and: $$ f(x) = \sum_{j=1}^n\partial_\theta \log p_\theta(x_j|x_{j-1}) $$ You want to calculate: $$ \partial_\theta \log p_\theta(x_n|x_0) = \int f(x) p(x)d^{n-1}x $$ with: $$ p(x) = \frac{\prod_{j=1}^n p_\theta(x_j|x_{j-1})}{p_\theta(x_n|x_0)} $$ As you've said yourself, the natural method is to sample directly from $p$ and interpret the integral as an expected value: $$ \partial_\theta \log p_\theta(x_n|x_0) = \mathbb E_p[f(x)] $$ The sampling can be done using Metropolis-Hastings, for example. Physically, if the $x_j$ are binary, this is a 1D Ising model with the two extremal spins fixed.
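As a sketch of the Metropolis-Hastings route in the binary ("Ising") case, here is a minimal single-site-flip sampler on a toy chain that stays with probability theta; the model, parameter values, and helper names are my own illustration, not from the question.

```python
import math
import random

random.seed(0)

# Toy binary chain (illustrative): stay with probability theta, flip otherwise.
theta, n, x0, xn = 0.7, 5, 0, 1

def log_tau(path):
    # Unnormalised log-density of a full path x_0..x_n
    return sum(math.log(theta) if a == b else math.log(1 - theta)
               for a, b in zip(path, path[1:]))

def f(path):
    # f(x) = sum_j d/dtheta log p_theta(x_j | x_{j-1})
    return sum(1 / theta if a == b else -1 / (1 - theta)
               for a, b in zip(path, path[1:]))

# Metropolis-Hastings with single-site flips on the interior states:
# the 1D Ising chain with both boundary spins fixed.
path = [x0] + [random.randint(0, 1) for _ in range(n - 1)] + [xn]
samples = []
for _ in range(200_000):
    i = random.randrange(1, n)      # pick an interior site
    prop = path.copy()
    prop[i] ^= 1                    # flip it (symmetric proposal)
    if random.random() < math.exp(log_tau(prop) - log_tau(path)):
        path = prop
    samples.append(f(path))

mh_estimate = sum(samples[10_000:]) / len(samples[10_000:])
print(mh_estimate)  # approaches d/dtheta log p_theta(x_n | x_0)
```

Because the target is only needed up to its normalisation, the unknown $p_\theta(x_n|x_0)$ cancels in the acceptance ratio, which is what makes this route work without ever computing the bridge probability.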

Your approach is just to change the sampling distribution to: $$ q(x) = \prod_{j=1}^{n-1} p_\theta(x_j|x_{j-1}) $$ Using the fact that: $$ p(x) = q(x)\frac{p_\theta(x_n|x_{n-1})}{p_\theta(x_n|x_0)} $$ you can similarly recast the expression as an expected value: $$ \partial_\theta \log p_\theta(x_n|x_0) = \frac{\mathbb E_q\left[f(x)p_\theta(x_n|x_{n-1})\right]}{p_\theta(x_n|x_0)} $$ This is almost what you described; you only forgot the denominator. In the previous method it was dealt with automatically during the sampling, but now you'll need to estimate it. The standard method is to use the same distribution: $$ p_\theta(x_n|x_0) = \int q(x)p_\theta(x_n|x_{n-1})d^{n-1}x $$ so you can interpret it too as an expected value: $$ p_\theta(x_n|x_0) = \mathbb E_q\left[p_\theta(x_n|x_{n-1})\right] $$ The exact formula to simulate is therefore: $$ \partial_\theta \log p_\theta(x_n|x_0) = \frac{\mathbb E_q\left[f(x)p_\theta(x_n|x_{n-1})\right]}{\mathbb E_q\left[p_\theta(x_n|x_{n-1})\right]} $$ By the law of large numbers, you'll get the correct result for large sample size $L$, and the central limit theorem allows you to quantify the asymptotic fluctuations. Note that you don't need separate samplings for the numerator and the denominator: from a single set of trajectories you can compute a sample of each. You'll just need to be careful to take the ratio of the empirical averages and not the empirical average of the ratios. This makes it a self-normalised importance-sampling estimator: consistent as $L\to\infty$, but with an $O(1/L)$ bias at finite $L$ (the expectation of a ratio is not the ratio of expectations), which answers your question about small $L$.
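A minimal sketch of this self-normalised estimator, on an assumed toy binary chain (stay with probability theta) where the exact gradient is available in closed form for comparison; all names and parameter values are illustrative, not from the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary chain (illustrative): stay with probability theta, flip otherwise.
theta, n, L = 0.7, 6, 100_000
x0, xn = 0, 0

# Draw L forward trajectories x_1..x_{n-1} from q (no conditioning on x_n).
stay = rng.random((L, n - 1)) < theta         # True where the chain stays
states = (x0 + np.cumsum(~stay, axis=1)) % 2  # x_1, ..., x_{n-1}

# Scores of the free steps: d/dtheta log p_theta(x_j | x_{j-1})
free_scores = np.where(stay, 1 / theta, -1 / (1 - theta)).sum(axis=1)

# Last step x_{n-1} -> x_n: its likelihood is the importance weight,
# and its score completes f(x).
stay_last = states[:, -1] == xn
w = np.where(stay_last, theta, 1 - theta)
f = free_scores + np.where(stay_last, 1 / theta, -1 / (1 - theta))

# Ratio of empirical averages (NOT the average of per-trajectory ratios).
estimate = np.sum(w * f) / np.sum(w)

# Closed form for this chain: with lam = 2*theta - 1,
# p_theta(x_n = x_0 | x_0) = (1 + lam**n) / 2.
lam = 2 * theta - 1
exact = n * lam ** (n - 1) / ((1 + lam ** n) / 2)
print(estimate, exact)
```

Note that each trajectory contributes to both sums, so a single forward pass over the $L$ samples yields both the numerator and the denominator.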

Hope this helps.