How are Importance Sampling and likelihood calculation done in Particle Filters (SIR)


In the section on Sequential Importance Sampling of the book Bayesian Filtering and Smoothing by Simo Sarkka, the author states that for each step we

  1. draw samples from the importance distribution

$$ x_k^{(i)} \sim \pi(x_k|x_{k-1}^{(i)}, y_{1:k}) $$

  2. calculate new particle weights as per

$$ w_k^{(i)} \propto w_{k-1}^{(i)} \frac{p(y_k|x_k^{(i)}) p(x_k^{(i)}|x_{k-1}^{(i)})}{\pi(x_k^{(i)}|x_{k-1}^{(i)}, y_{1:k})} $$ and normalize them.
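
The two steps above can be sketched in code. The following is a minimal illustration for a hypothetical 1-D linear-Gaussian model (the model, its parameters, and the observations are all made up for illustration), using the so-called bootstrap proposal $\pi(x_k \mid x_{k-1}, y_{1:k}) = p(x_k \mid x_{k-1})$, in which case the weight-update ratio simplifies to the likelihood $p(y_k \mid x_k^{(i)})$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D linear-Gaussian model:
#   x_k = 0.9 * x_{k-1} + q,  q ~ N(0, 1)   (transition)
#   y_k = x_k + r,            r ~ N(0, 0.25) (observation)
def sis_step(particles, weights, y_k, a=0.9, q_std=1.0, r_std=0.5):
    # Step 1: draw each new particle from the importance distribution;
    # here pi is the transition density p(x_k | x_{k-1}^{(i)}).
    particles = a * particles + q_std * rng.standard_normal(particles.shape)
    # Step 2: multiply each previous weight w_{k-1}^{(i)} by the ratio,
    # which for the bootstrap proposal is just the likelihood p(y_k | x_k^{(i)}),
    # then normalize so the weights sum to one.
    lik = np.exp(-0.5 * ((y_k - particles) / r_std) ** 2)
    weights = weights * lik
    weights /= weights.sum()
    return particles, weights

N = 500
particles = rng.standard_normal(N)      # initial particle cloud
weights = np.full(N, 1.0 / N)           # uniform initial weights
for y_k in [0.3, 0.1, -0.2]:            # made-up observation sequence
    particles, weights = sis_step(particles, weights, y_k)
est = np.sum(weights * particles)       # weighted estimate of the posterior mean
```

Note that plain SIS (without resampling) degenerates over time; the SIR algorithm adds a resampling step when the weights become too unbalanced.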

My questions are:

  1. What does it mean for $\pi(x_k)$ to be conditional on $x_{k-1}^{(i)}, y_{1:k}$, i.e. why are we sampling from $\pi(x_k|x_{k-1}^{(i)}, y_{1:k})$ and not just from $\pi(x_k)$? How do we sample from that conditional distribution?
  2. How do we choose a good importance distribution, and how important is that choice? What consequences do we face, and how often, if we simply always use a uniform, normal, or t distribution as the importance distribution?
  3. What does it mean and how do we empirically sample from $p(y_k|x_k^{(i)})$ and $p(x_k^{(i)}|x_{k-1}^{(i)})$?
  4. Why are we multiplying the density ratio in weights calculation with the previous weights $(w_{k-1}^{(i)})$?
Best answer:
  1. You are updating the particles based on your previous particles $\{x_{k-1}^{(i)}\}_{i = 1}^N$ and the data that you've observed so far, $y_{1:k}$. If you just sample from $\pi(x_k)$, you take neither the data nor your previous inference into account. You choose $\pi(x_k \mid x_{k-1}, y_{1:k})$ yourself to be a distribution that you can sample from.
  2. A good importance distribution depends on the context, but in general you want it to be as close as possible to the distribution you are targeting; the variance-minimizing ("optimal") choice is $p(x_k \mid x_{k-1}^{(i)}, y_k)$, though it is often intractable to sample from. I can't give a generic answer about the suitability of different fixed choices because it completely depends on the context. For example, I've seen particle filtering used for change point detection, and in that context a different distribution would be needed.
  3. You don't need to sample from $p(y_k \mid x_k^{(i)})$ and $p(x_k^{(i)} \mid x_{k-1}^{(i)})$; you only need to be able to evaluate them (or something proportional to them) in order to compute the ratio for $w_k$. These are the densities of $y_k \mid x_k$ and $x_k \mid x_{k-1}$, and they depend on the model at hand.
  4. The formula for $w_k^{(i)}$ is an updating step. Each particle $x_{k-1}^{(i)}$ has an associated weight $w_{k-1}^{(i)}$. When you update $x_{k-1}^{(i)}$ to $x_{k}^{(i)}$ you update $w_{k-1}^{(i)}$ to $w_{k}^{(i)}$ by multiplying it by the ratio $$\frac{p(y_k|x_k^{(i)}) p(x_k^{(i)}|x_{k-1}^{(i)})}{\pi(x_k^{(i)}|x_{k-1}^{(i)}, y_{1:k})}.$$
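
Points 1, 3, and 4 can be made concrete with a single weight-update step. The sketch below assumes the same kind of hypothetical linear-Gaussian model as before, but uses an arbitrary illustrative (non-bootstrap) Gaussian proposal that conditions on both $x_{k-1}^{(i)}$ and $y_k$; note that the three densities in the ratio are *evaluated*, not sampled from:

```python
import numpy as np

rng = np.random.default_rng(1)

def gauss_pdf(x, mean, std):
    # Density of N(mean, std^2) evaluated at x.
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

# Hypothetical model: p(x_k | x_{k-1}) = N(0.9 x_{k-1}, 1),
#                     p(y_k | x_k)     = N(x_k, 0.25).
def weight_update(x_prev, w_prev, y_k):
    # Proposal pi(x_k | x_{k-1}, y_k): a Gaussian pulled toward the
    # observation -- an arbitrary illustrative choice, not the optimal one.
    prop_mean = 0.5 * (0.9 * x_prev + y_k)
    prop_std = 0.8
    x_new = prop_mean + prop_std * rng.standard_normal(x_prev.shape)
    # Evaluate the three densities appearing in the weight ratio.
    lik = gauss_pdf(y_k, x_new, 0.5)               # p(y_k | x_k^{(i)})
    trans = gauss_pdf(x_new, 0.9 * x_prev, 1.0)    # p(x_k^{(i)} | x_{k-1}^{(i)})
    prop = gauss_pdf(x_new, prop_mean, prop_std)   # pi(x_k^{(i)} | x_{k-1}^{(i)}, y_k)
    # Point 4: the ratio multiplies the *previous* weights, so each
    # particle's whole history of fit is carried forward.
    w_new = w_prev * lik * trans / prop
    return x_new, w_new / w_new.sum()

N = 1000
x = rng.standard_normal(N)
w = np.full(N, 1.0 / N)
x, w = weight_update(x, w, y_k=0.4)
```

With the bootstrap proposal the `trans` and `prop` factors cancel, which is why that special case only requires evaluating the likelihood.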