In the Sequential Importance Sampling section of the book *Bayesian Filtering and Smoothing* by Simo Särkkä, the author states that at each step we
- draw samples from the importance distribution
$$ x_k^{(i)} \sim \pi(x_k|x_{k-1}^{(i)}, y_{1:k}) $$
- calculate new particle weights as per
$$ w_k^{(i)} \propto w_{k-1}^{(i)} \frac{p(y_k|x_k^{(i)}) p(x_k^{(i)}|x_{k-1}^{(i)})}{\pi(x_k^{(i)}|x_{k-1}^{(i)}, y_{1:k})} $$ and normalize them.
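For concreteness, the two steps above can be sketched in code. This is a minimal illustration, not from the book: it assumes a hypothetical 1-D linear-Gaussian model, and it picks the transition density $p(x_k|x_{k-1})$ as the importance distribution (the "bootstrap" choice), in which case the transition term cancels in the weight ratio and the update reduces to $w_k^{(i)} \propto w_{k-1}^{(i)}\, p(y_k|x_k^{(i)})$.

```python
import numpy as np

# Hypothetical 1-D model (not from the book):
#   x_k = 0.9 * x_{k-1} + process noise,  y_k = x_k + measurement noise.
# Importance distribution pi(x_k | x_{k-1}, y_{1:k}) is chosen as the
# transition density p(x_k | x_{k-1}) ("bootstrap" proposal), so the
# weight update simplifies to w_k ∝ w_{k-1} * p(y_k | x_k).

rng = np.random.default_rng(0)
N = 1000                      # number of particles
q, r = 0.5, 0.5               # process / measurement noise std devs

def gauss_pdf(z, mean, std):
    """Gaussian density, vectorized over `mean`."""
    return np.exp(-0.5 * ((z - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

# Simulate some observations y_1, ..., y_T from the model.
T = 20
x_true = 0.0
ys = []
for _ in range(T):
    x_true = 0.9 * x_true + q * rng.standard_normal()
    ys.append(x_true + r * rng.standard_normal())

particles = rng.standard_normal(N)   # initial samples x_0^(i)
weights = np.full(N, 1.0 / N)        # initial weights w_0^(i)

for y in ys:
    # Draw x_k^(i) ~ pi(x_k | x_{k-1}^(i), y_{1:k}); here pi is the transition density.
    particles = 0.9 * particles + q * rng.standard_normal(N)
    # Weight update: w_k ∝ w_{k-1} * p(y_k | x_k) (transition/proposal terms cancel).
    weights *= gauss_pdf(y, particles, r)
    weights /= weights.sum()          # normalize

estimate = np.sum(weights * particles)   # weighted posterior-mean estimate
print(estimate)
```

Note that without a resampling step this is plain sequential importance sampling, and in practice the weights degenerate over time, which is part of why the choice of importance distribution matters.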
My questions are:
- What does it mean for $\pi(x_k)$ to be conditioned on $x_{k-1}^{(i)}$ and $y_{1:k}$, i.e., why do we sample from $\pi(x_k|x_{k-1}^{(i)}, y_{1:k})$ and not simply from $\pi(x_k)$? And how do we actually sample from that conditional distribution?
- How do we choose a good importance distribution, and how much does that choice matter? What consequences can we expect, and how often, if we always use a uniform, normal, or t distribution as the importance distribution?
- What does it mean to sample empirically from $p(y_k|x_k^{(i)})$ and $p(x_k^{(i)}|x_{k-1}^{(i)})$, and how is that done?
- Why is the density ratio in the weight update multiplied by the previous weight $w_{k-1}^{(i)}$?