Derivation of the Kalman filter prediction step


I've been working through Murphy's *Machine Learning: A Probabilistic Perspective* and have run into a slight issue with the section on the Kalman filter.

As setup, we're assuming a linear-Gaussian state-space model with:

  • $\mathbf{z}_t$ is the hidden state, $\mathbf{u}_t$ is an optional input/control signal, and $\mathbf{y}_t$ is the observation.
  • transition model: $\mathbf{z}_{t}=\mathbf{A}_{t} \mathbf{z}_{t-1}+\mathbf{B}_{t} \mathbf{u}_{t}+\boldsymbol{\epsilon}_{t}$
  • observation model: $\mathbf{y}_{t}=\mathbf{C}_{t} \mathbf{z}_{t}+\mathbf{D}_{t} \mathbf{u}_{t}+\boldsymbol{\delta}_{t}$

with system and observation noise $\boldsymbol{\epsilon}_{t} \sim \mathcal{N}\left(\mathbf{0}, \mathbf{Q}_{t}\right)$ and $\boldsymbol{\delta}_{t} \sim \mathcal{N}\left(\mathbf{0}, \mathbf{R}_{t}\right)$, respectively.
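To make the setup concrete, here is a minimal numpy sketch that samples from this model. All matrix values are made up for illustration; only the structure ($\mathbf{z}_t = \mathbf{A}\mathbf{z}_{t-1} + \mathbf{B}\mathbf{u}_t + \boldsymbol{\epsilon}_t$, $\mathbf{y}_t = \mathbf{C}\mathbf{z}_t + \mathbf{D}\mathbf{u}_t + \boldsymbol{\delta}_t$) comes from the book:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, time-invariant model matrices (2-D state, 1-D control and observation).
A = np.array([[1.0, 1.0], [0.0, 1.0]])  # transition matrix A_t
B = np.array([[0.5], [1.0]])            # control matrix B_t
C = np.array([[1.0, 0.0]])              # observation matrix C_t
D = np.zeros((1, 1))                    # feedthrough D_t
Q = 0.1 * np.eye(2)                     # system noise covariance Q_t
R = np.array([[0.5]])                   # observation noise covariance R_t

def simulate(T, z0):
    """Sample a trajectory (z_1..z_T, y_1..y_T) from the linear-Gaussian SSM."""
    zs, ys = [], []
    z = z0
    for t in range(T):
        u = np.array([1.0])  # constant control input u_t, for illustration
        z = A @ z + B @ u + rng.multivariate_normal(np.zeros(2), Q)  # transition
        y = C @ z + D @ u + rng.multivariate_normal(np.zeros(1), R)  # observation
        zs.append(z)
        ys.append(y)
    return np.array(zs), np.array(ys)

zs, ys = simulate(50, np.zeros(2))
```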


Now for the introduction of the Kalman filter algorithm we represent the marginal posterior at time $t$ by: $$p\left(\mathbf{z}_{t} \mid \mathbf{y}_{1: t}, \mathbf{u}_{1: t}\right)=\mathcal{N}\left(\mathbf{z}_{t} \mid \boldsymbol{\mu}_{t}, \boldsymbol{\Sigma}_{t}\right)$$

Then he writes the following for the prediction step:

$$p\left(\mathbf{z}_{t} \mid \mathbf{y}_{1: t-1}, \mathbf{u}_{1: t}\right)=\int \mathcal{N}\left(\mathbf{z}_{t} \mid \mathbf{A}_{t} \mathbf{z}_{t-1}+\mathbf{B}_{t} \mathbf{u}_{t}, \mathbf{Q}_{t}\right) \mathcal{N}\left(\mathbf{z}_{t-1} \mid \boldsymbol{\mu}_{t-1}, \boldsymbol{\Sigma}_{t-1}\right) d \mathbf{z}_{t-1}=\mathcal{N}\left(\mathbf{z}_{t} \mid \boldsymbol{\mu}_{t \mid t-1}, \boldsymbol{\Sigma}_{t \mid t-1}\right) \tag{18.25}$$

where $\boldsymbol{\mu}_{t \mid t-1}=\mathbf{A}_{t} \boldsymbol{\mu}_{t-1}+\mathbf{B}_{t} \mathbf{u}_{t}$ and $\boldsymbol{\Sigma}_{t \mid t-1}=\mathbf{A}_{t} \boldsymbol{\Sigma}_{t-1} \mathbf{A}_{t}^{T}+\mathbf{Q}_{t}$.

I don't see how he got to the first equality. This is how I tried to do it:

We have that

$$ \begin{align} p\left(\mathbf{z}_{t} \mid \mathbf{y}_{1: t-1}, \mathbf{u}_{1: t}\right) &= \int p(\mathbf{z}_t,\mathbf{z}_{t-1}|\mathbf{y}_{1:t-1},\mathbf{u}_{1:t}) \; d\mathbf{z}_{t-1} \\ &= \int p(\mathbf{z}_t|\mathbf{z}_{t-1}, \mathbf{y}_{1:t-1},\mathbf{u}_{1:t})\; p(\mathbf{z}_{t-1}|\mathbf{y}_{1:t-1},\mathbf{u}_{1:t}) \; d \mathbf{z}_{t-1}\\ &= \int p(\mathbf{z}_t|\mathbf{z}_{t-1},\mathbf{u}_{1:t})\; p(\mathbf{z}_{t-1}|\mathbf{y}_{1:t-1},\mathbf{u}_{1:t})\;d\mathbf{z}_{t-1} \end{align}$$

where, by the conditional Markov property, we can drop the dependence of the first probability on $\mathbf{y}_{1:t-1}$, which gives the first normal distribution on the RHS of equation 18.25.

The problem I have is that I don't see how the second probability becomes what he has, since $$p\left(\mathbf{z}_{t-1} \mid \mathbf{y}_{1: t-1}, \mathbf{u}_{1: t-1}\right)=\mathcal{N}\left(\mathbf{z}_{t-1} \mid \boldsymbol{\mu}_{t-1}, \boldsymbol{\Sigma}_{t-1}\right)$$ and so we're not conditioning on $\mathbf{u}_t$ there. Is there something I'm missing? I can't work out where the conditioning on $\mathbf{u}_{t}$ has gone. Intuitively it seems that $\mathbf{z}_{t-1}$ should depend on $\mathbf{u}_{t}$: surely from $\mathbf{u}_t$ we can infer what potential states $\mathbf{z}_t$ could have been, and from that infer what potential state $\mathbf{z}_{t-1}$ could have been. Though that's a bit confusing and hand-wavy. I've been jumping between sections of the book, so it's probably something obvious I've missed.
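One thing that helped me convince myself the first equality is at least numerically right: sample $\mathbf{z}_{t-1}$ from the filtered posterior, push the samples through the transition model, and compare the empirical moments to the closed-form $\boldsymbol{\mu}_{t|t-1}$, $\boldsymbol{\Sigma}_{t|t-1}$. All the numbers below are made up; this is only a Monte Carlo sanity check of the marginalization, not anything from the book:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical filtered posterior at t-1 and model matrices.
mu = np.array([1.0, -1.0])
Sigma = np.array([[0.3, 0.1], [0.1, 0.2]])
A = np.array([[1.0, 0.5], [0.0, 1.0]])
B = np.array([[0.1], [0.2]])
u = np.array([2.0])
Q = 0.05 * np.eye(2)

# Sample z_{t-1} ~ N(mu, Sigma), then z_t = A z_{t-1} + B u + eps with eps ~ N(0, Q).
z_prev = rng.multivariate_normal(mu, Sigma, size=n)
eps = rng.multivariate_normal(np.zeros(2), Q, size=n)
z_t = z_prev @ A.T + B @ u + eps

# Closed-form prediction from equation 18.25.
mu_pred = A @ mu + B @ u
Sigma_pred = A @ Sigma @ A.T + Q

# Both discrepancies should be close to zero (up to Monte Carlo error).
print(np.max(np.abs(z_t.mean(axis=0) - mu_pred)))
print(np.max(np.abs(np.cov(z_t.T) - Sigma_pred)))
```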