I've been working through Murphy's *Machine Learning: A Probabilistic Perspective* and I'm having a slight issue with the section on the Kalman filter.
As a setup we're assuming a linear-Gaussian state space model with:
- $\mathbf{z}_t$ is the hidden state, $\mathbf{u}_t$ is an optional input/control signal, $\mathbf{y}_t$ is the observation.
- transition model: $\mathbf{z}_{t}=\mathbf{A}_{t} \mathbf{z}_{t-1}+\mathbf{B}_{t} \mathbf{u}_{t}+\boldsymbol{\epsilon}_{t}$
- observation model: $\mathbf{y}_{t}=\mathbf{C}_{t} \mathbf{z}_{t}+\mathbf{D}_{t} \mathbf{u}_{t}+\boldsymbol{\delta}_{t}$
with system and observation noise respectively $\boldsymbol{\epsilon}_{t} \sim \mathcal{N}\left(\mathbf{0}, \mathbf{Q}_{t}\right)$, $\boldsymbol{\delta}_{t} \sim \mathcal{N}\left(\mathbf{0}, \mathbf{R}_{t}\right)$.
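To make the setup concrete (this isn't from the book — just a minimal numpy sketch of the linear-Gaussian state-space model above, with hypothetical parameter values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical time-invariant parameters (A_t = A etc. for all t).
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # transition matrix A_t
B = np.array([[0.5], [1.0]])             # control matrix B_t
C = np.array([[1.0, 0.0]])               # observation matrix C_t
D = np.array([[0.0]])                    # feed-through matrix D_t
Q = 0.1 * np.eye(2)                      # system noise covariance Q_t
R = np.array([[0.5]])                    # observation noise covariance R_t

def step(z_prev, u):
    """One step of the linear-Gaussian SSM: returns (z_t, y_t)."""
    eps = rng.multivariate_normal(np.zeros(2), Q)      # epsilon_t ~ N(0, Q_t)
    z = A @ z_prev + B @ u + eps                       # transition model
    delta = rng.multivariate_normal(np.zeros(1), R)    # delta_t ~ N(0, R_t)
    y = C @ z + D @ u + delta                          # observation model
    return z, y

z = np.zeros(2)
for t in range(5):
    z, y = step(z, np.array([1.0]))
```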
Now for the introduction of the Kalman filter algorithm we represent the marginal posterior at time $t$ by: $$p\left(\mathbf{z}_{t} \mid \mathbf{y}_{1: t}, \mathbf{u}_{1: t}\right)=\mathcal{N}\left(\mathbf{z}_{t} \mid \boldsymbol{\mu}_{t}, \boldsymbol{\Sigma}_{t}\right)$$
Then he writes the following for the prediction step (equation 18.25; reproducing it here since I can't embed the book's image): $$p\left(\mathbf{z}_{t} \mid \mathbf{y}_{1: t-1}, \mathbf{u}_{1: t}\right)=\int \mathcal{N}\left(\mathbf{z}_{t} \mid \mathbf{A}_{t} \mathbf{z}_{t-1}+\mathbf{B}_{t} \mathbf{u}_{t}, \mathbf{Q}_{t}\right) \mathcal{N}\left(\mathbf{z}_{t-1} \mid \boldsymbol{\mu}_{t-1}, \boldsymbol{\Sigma}_{t-1}\right) d \mathbf{z}_{t-1}=\mathcal{N}\left(\mathbf{z}_{t} \mid \boldsymbol{\mu}_{t \mid t-1}, \boldsymbol{\Sigma}_{t \mid t-1}\right)$$
I don't see how he got to the first equality. This is how I tried to do it:
We have that
$$ \begin{align} p\left(\mathbf{z}_{t} \mid \mathbf{y}_{1: t-1}, \mathbf{u}_{1: t}\right) &= \int p(\mathbf{z}_t,\mathbf{z}_{t-1}|\mathbf{y}_{1:t-1},\mathbf{u}_{1:t}) \; d\mathbf{z}_{t-1} \\ &= \int p(\mathbf{z}_t|\mathbf{z}_{t-1}, \mathbf{y}_{1:t-1},\mathbf{u}_{1:t})\; p(\mathbf{z}_{t-1}|\mathbf{y}_{1:t-1},\mathbf{u}_{1:t}) \; d \mathbf{z}_{t-1}\\ &= \int p(\mathbf{z}_t|\mathbf{z}_{t-1},\mathbf{u}_{1:t})\; p(\mathbf{z}_{t-1}|\mathbf{y}_{1:t-1},\mathbf{u}_{1:t})\;d\mathbf{z}_{t-1} \end{align}$$
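As a sanity check on the marginalization itself (not an answer to my question), the integral does collapse to the closed-form Gaussian $\mathcal{N}\left(\mathbf{A}_t\boldsymbol{\mu}_{t-1}+\mathbf{B}_t\mathbf{u}_t,\; \mathbf{A}_t\boldsymbol{\Sigma}_{t-1}\mathbf{A}_t^{T}+\mathbf{Q}_t\right)$, which a quick Monte Carlo sketch with made-up parameter values confirms:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameters for the check (not from the book).
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
Q = 0.1 * np.eye(2)
mu = np.array([1.0, -1.0])          # mu_{t-1}
Sigma = np.array([[0.3, 0.1],
                  [0.1, 0.2]])      # Sigma_{t-1}
u = np.array([2.0])                 # u_t

n = 200_000
# Sample z_{t-1} ~ N(mu, Sigma), then push through the transition model.
z_prev = rng.multivariate_normal(mu, Sigma, size=n)
eps = rng.multivariate_normal(np.zeros(2), Q, size=n)
z_t = z_prev @ A.T + u @ B.T + eps

# Closed-form prediction: N(A mu + B u, A Sigma A^T + Q)
mu_pred = A @ mu + B @ u
Sigma_pred = A @ Sigma @ A.T + Q

print(np.allclose(z_t.mean(axis=0), mu_pred, atol=0.01))               # True
print(np.allclose(np.cov(z_t, rowvar=False), Sigma_pred, atol=0.01))   # True
```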
where, by the conditional Markov property, we can drop the dependence of the first factor on $\mathbf{y}_{1:t-1}$ (and on the earlier inputs $\mathbf{u}_{1:t-1}$, since the transition model involves only $\mathbf{u}_t$), which gives the first normal distribution on the RHS of equation 18.25.

The problem I have is that I don't see how the second factor becomes what he has, since $$p\left(\mathbf{z}_{t-1} \mid \mathbf{y}_{1: t-1}, \mathbf{u}_{1: t-1}\right)=\mathcal{N}\left(\mathbf{z}_{t-1} \mid \boldsymbol{\mu}_{t-1}, \boldsymbol{\Sigma}_{t-1}\right)$$ does not condition on $\mathbf{u}_t$. Is there something I'm missing here? I can't work out where the conditioning on $\mathbf{u}_{t}$ has gone.

Intuitively it seems that $\mathbf{z}_{t-1}$ should depend on $\mathbf{u}_{t}$: surely from $\mathbf{u}_t$ we could infer which states $\mathbf{z}_t$ are plausible, and from those which states $\mathbf{z}_{t-1}$ are plausible. Though that's a bit confusing and hand-wavy. I've been jumping between sections of the book, so it's probably something obvious I've missed.
