Understanding the Basic Mathematics of a Kalman Filter


In *Introduction to Linear Algebra* by Gilbert Strang, it is stated that for a Kalman filter $$ \hat{x}_1=\hat{x}_0+K_1(b_1-A_1\hat{x}_0) $$ where the Kalman gain matrix is $K_1=W_1A_1^TV_1^{-1}$ and the covariance of errors in $\hat{x}_1$ satisfies $W_1^{-1}=W_0^{-1}+A_1^{T}V_1^{-1}A_1$.

I think I understand what the Kalman filter is trying to do: when new data arrives, instead of computing the least-squares solution for the whole data set from scratch, we try to reuse the least-squares solution for the old data to obtain the solution for the whole data set. I also think we are assuming the data error has a normal distribution, which is why the covariance matrix comes into the picture.

I also understand that the weighted least-squares solution of $Ax=b$ is $\hat{x}=(A^TV^{-1}A)^{-1}A^TV^{-1}b$, and how we arrive at this expression.
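Concretely, that weighted least-squares formula can be checked numerically. The sketch below uses made-up data and NumPy (names and sizes are illustrative, not from the book):

```python
import numpy as np

# Numerical check of the weighted least-squares formula
# x_hat = (A^T V^{-1} A)^{-1} A^T V^{-1} b  (made-up data).
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2))                 # 6 equations, 2 unknowns
b = rng.normal(size=6)
V = np.diag(rng.uniform(0.5, 2.0, size=6))  # measurement-error covariance

Vinv = np.linalg.inv(V)
x_hat = np.linalg.solve(A.T @ Vinv @ A, A.T @ Vinv @ b)

# x_hat satisfies the weighted normal equations A^T V^{-1} (b - A x_hat) = 0
print(np.allclose(A.T @ Vinv @ (b - A @ x_hat), 0))  # True
```

Using `np.linalg.solve` on the normal equations avoids forming the inverse explicitly, but it is the same estimator as the closed-form expression above.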

But how do we arrive at the term $K_1(b_1-A_1\hat{x}_0)$?

I am having difficulty seeing how the equation takes this particular form.

Reference: Page 560, Chapter 12 (Linear Algebra in Probability & Statistics), Introduction to Linear Algebra, Gilbert Strang




A quick note before answering your question.

The Kalman filter (KF) is most often used when the parameters being estimated change in time (i.e., dynamic parameters); however, the KF can also be used when the parameters are static. In that case, the KF is equivalent to weighted least squares. For example, the KF equations as described on Wikipedia look different, but I can assure you that, with some algebra, they are identical (except for which covariance matrix appears in the gain, which I will return to below).

Answer:

From (15), the estimate $\hat{\mathbf{x}}_0$ based on $\mathbf{b}_0$ is: $$ \hat{\mathbf{x}}_0 = (A_0^T V_0^{-1} A_0)^{-1} A_0^T V_0^{-1} \mathbf{b}_0 $$

If we expand (16), then we have the following: $$ \begin{align} A_0^T V_0^{-1} A_0 \hat{\mathbf{x}}_1 + A_1^T V_1^{-1} A_1 \hat{\mathbf{x}}_1 = A_0^T V_0^{-1} \mathbf{b}_0 + A_1^T V_1^{-1} \mathbf{b}_1, \end{align} $$ and simplifying and rearranging, we have $$ \begin{align} A_0^T V_0^{-1} A_0 \hat{\mathbf{x}}_1 &= A_0^T V_0^{-1} \mathbf{b}_0 + A_1^T V_1^{-1} \mathbf{b}_1 - A_1^T V_1^{-1} A_1 \hat{\mathbf{x}}_1 \\ &= A_0^T V_0^{-1} \mathbf{b}_0 + A_1^T V_1^{-1} (\mathbf{b}_1 - A_1 \hat{\mathbf{x}}_1) \end{align} $$ then, left multiplying by $(A_0^T V_0^{-1} A_0)^{-1}$ gives the following: $$ \hat{\mathbf{x}}_1 = \underbrace{(A_0^T V_0^{-1} A_0)^{-1}A_0^T V_0^{-1} \mathbf{b}_0}_{\hat{\mathbf{x}}_0} + \underbrace{(A_0^T V_0^{-1} A_0)^{-1} A_1^T V_1^{-1}}_{K_1} (\mathbf{b}_1 - A_1 \hat{\mathbf{x}}_1) $$
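The identity just derived can be sanity-checked numerically: stacking $(A_0, \mathbf{b}_0)$ and $(A_1, \mathbf{b}_1)$ into one batch weighted least-squares problem gives an $\hat{\mathbf{x}}_1$ that satisfies $\hat{\mathbf{x}}_1 = \hat{\mathbf{x}}_0 + K_1(\mathbf{b}_1 - A_1\hat{\mathbf{x}}_1)$ with $K_1 = (A_0^T V_0^{-1} A_0)^{-1} A_1^T V_1^{-1}$. Note that $\hat{\mathbf{x}}_1$ appears on both sides, so the check is that this implicit equation holds (a sketch with random data; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
A0 = rng.normal(size=(5, 2)); b0 = rng.normal(size=5)
A1 = rng.normal(size=(4, 2)); b1 = rng.normal(size=4)
V0inv = np.diag(rng.uniform(0.5, 2.0, size=5))  # inverse covariances directly
V1inv = np.diag(rng.uniform(0.5, 2.0, size=4))

# Old estimate from (A0, b0) alone
x0 = np.linalg.solve(A0.T @ V0inv @ A0, A0.T @ V0inv @ b0)

# Batch estimate from all the data -- this is what (16) solves
lhs = A0.T @ V0inv @ A0 + A1.T @ V1inv @ A1
rhs = A0.T @ V0inv @ b0 + A1.T @ V1inv @ b1
x1 = np.linalg.solve(lhs, rhs)

# Derived identity: x1 = x0 + K1 (b1 - A1 x1), with
# K1 = (A0^T V0^{-1} A0)^{-1} A1^T V1^{-1}  (note x1 in the innovation)
K1 = np.linalg.solve(A0.T @ V0inv @ A0, A1.T @ V1inv)
print(np.allclose(x1, x0 + K1 @ (b1 - A1 @ x1)))  # True
```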

However, notice in the equation above that the gain is $K_1 = W_0 A_1^T V_1^{-1}$ and the innovation contains $\hat{\mathbf{x}}_1$, whereas the reference has $K_1 = W_1 A_1^T V_1^{-1}$ with innovation $\mathbf{b}_1 - A_1 \hat{\mathbf{x}}_0$. In the KF (assuming static parameters), the updated covariance is given as follows: $$ W_k^{-1} = W_{k-1}^{-1} + A_k^T V_k^{-1} A_k, $$ which is consistent with the reference. The two forms of the update are in fact equivalent. Solving the implicit equation $\hat{\mathbf{x}}_1 = \hat{\mathbf{x}}_0 + W_0 A_1^T V_1^{-1}(\mathbf{b}_1 - A_1 \hat{\mathbf{x}}_1)$ for $\hat{\mathbf{x}}_1$ (left-multiply by $W_0^{-1}$ and collect the $\hat{\mathbf{x}}_1$ terms) gives $$ W_1^{-1} \hat{\mathbf{x}}_1 = W_0^{-1} \hat{\mathbf{x}}_0 + A_1^T V_1^{-1} \mathbf{b}_1, $$ and substituting $W_0^{-1} = W_1^{-1} - A_1^T V_1^{-1} A_1$ yields $$ \hat{\mathbf{x}}_1 = \hat{\mathbf{x}}_0 + W_1 A_1^T V_1^{-1}(\mathbf{b}_1 - A_1 \hat{\mathbf{x}}_0), $$ which is exactly the reference's formula. So the reference is not a typo after all: the posterior covariance $W_1$ belongs in the gain precisely when the innovation uses the old estimate $\hat{\mathbf{x}}_0$, while the prior covariance $W_0$ pairs with an innovation built from $\hat{\mathbf{x}}_1$.
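As a final numerical cross-check (random data; NumPy assumed), the form with posterior covariance $W_1$ in the gain and the old estimate $\hat{\mathbf{x}}_0$ in the innovation reproduces the batch weighted least-squares solution over all the data:

```python
import numpy as np

rng = np.random.default_rng(2)
A0 = rng.normal(size=(5, 2)); b0 = rng.normal(size=5)
A1 = rng.normal(size=(4, 2)); b1 = rng.normal(size=4)
V0inv = np.diag(rng.uniform(0.5, 2.0, size=5))
V1inv = np.diag(rng.uniform(0.5, 2.0, size=4))

W0 = np.linalg.inv(A0.T @ V0inv @ A0)  # covariance of errors in x0
x0 = W0 @ A0.T @ V0inv @ b0

# Covariance update: W1^{-1} = W0^{-1} + A1^T V1^{-1} A1
W1 = np.linalg.inv(np.linalg.inv(W0) + A1.T @ V1inv @ A1)

# Reference form: gain uses W1, innovation uses the OLD estimate x0
x1_ref = x0 + W1 @ A1.T @ V1inv @ (b1 - A1 @ x0)

# Batch weighted least squares over all the data
lhs = A0.T @ V0inv @ A0 + A1.T @ V1inv @ A1
rhs = A0.T @ V0inv @ b0 + A1.T @ V1inv @ b1
print(np.allclose(x1_ref, np.linalg.solve(lhs, rhs)))  # True
```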