Why must dynamical systems be affine with respect to noise in order to apply Kalman filtering?


I know that, to apply Kalman filtering for state estimation, a first assumption is that the system can be modeled as: $$x(k+1) = F(k)x(k) + G(k)u(k) + v(k) $$ $$y(k) = H(k)x(k) + w(k) $$

I am trying to find a nice, simple explanation of why it is important that $v(k)$ and $w(k)$ enter the previous equations in an affine form (i.e., additively, in a linear way). Every book on state estimation makes this assumption when it first explains the basics of Kalman filtering, but none of them explains why it is so important. What would happen if the noise appeared in a non-affine way? E.g.: $$x(k+1) = F(k)x(k) + G(k)u(k)v(k) $$ $$y(k) = H(k)x(k)w(k) $$

I know that it is still possible to use Kalman filters, but the equations become more involved because the relationship between the noise and the state/input of the system is nonlinear. How can I explain in simple terms why this (often hidden) assumption is so important?

The equations above describe a discrete-time linear system, but the same goes for nonlinear systems, where a (more complex) Extended Kalman Filter (EKF) must be applied. Indeed, the same affinity assumption is made:

$$x(k+1) = f(x(k),u(k),k) + v(k) $$ $$y(k) = h(x(k),k) + w(k) $$

Thanks in advance


There are 1 best solutions below


I don't have a background in stochastic processes, so I am not completely sure whether time-varying noise covariances introduce any subtleties in the optimal estimation result. However, Kumar, P. R., and Pravin Varaiya, "Stochastic systems: estimation, identification and adaptive control" (1986), page 93, defines the general state space model considered by a Kalman filter as

\begin{align} x_{k+1} &= A_k x_k + B_k u_k + G_k w_k, \\ y_k &= C_k x_k + H_k v_k, \end{align}

with $A_k$, $B_k$, $G_k$, $C_k$ and $H_k$ known, possibly time-varying matrices of appropriate dimensions. The variables $w_k$ and $v_k$ are zero-mean, normally distributed (possibly multivariate) with covariances $Q$ and $R$ respectively. These covariances are not time-varying. This is equivalent to using $w_k'$ and $v_k'$ instead of $G_k w_k$ and $H_k v_k$ respectively, where $w_k'$ and $v_k'$ are zero-mean with time-varying covariances $G_k Q\,G_k^\top$ and $H_k R\,H_k^\top$ respectively.
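The covariance identity above can be checked numerically. The following is a minimal sketch, not taken from the book: the matrices `G` and `Q` are made-up illustrative values, `Q` is chosen diagonal so that the noise components can be sampled independently, and plain Python lists are used to avoid any dependency.

```python
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

# Illustrative values only (assumptions, not from the source).
G = [[1.0, 0.5], [0.0, 2.0]]   # noise input matrix G_k at one time step
Q = [[0.5, 0.0], [0.0, 0.2]]   # constant covariance of w_k (diagonal here)

# Covariance of the equivalent noise w'_k = G_k w_k.
Q_prime = matmul(matmul(G, Q), transpose(G))

# Monte Carlo check: sample w_k, map it through G_k, estimate the covariance.
rng = random.Random(0)
n = 20000
samples = []
for _ in range(n):
    w = [rng.gauss(0.0, Q[0][0] ** 0.5), rng.gauss(0.0, Q[1][1] ** 0.5)]
    samples.append([G[0][0] * w[0] + G[0][1] * w[1],
                    G[1][0] * w[0] + G[1][1] * w[1]])

# w'_k is zero mean, so the second moment is the covariance.
Q_emp = [[sum(s[i] * s[j] for s in samples) / n for j in range(2)]
         for i in range(2)]

print("G Q G^T   :", Q_prime)
print("empirical :", Q_emp)
```

The two printed matrices should agree up to sampling error, which is all the equivalence claims: the filter only ever sees the noise through its (possibly time-varying) covariance.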

The standard Kalman filter result requires the model to be linear in the state and the noise, so terms containing the product $x_k\,w_k$ are not allowed. However, the term $G_k u_k v_k$ from the question can be interpreted as the time-varying matrix $G_k'=G_k u_k$ multiplying $v_k$; since $u_k$ is known, this term still fits the model above.
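One way to see the difference between the two cases is to look at the distribution of the noise term. The sketch below is only an illustration under assumed values: the known input is the scalar `u = 3.0`, and the state and the noise are both taken as independent standard normal variables. A known input times Gaussian noise is still Gaussian (kurtosis 3), whereas the product of two independent zero-mean Gaussians is not (its kurtosis is 9), so the Gaussian recursion used in the usual derivation no longer applies directly.

```python
import random

def kurtosis(samples):
    """Fourth central moment divided by the squared variance (3 for a Gaussian)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    return m4 / m2 ** 2

rng = random.Random(1)
n = 200000
u = 3.0  # known input (illustrative value, an assumption)

# Known input times noise: a scaled Gaussian, hence still Gaussian.
known_input_term = [u * rng.gauss(0.0, 1.0) for _ in range(n)]

# Random state times noise: product of two independent Gaussians.
state_times_noise = [rng.gauss(0.0, 1.0) * rng.gauss(0.0, 1.0) for _ in range(n)]

k_known = kurtosis(known_input_term)
k_product = kurtosis(state_times_noise)

print("kurtosis of u_k * v_k:", k_known)
print("kurtosis of x_k * w_k:", k_product)
```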

For the extended Kalman filter, the matrices $A_k$, $B_k$, $G_k$, $C_k$ and $H_k$ can be obtained by linearizing the dynamics

\begin{align} x_{k+1} &= f(k,x_k,u_k,w_k), \\ y_k &= h(k,x_k,v_k), \end{align}

that is, by taking the appropriate partial derivatives and substituting the current (estimated) values. For the stochastic variables $w_k$ and $v_k$, I think one should use their expected value, i.e. zero, when evaluating these matrices.
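As a rough illustration of this linearization, here is a sketch of one EKF prediction step for a scalar system with multiplicative process noise. The dynamics `f`, the helper `partial`, and all numerical values are hypothetical choices made for the example; the derivatives are approximated by central finite differences and evaluated at the current estimate with the noise set to its mean, zero.

```python
def f(k, x, u, w):
    """Hypothetical scalar dynamics with multiplicative process noise."""
    return 0.9 * x * (1.0 + w) + u

def partial(fun, args, index, eps=1e-6):
    """Central finite-difference derivative of fun w.r.t. args[index]."""
    hi = list(args)
    lo = list(args)
    hi[index] += eps
    lo[index] -= eps
    return (fun(*hi) - fun(*lo)) / (2.0 * eps)

# Illustrative values (assumptions): time index, state estimate, input,
# estimate variance P, and process noise variance Q.
k, x_hat, u, P, Q = 0, 2.0, 0.5, 0.1, 0.04

# Linearize around the current estimate with the noise at its mean (zero).
args = (k, x_hat, u, 0.0)
A = partial(f, args, 1)  # df/dx, plays the role of A_k
G = partial(f, args, 3)  # df/dw, plays the role of G_k

# EKF prediction: propagate the mean with zero noise and the variance
# through the linearized model.
x_pred = f(k, x_hat, u, 0.0)
P_pred = A * P * A + G * Q * G

print("A_k =", A, " G_k =", G)
print("x_pred =", x_pred, " P_pred =", P_pred)
```

For this particular `f`, the noise gain comes out as $G_k = 0.9\,\hat{x}_k$, so the multiplicative noise shows up in the linearized model as additive noise whose gain depends on the current estimate.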