2nd-order EKF covariance propagation equation (Hessian)


I am trying to implement a 2nd-order EKF and am having some issues with the propagation equation for covariance.

From the literature, if $X$ has covariance $P$ and the propagation function $f$ has Jacobian $J$ and Hessians $H^1$, $H^2$, ..., the propagation / prediction of $X_{k+1}$ obeys

$$X_{k+1}=f(X_k) + \frac{1}{2} \sum_i e_i \cdot trace(H^i_kP_k)$$

and, assuming a simple no noise case, the covariance update equation is supposed to be

$$P_{k+1}=JP_kJ^T + \frac{1}{2} \sum_i \sum_j e_{i,j} \cdot trace(H^i_kP_kH^j_kP_k)$$

where $e_i$ is the unit vector (equal to $1$ at coordinate $i$ and null on all other coordinates) and $e_{i,j}$ is the "unit matrix" (null on all coordinates except at $(i,j)$, where it is equal to $1$).
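For concreteness, here is a minimal NumPy sketch of these two prediction equations in the no-noise case (the function name and signature are my own, not from any particular reference):

```python
import numpy as np

def ekf2_predict(x, P, f, J, H):
    """One 2nd-order EKF prediction step, no process noise.

    x : state (n,), P : covariance (n, n),
    f : propagation function, J : its Jacobian at x,
    H : list of the n Hessians H[i] of the i-th component of f at x.
    """
    n = x.size
    # Mean: f(x) + (1/2) * sum_i e_i * trace(H^i P)
    x_new = f(x) + 0.5 * np.array([np.trace(H[i] @ P) for i in range(n)])
    # Covariance: J P J^T + (1/2) * sum_{i,j} e_{i,j} * trace(H^i P H^j P)
    P_new = J @ P @ J.T + 0.5 * np.array(
        [[np.trace(H[i] @ P @ H[j] @ P) for j in range(n)] for i in range(n)]
    )
    return x_new, P_new
```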

Let's put that to a simple test, shall we?

  • Let $X=[x;y]$ be a Gaussian vector, with initial expectation $X_0=[0;0]$ and initial covariance

$$P_0=\begin{bmatrix}0 & 0 \\ 0 & p_y\end{bmatrix}$$

  • Let the system propagation equation be $\dot x = -y^2/2$ and $\dot y=0$.

Note the system is quite simple and purely quadratic (there are no higher order terms).

First, let's solve the system analytically for any given $t$.

Trivially, for a given initial value $y(0)=y_0$, we have $x(t)=-y_0^2t/2$ and $y(t)=y_0$. Statistically,

  • the expectation at time $t$ is thus

$$X(t) = E\left(\begin{bmatrix}-y_0^2t/2 \\ y_0 \end{bmatrix}\right)=\begin{bmatrix}-p_yt/2 \\ 0\end{bmatrix} $$

  • the covariance at time $t$ is (since $var(y_0^2)=2p_y^2$, we have $var(-y_0^2 t/2)=(p_y t)^2/2$, and the cross term $cov(-y_0^2 t/2,\,y_0)$ vanishes because the third moment of a centered Gaussian is zero)

$$P(t)=\begin{bmatrix} (p_y t)^2/2 & 0 \\ 0 & p_y \end{bmatrix}$$

Simple.
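As a quick sanity check, these closed-form moments can be verified by Monte-Carlo sampling (the values $p_y=1$, $t=2$ and the sample size here are arbitrary choices of mine):

```python
import numpy as np

# Monte-Carlo check of the closed-form mean and covariance at time t.
# p_y = 1, t = 2 and the sample size are arbitrary illustrative values.
rng = np.random.default_rng(0)
p_y, t, n = 1.0, 2.0, 1_000_000
y0 = rng.normal(0.0, np.sqrt(p_y), n)   # y(0) ~ N(0, p_y)
x_t = -y0**2 * t / 2                    # exact solution x(t) = -y0^2 t / 2
X = np.stack([x_t, y0])

mean = X.mean(axis=1)   # should approach [-p_y t / 2, 0]
cov = np.cov(X)         # should approach diag((p_y t)^2 / 2, p_y)
print(mean, np.diag(cov))
```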

Now let's try to apply the former discrete EKF propagation equations.

Consider a discrete propagation of the state, with timestep $dt$, and $$f:X \longmapsto X+\begin{bmatrix}-X(2)^2/2\\0\end{bmatrix}dt$$

Note this is NOT an approximate discretization of the continuous system: it is exact (there is no discretization error on $f$ with respect to the continuous system).

We have, for the Jacobian and the two Hessians of $f$ at any $X$,

$$J(X)=\begin{bmatrix} 1 & -X(2)dt \\ 0 & 1 \end{bmatrix} ; H^1(X)=H^1=\begin{bmatrix}0 & 0 \\0 & -dt\end{bmatrix} ; H^2(X)=H^2=0_2$$

Once again, note the system is purely quadratic: $H^1$ and $H^2$ do NOT depend on $X$ (they are even constant here).
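A finite-difference sanity check confirms these expressions (the value of $dt$ and the evaluation point $X$ below are arbitrary choices):

```python
import numpy as np

# Finite-difference check of J and of the (2,2) entry of H^1.
# dt and the evaluation point X are arbitrary illustrative values.
dt, eps = 0.1, 1e-4
f = lambda X: X + np.array([-X[1]**2 / 2, 0.0]) * dt
X = np.array([0.3, -0.7])
E = np.eye(2)

# Jacobian via central differences (exact here since f is quadratic)
J_fd = np.column_stack([(f(X + eps * E[j]) - f(X - eps * E[j])) / (2 * eps)
                        for j in range(2)])
J = np.array([[1.0, -X[1] * dt], [0.0, 1.0]])

# Second difference of the first component along y: should give -dt
h22 = (f(X + eps * E[1])[0] - 2 * f(X)[0] + f(X - eps * E[1])[0]) / eps**2
print(np.max(np.abs(J_fd - J)), h22)
```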

With the given system and initial conditions on $X_0$ and $P_0$, we can trivially deduce that:

  • $X_k(2)=0$ and $P_k(2,2)=p_y$ for all $k$: both correct / matching the actual solution
  • hence $J_k(X_k)=I_2$ for all $k$: also correct
  • hence $trace(H^1P_k)=-P_k(2,2)dt=-p_y dt$: also correct

And finally for the full state prediction,

$$X_{N=t/dt} = \begin{bmatrix}-p_yt/2 \\ 0\end{bmatrix}$$

Which is, once again, correct (equal to the analytical solution for $X(t)$) no matter the chosen timestep. Great!

For the covariance, we have already established $P_k(2,2)=p_y$ (which is correct); since $J=I_2$ and $H^2=0$, we also trivially obtain $P_k(1,2)=P_k(2,1)=0$, which is also correct.

The issue lies with $P_k(1,1)$. We will have at each timestep

$$\begin{align}P_{k+1}(1,1)&=P_{k}(1,1)+\frac{1}{2}trace(H^1P_kH^1P_k)\\&=P_{k}(1,1)+\frac{1}{2}(P_{k}(2,2) dt)^2\\&=P_{k}(1,1)+\frac{1}{2}(p_y dt)^2\end{align}$$

and thus after $N$ timesteps, at time $t=Ndt$

$$P_N(1,1)=\frac{Ndt}{2}p_y^2 dt=\frac{t}{2}p_y^2 dt$$

Which is obviously wrong / problematic: for a given / fixed final time $t$, the result scales linearly with the arbitrarily chosen timestep $dt$, and it matches the true value $(p_y t)^2/2$ only when $dt=t$, that is, when there is only 1 timestep.
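Iterating the recursion numerically makes the $dt$ dependence explicit (the values $p_y=1$, $t=2$ are arbitrary):

```python
# Iterate the 2nd-order covariance recursion to a fixed final time t
# for several timesteps; p_y = 1 and t = 2 are arbitrary values.
p_y, t = 1.0, 2.0
for dt in (t, t / 10, t / 100):
    N = round(t / dt)
    P11 = 0.0
    for _ in range(N):
        P11 += 0.5 * (p_y * dt) ** 2   # per-step 2nd-order term
    print(dt, P11)   # P11 = (t/2) p_y^2 dt: shrinks linearly with dt

# The true value (p_y t)^2 / 2 = 2.0 is recovered only when dt = t.
```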

What is wrong here ?

The underlying issue is that by summing at each timestep the additional covariances induced by the 2nd-order terms, we are in fact treating these contributions as independent from one another.

That is, the additional covariance induced from $k$ to $k+1$ is treated / considered independent from the additional covariance induced from $k+1$ to $k+2$, and so forth, which is in general obviously not true - the residuals of a filter are anything but white / uncorrelated, and here this is totally false (the residuals are 100% positively correlated from one timestep to the next).

It bothers me that the equation usually proposed in the literature is so blatantly problematic (and usually comes with no warning). Residuals are "never" white (and usually very far from it), unless you have reached steady state and the model accurately matches the actual system - which, of course, is typically when error levels are minimal and thus when the need for 2nd+ order modeling is lowest.

Although I get how it is constructed, the proposed covariance propagation equation as-is is missing something that is most likely going to be major, even for simple, purely quadratic systems (which is the ideal case).

Is there a way to tweak it in a somewhat proper way / are there references on the matter trying to account for this? I guess that if you have an idea of the time constant $\tau$ of your filter, you can scale the additional term in the covariance propagation equation by something like $\tau/dt$ to obtain a more sensible result, but that is clearly hodge-podge / not ideal, and probably quite brain-melting to apply when different states have different dynamics.

Finally, note that a sigma-point approach (like a UKF) would suffer from the same issue - is there no salvation here outside of particle filters?