Gradient of a matrix equation (From the proof of the Kalman Filter)


My professor recently walked me through the proof of the Kalman Filter. I understand most of it except one small section. Starting from:

\begin{equation*} J(x) = \frac{1}{2}(y - \mathcal{H}(x))^T R^{-1} (y - \mathcal{H}(x)) \end{equation*}

where

\begin{gather*} x \in \mathbb{R}^n, \\ y \in \mathbb{R}^m \ \text{(constant)}, \\ \mathcal{H}: \mathbb{R}^n \rightarrow \mathbb{R}^m, \\ R \in \mathbb{R}^{m \times m} \ \text{(symmetric matrix)} \end{gather*}

We do a first-order approximation of the $\mathcal{H}$ map around some $\hat{x} \in \mathbb{R}^n$:

\begin{equation*} \mathcal{H}(x) \approx \mathcal{H}(\hat{x}) + \mathcal{H}'(\hat{x}) \ (x - \hat{x}) \end{equation*}

My professor then claims that this can be well approximated by $\mathcal{H} \approx H$, for some $H \in \mathbb{R}^{m \times n}$. He later proceeds to find the gradient of $J$, arriving at the following result:

\begin{equation*} \nabla J(x) = [-H^T] R^{-1} (y - Hx) \end{equation*}

I have two issues:

  1. I want to know under which conditions the $\mathcal{H} \approx H$ approximation can be justified. I think this can only be true when both $\hat{x}$ and $\mathcal{H}(\hat{x})$ are "sufficiently small".

  2. How does one get $\nabla J(x)$? I understand that $\nabla (x^TAx) = 2Ax$, whenever $A$ is symmetric, but I don't get how the $-H^T$ is obtained. I suspect it comes from the chain rule, but I cannot formally justify it.

Edit: To elaborate further on the context of the problem:

  1. $\mathcal{H}$ is the observation operator.
  2. $y$ corresponds to an observation.
  3. $x$ corresponds to the true state of the system we are trying to model.
  4. $\hat{x}$ corresponds to the predicted state of the system before adjusting for observations.
  5. $R$ is the covariance matrix for the observational errors.

Edit 2: Thinking about it, my professor may have actually meant that the derivative of $\mathcal{H}$ can be well approximated by the matrix $H$. Could this be correct?
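To make that guess concrete, here is a small numerical sketch I put together (a toy nonlinear $\mathcal{H}$ of my own invention, using numpy), taking $H$ to be the finite-difference Jacobian of $\mathcal{H}$ at $\hat{x}$:

```python
import numpy as np

# A toy nonlinear observation operator Hcal: R^2 -> R^3 (my own made-up example)
def Hcal(x):
    return np.array([x[0] ** 2, np.sin(x[1]), x[0] * x[1]])

x_hat = np.array([1.0, 0.5])  # linearization point (the predicted state)

# Finite-difference Jacobian of Hcal at x_hat -- this plays the role of H
eps = 1e-6
H = np.zeros((3, 2))
for j in range(2):
    e = np.zeros(2)
    e[j] = eps
    H[:, j] = (Hcal(x_hat + e) - Hcal(x_hat - e)) / (2 * eps)

# First-order approximation: Hcal(x) ~ Hcal(x_hat) + H (x - x_hat) near x_hat
x = x_hat + np.array([0.01, -0.02])
linearized = Hcal(x_hat) + H @ (x - x_hat)
print(np.linalg.norm(Hcal(x) - linearized))  # small when x is close to x_hat
```

In this sketch $H$ depends on the linearization point $\hat{x}$, which is what I meant by the derivative of $\mathcal{H}$ being approximated by a constant matrix.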


There is 1 answer below


The Kalman filter derivation is usually done for linear systems, because that is the case in which you can prove its optimality. In that case, your cost function is $$ \begin{aligned} J(x) &= \frac{1}{2}(y-Hx)^{\rm T}R^{-1} (y-Hx)\\ &= \frac{1}{2}\left(y^{\rm T}R^{-1}y - 2y^{\rm T}R^{-1}Hx + x^{\rm T}H^{\rm T}R^{-1}Hx\right), \end{aligned} $$ where the two cross terms combine because $R^{-1}$ is symmetric. Differentiating with respect to $x$ (row-vector convention) gives $$ \frac{\partial J}{\partial x} = \frac{1}{2}\left(-2y^{\rm T}R^{-1}H + 2x^{\rm T}H^{\rm T}R^{-1}H\right), $$ and transposing yields the column-vector gradient $$ \nabla J(x) = H^{\rm T}R^{-1}Hx - H^{\rm T}R^{-1}y = -H^{\rm T}R^{-1}(y - Hx), $$ which is where the $-H^{\rm T}$ factor comes from. Now just set it to zero and solve for $x$ and you're done.

The case where $\mathcal{H}(x)$ is nonlinear is sometimes handled by linearizing, which yields the extended Kalman filter; that filter has no guarantees of stability, let alone optimality. However, in practice it tends to work, which is why it is popular.
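As a quick sanity check, here is a minimal numerical sketch (made-up dimensions and random data, using numpy) comparing the closed-form gradient $-H^{\rm T}R^{-1}(y - Hx)$ with a finite-difference approximation of $J$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4                      # made-up state and observation dimensions
H = rng.standard_normal((m, n))
y = rng.standard_normal(m)
A = rng.standard_normal((m, m))
R = A @ A.T + m * np.eye(m)      # symmetric positive-definite R (a covariance)
R_inv = np.linalg.inv(R)

def J(x):
    r = y - H @ x
    return 0.5 * r @ R_inv @ r

def grad_J(x):
    return -H.T @ R_inv @ (y - H @ x)

x = rng.standard_normal(n)

# Central finite differences of J, one coordinate at a time
eps = 1e-6
fd = np.array([(J(x + eps * e) - J(x - eps * e)) / (2 * eps)
               for e in np.eye(n)])

print(np.allclose(fd, grad_J(x)))  # True: the formulas agree
```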