I was reading the following derivation of the Kalman filter gain, http://www.robots.ox.ac.uk/~ian/Teaching/Estimation/LectureNotes2.pdf, on pages 5-6.
From the following
$$L=\min_{K_{k+1}} trace(P_{k+1|k+1})$$
With
$$ P_{k+1|k+1} = \boldsymbol{ (I-K_{k+1} H_{k+1}) P_{k+1|k} (I-K_{k+1} H_{k+1})^{T} + K_{k+1}R_{k+1}K_{k+1}^T }$$
$K$, $H$, and $P$ are all $n \times n$ matrices.
It uses the following trace differentiation identity:
$${\frac{\partial}{\partial \boldsymbol{A}}} (trace(\boldsymbol{ABA^T})) = 2\boldsymbol{AB}$$
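(Aside: the identity can be sanity-checked numerically. The following NumPy sketch is my own, not part of the notes; for a general $B$ the derivative is $A(B+B^T)$, which reduces to $2AB$ when $B$ is symmetric, so the code symmetrizes $B$ first.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
B = B + B.T  # symmetrize B so the identity reads 2AB

def f(A):
    return np.trace(A @ B @ A.T)

# finite-difference gradient of trace(A B A^T) with respect to A, entry by entry
eps = 1e-6
grad_fd = np.zeros_like(A)
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n)); E[i, j] = eps
        grad_fd[i, j] = (f(A + E) - f(A - E)) / (2 * eps)

print(np.allclose(grad_fd, 2 * A @ B, atol=1e-5))  # True
```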
But from there, there is a bit of a skip to the result of the differentiation
$${\frac{\partial L}{\partial \boldsymbol{ K_{k+1}}}} = -2(\boldsymbol{I-K_{k+1}H_{k+1})P_{k+1|k}H_{k+1}^T+2K_{k+1}R_{k+1} } = 0$$
What I am wondering about is the origin of the $H_{k+1}^T$ term in the above equation.
It appears to me that the chain rule is applied here, but most online sources on the matrix chain rule or trace identities do not show a direct example of the chain rule being applied to the derivative of a trace.
I.e., this is what I think happens in the intermediate step:
$$\boldsymbol{A=I-K_{k+1} H_{k+1}; B=P_{k+1|k}}$$
$${\frac{\partial}{\partial \boldsymbol{K_{k+1}}}} (trace(\boldsymbol{ABA^T})) = 2\boldsymbol{AB} * {\frac{\partial (trace (\boldsymbol{A}))}{\partial \boldsymbol{ K_{k+1}}}}$$
Question: Is the above chain rule application valid? If so, I would appreciate a pointer to a proof.
And then using
$$ {\frac{\partial trace(\boldsymbol{K_{k+1}H_{k+1}} )} {\boldsymbol{\partial K_{k+1}}}} = \boldsymbol{H_{k+1}^T} $$
To arrive at
$$ = -2(\boldsymbol{I-K_{k+1} H_{k+1}) * P_{k+1|k} * H_{k+1}^T}$$
If the intermediate steps are actually different, please let me know and link to any proofs of the identities used, as I am self-learning linear algebra. Thanks.
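(For what it is worth, the simpler identity used above, $\frac{\partial\, trace(\boldsymbol{K_{k+1}H_{k+1}})}{\partial \boldsymbol{K_{k+1}}}=\boldsymbol{H_{k+1}^T}$, does check out numerically. This finite-difference sketch is my own, not from the notes:)

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
K = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))

# finite-difference gradient of trace(K H) with respect to K, entry by entry
eps = 1e-6
grad_fd = np.zeros_like(K)
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n)); E[i, j] = eps
        grad_fd[i, j] = (np.trace((K + E) @ H) - np.trace((K - E) @ H)) / (2 * eps)

print(np.allclose(grad_fd, H.T))  # True
```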
Firstly, $\nabla_A\,trace(ABA^T)=A(B+B^T)=2AB$ only if $B$ is symmetric.
Here $L=trace((I-KH)Q(I-KH)^T+KRK^T)$, where $Q,R$ are symmetric matrices. The derivative is
$$DL_K: Z\rightarrow trace\big(-ZHQ(I-KH)^T+(I-KH)Q(-ZH)^T+ZRK^T+KRZ^T\big)$$
$$=trace\big(-2ZHQ(I-KH)^T+2ZRK^T\big)=trace\big((-2HQ(I-KH)^T+2RK^T)Z\big).$$
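(A numerical check of that directional derivative, with $Q$ playing the role of $P_{k+1|k}$; this finite-difference sketch is mine, not part of the answer:)

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
K = rng.standard_normal((n, n))
Z = rng.standard_normal((n, n))  # arbitrary direction
H = rng.standard_normal((n, n))
Q = rng.standard_normal((n, n)); Q = Q + Q.T  # symmetric (stands in for P_{k+1|k})
R = rng.standard_normal((n, n)); R = R + R.T  # symmetric (noise covariance)

I = np.eye(n)

def L(K):
    return np.trace((I - K @ H) @ Q @ (I - K @ H).T + K @ R @ K.T)

# finite-difference directional derivative (L(K + tZ) - L(K - tZ)) / (2t)
t = 1e-6
dd_fd = (L(K + t * Z) - L(K - t * Z)) / (2 * t)

# closed form from the answer: DL_K(Z) = trace((-2 H Q (I-KH)^T + 2 R K^T) Z)
M = -2 * H @ Q @ (I - K @ H).T + 2 * R @ K.T
dd = np.trace(M @ Z)
print(np.allclose(dd_fd, dd, atol=1e-4))  # True
```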
I have written what follows in at least 10 posts, but nobody reads my posts (tears). Yet no student knows how to calculate a gradient. Incredible!
A gradient is associated with a scalar product; here we take $(U,V)=trace(U^TV)$. The associated gradient is defined by requiring, for every $Z$, $(\nabla_K(L),Z)=DL_K(Z)$. Thus $\nabla_K(L)=-2(I-KH)QH^T+2KR$.
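(The gradient formula can likewise be checked entry by entry against finite differences; again a sketch of my own, not from the answer:)

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
K = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))
Q = rng.standard_normal((n, n)); Q = Q + Q.T  # symmetric (stands in for P_{k+1|k})
R = rng.standard_normal((n, n)); R = R + R.T  # symmetric (noise covariance)

I = np.eye(n)

def L(K):
    return np.trace((I - K @ H) @ Q @ (I - K @ H).T + K @ R @ K.T)

# finite-difference gradient of L with respect to K, entry by entry
eps = 1e-6
grad_fd = np.zeros_like(K)
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n)); E[i, j] = eps
        grad_fd[i, j] = (L(K + E) - L(K - E)) / (2 * eps)

# closed form: grad_K L = -2 (I - K H) Q H^T + 2 K R
grad = -2 * (I - K @ H) @ Q @ H.T + 2 * K @ R
print(np.allclose(grad_fd, grad, atol=1e-4))  # True
```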
EDIT (answers to frank): 1), 2) $DL_K$ is a linear map and $Z$ is the variable. For instance, if $f:K\rightarrow KAK$, then $Df_K:Z\rightarrow ZAK+KAZ$: the derivative of a product! ($(uv)'=u'v+uv'$.) If $f:K\rightarrow -KHQ$, then $Df_K:Z\rightarrow -ZHQ$, since $(ua)'=u'a$.
3) Here $Df_X:(h,k,l)\rightarrow [3z,4y^3,3x][h,k,l]^T$. We use the scalar product on vectors $(u,v)=trace(u^Tv)=u^Tv$. Then $\nabla_Xf=[3z,4y^3,3x]^T$. More generally, defining a gradient via a scalar product has a geometric meaning: you do not need a basis.
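(The row $[3z,4y^3,3x]$ is consistent with, up to a constant, $f(x,y,z)=3xz+y^4$; the original $f$ is not restated in this post, so that reconstruction is my assumption. A quick numerical check of the gradient, my own sketch:)

```python
import numpy as np

# Hypothetical f consistent with the stated Jacobian row [3z, 4y^3, 3x]
def f(X):
    x, y, z = X
    return 3 * x * z + y**4

X = np.array([0.5, -1.2, 2.0])
x, y, z = X
grad = np.array([3 * z, 4 * y**3, 3 * x])  # gradient = transpose of the Jacobian row

# finite-difference check along each basis vector
eps = 1e-6
grad_fd = np.array([(f(X + eps * e) - f(X - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(grad_fd, grad, atol=1e-6))  # True
```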
4) The derivative uses the trace and the gradient does not; the gradient is linked to the chosen scalar product.
5) Consider the map $f:K\in\mathcal{M}_n(\mathbb{R})\rightarrow KAK\in\mathcal{M}_n(\mathbb{R})$; then the derivative is the linear map $Df_K:Z\in\mathcal{M}_n(\mathbb{R})\rightarrow ZAK+KAZ\in\mathcal{M}_n(\mathbb{R})$. In the same way, if $g(K)=K^3$, then $Dg_K(Z)=ZK^2+KZK+K^2Z$. If you choose coordinates, then the matrix of the derivative is its Jacobian. For 3), $f:\mathbb{R}^3\rightarrow \mathbb{R}$; let $X=[x,y,z]^T$. $Df_X$ is a linear map $(h,k,l)\in\mathbb{R}^3\rightarrow \mathbb{R}$ and, consequently, its matrix (the Jacobian) is $1\times 3$, that is, a row. The gradient is the transpose of that matrix, that is, a vector.
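(The product-rule formula $Dg_K(Z)=ZK^2+KZK+K^2Z$ for $g(K)=K^3$ can also be confirmed numerically; again my own finite-difference sketch:)

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
K = rng.standard_normal((n, n))
Z = rng.standard_normal((n, n))  # direction in which we differentiate

def g(K):
    return np.linalg.matrix_power(K, 3)

# finite-difference directional derivative of g at K in direction Z
t = 1e-6
Dg_fd = (g(K + t * Z) - g(K - t * Z)) / (2 * t)

# product-rule formula: Dg_K(Z) = Z K^2 + K Z K + K^2 Z
Dg = Z @ K @ K + K @ Z @ K + K @ K @ Z
print(np.allclose(Dg_fd, Dg, atol=1e-5))  # True
```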