Derivative with respect to entries of a matrix


What is the derivative of this matrix expression with respect to $\theta_k$? \begin{equation} \begin{aligned} \mathcal{J}(X, \theta) &= {\bf trace}\left( XX^TP(\theta)^{-1} \right) +{\bf trace}\left( (Y-H(\theta)X)(Y-H(\theta)X)^T \Sigma^{-1} \right)\\ & = X^TP(\theta)^{-1}X + (Y-H(\theta)X)^T \Sigma^{-1} (Y-H(\theta)X) \end{aligned} \end{equation}

$X$ and $Y$ are vectors

$\theta$ is a vector with entries $\theta_k$

$P(\theta)$ and $H(\theta)$ are matrices constructed using some or all of the entries of $\theta$ and possibly other constants.

The matrix $\Sigma$ is an invertible known constant matrix

All vectors and matrices have compatible dimensions.

I tried to use The Matrix Cookbook to calculate this derivative. Here is my result:

\begin{equation} \begin{aligned} \frac{\partial \mathcal{J}(X,\theta)}{\partial \theta_k } =& - {\bf trace} \left( X X^T P(\theta)^{-1} \frac{\partial P(\theta)}{\partial \theta_k}P(\theta)^{-1} \right) \\ & - 2\; {\bf trace} \left(\frac{\partial H(\theta)}{\partial \theta_k} X Y^T \Sigma^{-1}\right)\\ &+ 2\; {\bf trace} \left(\frac{\partial H(\theta)}{\partial \theta_k} \Sigma^{-1} H(\theta) X X^T\right) \end{aligned} \end{equation}

Is this result correct? Can you explain if there is a mistake? Also I would like to know if there is a better way to write this derivative.


There are 2 best solutions below


I haven't checked carefully, but your final result "looks" right.

When taking derivatives of matrix expressions, it's always a good idea to write out the indices and use the Einstein convention (i.e. $a_i b_i$ is understood as $\sum_i a_i b_i$).

$X^T P(\theta)^{-1} X$ is the same as $\sum_{ij} X_i P^{-1}_{ij}(\theta) X_j = X_i P^{-1}_{ij}(\theta) X_j$ (Einstein convention in the last step). Thus all we need to do is take the derivative of $P^{-1}(\theta)$. Since $P^{-1}(\theta) P(\theta) = I$, or in components $P^{-1}_{ij}(\theta) P_{jl}(\theta) = \delta_{il}$, differentiating with respect to $\theta_k$ gives $\left(\partial_{\theta_k} P^{-1}_{ij}(\theta)\right) P_{jl}(\theta) + P^{-1}_{ij}(\theta)\, \partial_{\theta_k} P_{jl}(\theta) = 0$, which is equivalent to $\left(\partial_{\theta_k} P^{-1}(\theta)\right) P(\theta) + P^{-1}(\theta)\, \partial_{\theta_k} P(\theta)= 0$, and thus $\partial_{\theta_k} P^{-1}(\theta) = - P^{-1}(\theta) \left(\partial_{\theta_k} P(\theta)\right) P^{-1}(\theta)$.
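The identity for the derivative of the inverse can be sanity-checked numerically. The sketch below is only an illustration: it assumes a hypothetical one-parameter family $P(t) = P_0 + t\,P_1$ (so that $\partial P/\partial t = P_1$), with randomly generated `P0` and `P1`, and compares the analytic expression with a central finite difference.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Hypothetical parameterization (an assumption for this check only):
# P(t) = P0 + t * P1, so dP/dt = P1.
A = rng.standard_normal((n, n))
P0 = A @ A.T + n * np.eye(n)      # symmetric positive definite, hence invertible
P1 = rng.standard_normal((n, n))  # direction of change, plays the role of dP/dt

def P(t):
    return P0 + t * P1

t, h = 0.3, 1e-6

# Analytic derivative of the inverse: -P^{-1} (dP/dt) P^{-1}
P_inv = np.linalg.inv(P(t))
analytic = -P_inv @ P1 @ P_inv

# Central finite difference of t -> P(t)^{-1}
numeric = (np.linalg.inv(P(t + h)) - np.linalg.inv(P(t - h))) / (2 * h)

max_err = np.max(np.abs(analytic - numeric))
print(max_err)  # expected to be tiny compared with the entries of `analytic`
```

The two matrices should agree up to finite-difference error, which supports the sign and the ordering of the factors in the formula.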

The derivative of $H(\theta)$ can be taken straightforwardly.
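To spell that step out, here is a sketch under two assumptions that are not in the original post: $\Sigma$ is taken to be symmetric (the question only states that it is invertible), and $r = Y - H(\theta)X$ is shorthand introduced here for the residual. Since $\partial r/\partial\theta_k = -\frac{\partial H}{\partial\theta_k}X$,

$$\begin{aligned} \frac{\partial}{\partial\theta_k}\left(r^T\Sigma^{-1}r\right) &= -2\, r^T \Sigma^{-1}\, \frac{\partial H}{\partial\theta_k}\, X \\ &= -2\,{\bf trace}\left(\frac{\partial H}{\partial\theta_k}\, X Y^T \Sigma^{-1}\right) + 2\,{\bf trace}\left(\frac{\partial H}{\partial\theta_k}\, X X^T H^T \Sigma^{-1}\right) \end{aligned}$$

In this form the factors of the last trace appear in the order $X X^T H^T \Sigma^{-1}$, which is dimensionally consistent even when $H$ is not square. The third term in the question places the inverse of $\Sigma$ directly after $\partial H/\partial\theta_k$, which only has compatible dimensions when $H$ is square, so that term is worth rechecking.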

By the way, you do not need to "trace" so many times; the expression on the second line does not contain any trace, and it's the most natural form.


Define the vector $z = (Hx-y)$, writing $x$, $y$, $H$, $P$ for the question's $X$, $Y$, $H(\theta)$, $P(\theta)$. Then rewrite the function in terms of the Frobenius (:) product and find its differential

$$\eqalign{ {\mathcal J} &= \Sigma^{-1}:zz^T + xx^T:P^{-1} \cr\cr d{\mathcal J} &= \Sigma^{-1}:d(zz^T) + xx^T:dP^{-1} \cr &= \Sigma^{-1}:(dz\,z^T+z\,dz^T) - xx^T:P^{-1}\,dP\,P^{-1} \cr &= (\Sigma^{-1}+\Sigma^{-T}):dz\,z^T - P^{-T}xx^TP^{-T}:dP \cr &= (\Sigma^{-1}+\Sigma^{-T})z:dH\,x - P^{-T}xx^TP^{-T}:dP \cr &= (\Sigma^{-1}+\Sigma^{-T})(Hx-y)x^T:dH - P^{-T}xx^TP^{-T}:dP \cr }$$

Now replace each differential $d$ with $\partial/\partial\theta_k$ to obtain the desired derivative

$$\eqalign{ \frac{\partial {\mathcal J}}{\partial \theta_k} &= (\Sigma^{-1}+\Sigma^{-T})(Hx-y)x^T:\Big(\frac{\partial H}{\partial \theta_k}\Big) - P^{-T}xx^TP^{-T}:\Big(\frac{\partial P}{\partial \theta_k}\Big) \cr }$$

If you are uncomfortable with the Frobenius product, you can replace it with the equivalent trace expression $$ A:B = {\rm tr}(A^TB)$$
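The Frobenius-product formula can be compared against finite differences. The sketch below is illustrative only: the affine parameterizations $P(\theta) = P_0 + \sum_k \theta_k P_k$ and $H(\theta) = H_0 + \sum_k \theta_k H_k$, the dimensions, and all random data are assumptions made for the check, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, K = 3, 4, 2  # dim of x, dim of y, number of parameters (all hypothetical)

# Hypothetical affine parameterizations, so dP/dtheta_k = Pk[k], dH/dtheta_k = Hk[k]
B = rng.standard_normal((n, n))
P0 = B @ B.T + n * np.eye(n)
Pk = 0.1 * rng.standard_normal((K, n, n))
H0 = rng.standard_normal((m, n))
Hk = rng.standard_normal((K, m, n))
C = rng.standard_normal((m, m))
Sigma = C @ C.T + m * np.eye(m)  # known, constant, invertible
x = rng.standard_normal(n)
y = rng.standard_normal(m)

def P(theta):
    return P0 + np.tensordot(theta, Pk, axes=1)

def H(theta):
    return H0 + np.tensordot(theta, Hk, axes=1)

def J(theta):
    # J = x^T P^{-1} x + z^T Sigma^{-1} z with z = Hx - y
    z = H(theta) @ x - y
    return x @ np.linalg.solve(P(theta), x) + z @ np.linalg.solve(Sigma, z)

def grad_J(theta):
    Sinv = np.linalg.inv(Sigma)
    Pinv = np.linalg.inv(P(theta))
    z = H(theta) @ x - y
    GH = np.outer((Sinv + Sinv.T) @ z, x)   # matrix paired with dH
    GP = -Pinv.T @ np.outer(x, x) @ Pinv.T  # matrix paired with dP
    # Frobenius product A:B = sum_ij A_ij B_ij
    return np.array([np.sum(GH * Hk[k]) + np.sum(GP * Pk[k]) for k in range(K)])

theta = np.array([0.2, -0.4])
h = 1e-6
fd = np.array([(J(theta + h * e) - J(theta - h * e)) / (2 * h) for e in np.eye(K)])
grad_err = np.max(np.abs(fd - grad_J(theta)))
print(grad_err)  # should be small relative to the gradient entries
```

The perturbations `Pk` are deliberately not symmetric, so the check exercises the transposes in $P^{-T}xx^TP^{-T}$ rather than relying on symmetry.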