Matrix calculus - matrix derivative

I don't understand the following. Given

  • $X$ is $m \times n$
  • $\Sigma$ is positive definite
  • $f=\theta^TX(\Sigma^{-1})^TX^T\theta$

how is $df/d\theta = 2X\Sigma^{-1}X^T\theta$?

Best answer

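OK, here is how this works. Call $A = X (\Sigma)^{-T} X^T$, so that $f = \theta^T A \theta$. Taking $\Sigma$ to be symmetric, as is standard for a positive definite matrix, we have $\Sigma^{-T} = \Sigma^{-1}$ and therefore $A^T=A$. Now we write the quadratic form $Q = f$ as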

$$Q = \theta^T A \theta = \theta_i A_{ij} \theta_j = A_{ij} \theta_i \theta_j$$

where the summation convention is implied (repeated indices are summed over). Now take the derivative with respect to $\theta_k$ to obtain

$$\begin{align} \frac{\partial Q}{\partial \theta_k} &= A_{ij} ( \frac{\partial \theta_i}{\partial \theta_k} \theta_j + \theta_i \frac{\partial \theta_j}{\partial \theta_k} ) \\ &= A_{ij} ( \delta_{ik} \theta_j + \theta_i \delta_{jk} ) \\ &= A_{kj} \theta_j + A_{ik} \theta_i \\ &= A_{kj} \theta_j + A_{jk} \theta_j \\ &= 2 A_{kj} \theta_j \\ \end{align}$$

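Here the fourth line relabels the dummy index $i$ as $j$, and the last line uses the symmetry $A_{jk} = A_{kj}$. In vector form this is equivalent to the following, where the final step uses $\Sigma^{-T} = \Sigma^{-1}$ for symmetric $\Sigma$:

$$\begin{align} \frac{\partial Q}{\partial \theta} &= 2 A\theta \\ &= 2 X (\Sigma)^{-T} X^T \theta \\ &= 2 X \Sigma^{-1} X^T \theta \end{align}$$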
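If you want to convince yourself of the result, a quick numerical check can help. The following is only a sketch: the sizes `m` and `n`, the random data, and the way `Sigma` is built (a symmetric positive definite matrix) are assumptions made for illustration, not part of the question. It compares the closed-form gradient $2X\Sigma^{-1}X^T\theta$ against central finite differences of $f$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3  # hypothetical sizes: X is m x n, theta has length m

X = rng.standard_normal((m, n))
B = rng.standard_normal((n, n))
Sigma = B @ B.T + n * np.eye(n)  # assumed symmetric positive definite
theta = rng.standard_normal(m)

Sigma_inv_T = np.linalg.inv(Sigma).T


def f(t):
    """f = t^T X Sigma^{-T} X^T t, as in the question."""
    return t @ X @ Sigma_inv_T @ X.T @ t


# Closed-form gradient from the derivation: 2 X Sigma^{-1} X^T theta
grad_closed = 2 * X @ np.linalg.solve(Sigma, X.T @ theta)

# Central finite differences, one coordinate direction at a time
eps = 1e-6
grad_fd = np.array(
    [(f(theta + eps * e) - f(theta - eps * e)) / (2 * eps) for e in np.eye(m)]
)

max_abs_diff = np.max(np.abs(grad_closed - grad_fd))
print("max abs difference:", max_abs_diff)
```

Because $f$ is quadratic in $\theta$, the central difference has no truncation error, so the two gradients should agree up to floating-point rounding.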

Here is some intuition for remembering this matrix identity once and for all. Treat the quadratic form $Q$ as a function of a single scalar variable $\theta$, so $Q=A\theta^2$, and differentiate to get $2A\theta$. This is only a mnemonic, though; it does not show what is actually happening.
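
One way to tie the scalar picture back to the matrix case, offered as a side sketch rather than part of the argument above, is to use differentials. For a general square $A$ (not assumed symmetric), the product rule gives

$$\begin{align} dQ &= (d\theta)^T A \theta + \theta^T A \, d\theta \\ &= \theta^T (A^T + A) \, d\theta \end{align}$$

so $\partial Q/\partial \theta = (A + A^T)\theta$, which reduces to $2A\theta$ exactly when $A^T = A$, as is the case here.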