I don't understand the following. Given that
- $X$ is $m \times n$
- $\Sigma$ is positive definite
- $f=\theta^TX(\Sigma^{-1})^TX^T\theta$

how is $df/d\theta = 2X\Sigma^{-1}X^T\theta$?
OK, here is how this works. Call $A = X (\Sigma)^{-T} X^T$ and note that $A^T=A$, because $\Sigma$ (and hence $\Sigma^{-1}$) is symmetric. Now we write the quadratic form $Q$ as
$$Q = \theta^T A \theta = \theta_i A_{ij} \theta_j = A_{ij} \theta_i \theta_j$$
where the summation convention is implied. Now, take the derivative with respect to $\theta_k$ to obtain
$$\begin{align} \frac{\partial Q}{\partial \theta_k} &= A_{ij} ( \frac{\partial \theta_i}{\partial \theta_k} \theta_j + \theta_i \frac{\partial \theta_j}{\partial \theta_k} ) \\ &= A_{ij} ( \delta_{ik} \theta_j + \theta_i \delta_{jk} ) \\ &= A_{kj} \theta_j + A_{ik} \theta_i \\ &= A_{kj} \theta_j + A_{jk} \theta_j \\ &= 2 A_{kj} \theta_j \\ \end{align}$$
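where the last two steps relabel the dummy index $i$ as $j$ and use the symmetry $A_{jk} = A_{kj}$. In matrix form this is
$$\begin{align} \frac{\partial Q}{\partial \theta} &= 2 A\theta \\ &= 2 X (\Sigma)^{-T} X^T \theta \\ &= 2 X \Sigma^{-1} X^T \theta \end{align}$$
since $\Sigma^{-T} = \Sigma^{-1}$ for symmetric $\Sigma$.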
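If you want to convince yourself numerically, the result can be checked against a finite-difference gradient. The following is only a minimal sketch in plain Python: the particular values of `X`, `Sigma`, and `theta` are made up for illustration (with $m = 3$, $n = 2$), and the helper names are hypothetical, not from any library.

```python
def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*P)]

def inverse_2x2(S):
    """Closed-form inverse of a 2x2 matrix (enough for this small example)."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def quadratic_form(A, theta):
    """Q = A_ij theta_i theta_j, summed over i and j."""
    m = len(theta)
    return sum(A[i][j] * theta[i] * theta[j]
               for i in range(m) for j in range(m))

def analytic_gradient(A, theta):
    """The claimed result: dQ/dtheta_k = 2 A_kj theta_j (valid for symmetric A)."""
    m = len(theta)
    return [2 * sum(A[k][j] * theta[j] for j in range(m)) for k in range(m)]

def numeric_gradient(A, theta, h=1e-6):
    """Central finite differences, one component of theta at a time."""
    grad = []
    for k in range(len(theta)):
        up = list(theta)
        up[k] += h
        down = list(theta)
        down[k] -= h
        grad.append((quadratic_form(A, up) - quadratic_form(A, down)) / (2 * h))
    return grad

# Made-up example data: X is 3x2, Sigma is 2x2 symmetric positive definite.
X = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
Sigma = [[2.0, 1.0], [1.0, 2.0]]
theta = [0.5, -1.0, 2.0]

# A = X Sigma^{-T} X^T, as defined in the answer.
A = matmul(matmul(X, transpose(inverse_2x2(Sigma))), transpose(X))

grad_formula = analytic_gradient(A, theta)
grad_numeric = numeric_gradient(A, theta)
max_abs_diff = max(abs(a - b) for a, b in zip(grad_formula, grad_numeric))
print(grad_formula)
print(grad_numeric)
print(max_abs_diff)
```

The two printed gradients should agree up to floating-point rounding, so `max_abs_diff` should be tiny.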
Here is some intuition for remembering this matrix identity once and for all. Treat the quadratic form $Q$ as a function of a single scalar variable $\theta$, so that $Q=A\theta^2$; its derivative is then $2A\theta$. This is only a mnemonic, though, and does not represent what is actually happening.
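To see what the mnemonic hides, it may help to write out the case $m = 2$ in full, without assuming that $A$ is symmetric. This is just a small worked example added for illustration:

$$\begin{align} Q &= A_{11}\theta_1^2 + (A_{12} + A_{21})\theta_1\theta_2 + A_{22}\theta_2^2 \\ \frac{\partial Q}{\partial \theta_1} &= 2A_{11}\theta_1 + (A_{12} + A_{21})\theta_2 \\ \frac{\partial Q}{\partial \theta_2} &= (A_{12} + A_{21})\theta_1 + 2A_{22}\theta_2 \end{align}$$

Only when $A_{12} = A_{21}$ do these collapse to $2(A\theta)_1$ and $2(A\theta)_2$, which is where the symmetry $A^T = A$ enters.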