Chain rule for a matrix derivative


I am trying to differentiate the following expression with respect to $\theta_i$:

$\frac{\partial} {\partial \theta_i}Tr(A(\theta)^{-1}y (A(\theta)^{-1}y)^{T}B(\theta))$.

What I did is:

$\frac{\partial} {\partial \theta_i}Tr(A(\theta)^{-1}y (A(\theta)^{-1}y)^{T}B(\theta))= Tr \left(\frac{\partial (A(\theta)^{-1})}{\partial \theta_i} y (A(\theta)^{-1}y)^{T}B(\theta) + A(\theta)^{-1}y y^{T}\frac{\partial (A(\theta)^{-1})}{\partial \theta_i}B(\theta) +A(\theta)^{-1}y (A(\theta)^{-1}y)^{T} \frac{\partial B(\theta) }{\partial \theta_i} \right)$.

I already know that $\frac{\partial (A(\theta)^{-1})}{\partial \theta_i} = -A(\theta)^{-1}\frac{\partial A(\theta)}{\partial \theta_i} A(\theta)^{-1}$.

However, I am not sure that I applied the chain rule correctly inside the trace.

There is 1 best solution below

Define the variables $V=A^{-1}\,$ and $\,M=yy^T$.
Let's also use the inner/Frobenius product as a cleaner way of writing the trace, i.e. $$X:Y={\rm tr}(X^TY)$$
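
For reference, the rearrangement rules that the manipulations below rely on follow from the cyclic property of the trace. They are standard identities, listed here as a reminder rather than taken from the original answer:
$$\eqalign{
X:Y &= Y:X = X^T:Y^T \cr
X:(YZ) &= Y^TX:Z = XZ^T:Y \cr
}$$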

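Now rewrite the function as
$$f=M:V^TBV$$
Find the differential
$$\eqalign{
df &= M:(dV^T\,BV+V^T\,dB\,V+V^TB\,dV) \cr
&= M:(V^TB^T\,dV+V^T\,dB\,V+V^TB\,dV) \cr
&= VMV^T:dB + (BVM+B^TVM):dV \cr
&= VMV^T:dB - (BVM+B^TVM):(V\,dA\,V) \cr
&= VMV^T:dB - V^T(BVM+B^TVM)V^T:dA \cr
}$$
The second line uses the symmetry of $M$, and the fourth line uses $dV=-V\,dA\,V$. The derivative with respect to the scalar $\theta_i$ has the same form; just replace $d$ by $\frac{\partial}{\partial\theta_i}$:
$$\frac{\partial f}{\partial\theta_i} = VMV^T:\frac{\partial B}{\partial\theta_i} - V^T(BVM+B^TVM)V^T:\frac{\partial A}{\partial\theta_i}$$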
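
A result like this is easy to sanity-check numerically. The sketch below is only an illustration, not part of the original answer: it assumes a hypothetical affine parametrization $A(t)=A_0+tA_1$, $B(t)=B_0+tB_1$ with random matrices (the names `A0`, `A1`, `B0`, `B1`, `check` are made up for the example), and compares a central finite difference of $f$ against the Frobenius-product formula and against the product rule written inside the trace.

```python
import numpy as np


def f(A, B, y):
    """f = tr(A^{-1} y (A^{-1} y)^T B), i.e. v^T B v with v = A^{-1} y."""
    v = np.linalg.solve(A, y)
    return float(v @ B @ v)


def df_frobenius(A, B, dA, dB, y):
    """Closed form from the answer: VMV^T : dB - V^T (BVM + B^T VM) V^T : dA."""
    V = np.linalg.inv(A)
    M = np.outer(y, y)
    grad_B = V @ M @ V.T
    grad_A = -V.T @ (B @ V @ M + B.T @ V @ M) @ V.T
    # X : Y = tr(X^T Y) equals the sum of elementwise products.
    return float(np.sum(grad_B * dB) + np.sum(grad_A * dA))


def df_trace(A, B, dA, dB, y):
    """Product rule inside the trace, with a transpose on dV in the middle term."""
    V = np.linalg.inv(A)
    M = np.outer(y, y)
    dV = -V @ dA @ V
    return float(np.trace(dV @ M @ V.T @ B + V @ M @ dV.T @ B + V @ M @ V.T @ dB))


def check(seed=0, n=4, h=1e-6):
    """Return (finite difference, Frobenius formula, trace formula) at t = 0.1."""
    rng = np.random.default_rng(seed)
    # Hypothetical affine parametrization: A(t) = A0 + t*A1, B(t) = B0 + t*B1.
    # A0 is shifted by n*I so that A(t) stays comfortably invertible.
    A0 = n * np.eye(n) + 0.3 * rng.standard_normal((n, n))
    A1, B0, B1 = (rng.standard_normal((n, n)) for _ in range(3))
    y = rng.standard_normal(n)
    t = 0.1
    A = lambda s: A0 + s * A1
    B = lambda s: B0 + s * B1
    fd = (f(A(t + h), B(t + h), y) - f(A(t - h), B(t - h), y)) / (2 * h)
    return fd, df_frobenius(A(t), B(t), A1, B1, y), df_trace(A(t), B(t), A1, B1, y)


if __name__ == "__main__":
    print(check())
```

All three numbers should agree up to finite-difference error. Note that `df_trace` carries a transpose on the derivative of $A^{-1}$ in the middle term, since $(A^{-1}y)^T = y^T A^{-T}$; without that transpose the expression matches only when $\frac{\partial (A^{-1})}{\partial \theta_i}$ is symmetric, for example when $A(\theta)$ itself is symmetric.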