A practical question about a matrix derivative with an inverse and the chain rule: dimension mismatch


Recently, I was trying to take the following derivative: $$ \dfrac{\partial (X^TV^{-1}X)^{-1}}{\partial V} $$ I consulted the Matrix Cookbook to solve it, where I found several useful equations:

Equation (59) says: $$ \dfrac{\partial Y^{-1}}{\partial x} = -Y^{-1}\dfrac{\partial Y}{\partial x}Y^{-1} $$ so, I think I have: $$ \dfrac{\partial (X^TV^{-1}X)^{-1}}{\partial V^{-1}} = -(X^TV^{-1}X)^{-1} X^TX(X^TV^{-1}X)^{-1} $$ and $$ \dfrac{\partial V^{-1}}{\partial V} = -V^{-1}V^{-1} $$ According to the chain rule, it should be: $$ \dfrac{\partial (X^TV^{-1}X)^{-1}}{\partial V} =\dfrac{\partial (X^TV^{-1}X)^{-1}}{\partial V^{-1}}\dfrac{\partial V^{-1}}{\partial V} = ((X^TV^{-1}X)^{-1} X^TX(X^TV^{-1}X)^{-1})^T V^{-1}V^{-1} $$

However, I ran into a problem. $V$ is a matrix of size $(n, n)$ and $X$ is a matrix of size $(n, m)$. So the first factor in the chain rule has size $(m, m)$, while the second factor has size $(n, n)$, and the two cannot be multiplied.

Please help me figure out what went wrong.

Thanks in advance.


There are 2 best solutions below

3
On BEST ANSWER

Let's define some intermediate variables $$\eqalign{ P &= V^{-1} \cr M &= X^TPX \cr F &= M^{-1} \cr }$$ whose differentials are $$\eqalign{ dP &= -V^{-1}\,dV\,V^{-1} \cr dM &= X^T\,dP\,X \cr dF &= -M^{-1}\,dM\,M^{-1} \cr }$$ That last differential is the one we're interested in, so let's successively substitute variables until we get back to $V$ $$\eqalign{ dF &= -M^{-1}\,dM\,M^{-1} \cr &= -M^{-1}\,(X^T\,dP\,X)\,M^{-1} \cr &= -M^{-1}\,X^T\,(-V^{-1}\,dV\,V^{-1})\,X\,M^{-1} \cr &= M^{-1}\,X^T\,V^{-1}\,dV\,V^{-1}\,X\,M^{-1} \cr &= F\,X^T\,V^{-1}\,dV\,V^{-1}\,X\,F \cr }$$ At this point, let's follow the prescription of Magnus & Neudecker for dealing with matrix-by-matrix derivatives, and vectorize both sides using ${\rm vec}(A\,dV\,B) = (B^T\otimes A)\,{\rm vec}(dV)$ $$\eqalign{ d{\rm vec}(F) &= (V^{-1}\,X\,F)^T\otimes(F\,X^T\,V^{-1})\,d{\rm vec}(V) \cr }$$ Writing $f={\rm vec}(F)$ and $v={\rm vec}(V)$, this can be rearranged into the conventional-looking result $$\eqalign{ \frac{\partial f}{\partial v} &= (V^{-1}\,X\,F)^T\otimes(F\,X^T\,V^{-1}) \cr }$$ This is an $m^2\times n^2$ matrix, since $F$ is $m\times m$ and $V$ is $n\times n$, so the dimensions are consistent.
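The Kronecker-product result above can be sanity-checked numerically. The following is only a minimal sketch, assuming NumPy, a column-major ${\rm vec}$, and a $V$ whose entries are treated as independent (no symmetry constraint). The helper names `F_of`, `analytic_jacobian`, and `numeric_jacobian` are made up for this illustration, and the test matrices are random.

```python
import numpy as np

def F_of(V, X):
    """F(V) = (X^T V^{-1} X)^{-1}."""
    return np.linalg.inv(X.T @ np.linalg.solve(V, X))

def analytic_jacobian(V, X):
    """(V^{-1} X F)^T kron (F X^T V^{-1}), of shape (m*m, n*n)."""
    F = F_of(V, X)
    Vinv = np.linalg.inv(V)
    return np.kron((Vinv @ X @ F).T, F @ X.T @ Vinv)

def numeric_jacobian(V, X, eps=1e-6):
    """Central differences over each entry of V, with column-major vec."""
    n, m = X.shape
    J = np.zeros((m * m, n * n))
    for k in range(n * n):
        e = np.zeros(n * n)
        e[k] = eps
        E = e.reshape((n, n), order="F")      # un-vec the k-th basis direction
        dF = (F_of(V + E, X) - F_of(V - E, X)) / (2 * eps)
        J[:, k] = dF.reshape(-1, order="F")   # vec(dF) becomes the k-th column
    return J

rng = np.random.default_rng(0)
n, m = 4, 2
X = rng.standard_normal((n, m))
B = rng.standard_normal((n, n))
V = B @ B.T + n * np.eye(n)   # assumption: a well-conditioned invertible V

J_exact = analytic_jacobian(V, X)
J_fd = numeric_jacobian(V, X)
print(J_exact.shape)                    # shape is (m*m, n*n)
print(np.max(np.abs(J_exact - J_fd)))   # finite-difference discrepancy
```

The `order="F"` arguments matter: the identity ${\rm vec}(A\,dV\,B) = (B^T\otimes A)\,{\rm vec}(dV)$ holds for column-stacking ${\rm vec}$, whereas NumPy's default reshape is row-major.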

5
On

I don't see how equation (59) should help. $V$ does not depend on $X$.

If $X$ is a square, invertible matrix then $\left(X^TV^{-1}X \right)^{-1}=X^{-1}V\left(X^{-1}\right)^T$

Let $X^{-1}=A$. We get $AVA^T$
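As a quick numerical check of the square-case identity above, here is a minimal sketch, assuming NumPy and randomly generated test matrices. The particular way $X$ and $V$ are built here is just an assumption chosen to keep them comfortably invertible.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
# Assumption: a small perturbation of the identity, so X is invertible.
X = np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
V = B @ B.T + n * np.eye(n)   # assumption: an invertible test V

A = np.linalg.inv(X)
lhs = np.linalg.inv(X.T @ np.linalg.inv(V) @ X)   # (X^T V^{-1} X)^{-1}
rhs = A @ V @ A.T                                 # A V A^T with A = X^{-1}
print(np.allclose(lhs, rhs))
```

Note that this simplification only applies when $X$ is square and invertible. For a general $n\times m$ matrix $X$ with $m<n$, no $X^{-1}$ exists and the expression does not reduce this way.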

You can see here (page 9) that $$\frac{\partial AVA^T}{\partial V}= AA^T$$