Matrices derivative

I am differentiating a product of matrices and have worked out most of it, but I am stuck on the factor $(X^T W^T D W X)^{-1}$. Here $X$ is an $n \times p$ matrix, $D$ is an $n\times n$ matrix, and $W$ is an $n\times n$ diagonal matrix. What is the derivative of this factor with respect to $W$?

$\frac{\partial}{\partial W}(X^T W^T D W X)^{-1}$ = ?

There are 2 best solutions below

On BEST ANSWER

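For convenience, define $G=X^TWDWX$, which equals $X^TW^TDWX$ because $W$ is diagonal.

I assume $D$ is diagonal as well (symmetric would suffice). Then $W$ and $D$ are both symmetric, and therefore $G$ is symmetric, too.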

Then your matrix function and its differential are $$ \eqalign{ F &= G^{-1} \cr dF &= -F\,dG\,F \cr &= -FX^T\,d(WDW)\,XF \cr &= -FX^T\,(dW)\,DWXF - FX^TWD\,(dW)\,XF \cr }$$ Apply the vec operation to both sides of the differential expression, using ${\rm vec}(AXB)=(B^T\otimes A)\,{\rm vec}(X)$ together with the symmetry of $F$, $W$ and $D$, and writing $f={\rm vec}(F)$, $w={\rm vec}(W)$: $$ \eqalign{ {\rm vec}(dF) &= -(FX^TWD\otimes FX^T)\,\,{\rm vec}(dW) - (FX^T\otimes FX^TWD)\,\,{\rm vec}(dW) \cr df &= -\Big((FX^TWD\otimes FX^T) + (FX^T\otimes FX^TWD)\Big)\,dw \cr \frac{\partial f}{\partial w} &= -(FX^TWD\otimes FX^T) - (FX^T\otimes FX^TWD) \cr }$$ This sort of vec/vec solution is typical for matrix-by-matrix derivatives, unless you're willing to consider fourth-order tensors.
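As a sanity check, the Jacobian above can be compared against a finite difference. The following is only a minimal NumPy sketch: the sizes, the random test matrices, and the helper names `F_of` and `vec` are assumptions chosen for illustration, and `vec` stacks columns so that it matches the Kronecker identity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 2                                  # illustrative sizes (assumption)
X = rng.standard_normal((n, p))
W = np.diag(rng.uniform(0.8, 1.2, n))        # diagonal W
D = np.diag(rng.uniform(0.8, 1.2, n))        # diagonal D, as assumed in the answer

def F_of(W):
    """F = (X^T W D W X)^{-1}."""
    return np.linalg.inv(X.T @ W @ D @ W @ X)

def vec(A):
    """Column-stacking vec operator."""
    return A.reshape(-1, order="F")

F = F_of(W)
A = F @ X.T @ W @ D                          # F X^T W D
B = F @ X.T                                  # F X^T
J = -(np.kron(A, B) + np.kron(B, A))         # (p^2) x (n^2) Jacobian df/dw

# Compare J vec(dW) with the actual change in F for a small perturbation.
dW = 1e-7 * rng.standard_normal((n, n))
lhs = vec(F_of(W + dW) - F)
rhs = J @ vec(dW)
rel_err = np.linalg.norm(lhs - rhs) / np.linalg.norm(rhs)
print(rel_err)                               # expected to be small (second-order in dW)
```

Note that $J$ has $n^2$ columns because it treats every entry of $W$ as a variable; if only the diagonal entries may vary, keep just the columns of $J$ that correspond to them.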

On

I will work with a general matrix $W$ and briefly discuss the diagonal case at the end; it is easier for me to do so. Consider the mapping $h:\mathbb R^{n\times n}\to\mathbb R^{p\times p}$ given by $h(W)=X^TW^TDWX$. Let $W_0$ be such that $h(W_0)=X^TW_0^TDW_0X$ is invertible. Since $h$ is continuous and the invertible matrices form an open set, there is a neighborhood $U$ of $W_0$ so that $h(W)$ is invertible for all $W\in U$.

Let $g$ be the inversion, defined on the invertible matrices in $\mathbb R^{p\times p}$; i.e. $g(A)=A^{-1}$. We are interested in the map $f=g\circ h$ on $U$, and by the chain rule its derivative at $W_0$ is $Df(W_0)=Dg(h(W_0))\circ Dh(W_0)$. We thus want to calculate the derivatives $Dg$ and $Dh$.

We have $$ h(W+V)-h(W) = X^T(W^TDV+V^TDW+V^TDV)X = X^T(W^TDV+V^TDW)X+O(\|V\|^2), $$ so $$ Dh(W_0)V = X^T(W_0^TDV+V^TDW_0)X. $$ For $g$ we get $Dg(A)B=-A^{-1}BA^{-1}$ (for any invertible matrix $A$). Googling for the derivative of matrix inversion should provide you with a proof if needed.
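One possible sketch of why the inversion formula holds uses the Neumann series, assuming $B$ is small enough that $\|A^{-1}B\|<1$ in a submultiplicative norm: $$ \eqalign{ (A+B)^{-1} &= (I+A^{-1}B)^{-1}A^{-1} \cr &= \sum_{k=0}^\infty (-A^{-1}B)^k A^{-1} \cr &= A^{-1}-A^{-1}BA^{-1}+O(\|B\|^2). \cr } $$ The term that is linear in $B$ is $Dg(A)B$.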

Combining these, we have for any $V\in\mathbb R^{n\times n}$ $$ Df(W_0)V = -(X^TW_0^TDW_0X)^{-1}X^T(W_0^TDV+V^TDW_0)X(X^TW_0^TDW_0X)^{-1}. $$ What this expression for the derivative means in practice is that $$ f(W_0+V) = f(W_0)+Df(W_0)V+O(\|V\|^2) $$ for small $V$.
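The first-order expansion can be checked numerically. Below is a rough NumPy sketch under assumed test data (random $X$, and $D$, $W_0$ chosen as perturbations of the identity so that $h(W_0)$ is invertible); the function names `f`, `Df` and `remainder` are mine. If the remainder really is $O(\|V\|^2)$, shrinking the step by a factor of 10 should shrink the remainder by roughly a factor of 100.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 2                                        # illustrative sizes (assumption)
X = rng.standard_normal((n, p))
D = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # general, non-symmetric D
W0 = np.eye(n) + 0.1 * rng.standard_normal((n, n)) # general, non-diagonal W0
V = rng.standard_normal((n, n))
V /= np.linalg.norm(V)                             # unit-norm direction

def f(W):
    """f(W) = (X^T W^T D W X)^{-1}."""
    return np.linalg.inv(X.T @ W.T @ D @ W @ X)

def Df(W0, V):
    """Directional derivative of f at W0 in direction V, from the formula above."""
    F0 = f(W0)
    return -F0 @ X.T @ (W0.T @ D @ V + V.T @ D @ W0) @ X @ F0

def remainder(t):
    """Norm of f(W0 + tV) - f(W0) - t * Df(W0)V."""
    return np.linalg.norm(f(W0 + t * V) - f(W0) - t * Df(W0, V))

ratio = remainder(1e-3) / remainder(1e-4)
print(ratio)   # should be roughly 100 if the remainder is second order
```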

If you only want to consider diagonal matrices $W$, you can take $W_0$ (point of differentiation) and $V$ (direction of differentiation) to be diagonal. I don't see how this information could be used to simplify the horrible mess of a formula we got, but this is what calculating derivatives of matrix valued functions of matrix variables is like.
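To make the diagonal case concrete, one can take $V=e_ie_i^T$, which gives the partial derivative of $f$ with respect to the $i$-th diagonal entry of $W$. The sketch below compares that against a central finite difference; the test data, the index `i`, and the helper names `f` and `partial` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 2                                        # illustrative sizes (assumption)
X = rng.standard_normal((n, p))
D = np.eye(n) + 0.1 * rng.standard_normal((n, n))
w0 = rng.uniform(0.8, 1.2, n)                      # diagonal entries of W0

def f(w):
    """f for the diagonal matrix W = diag(w)."""
    W = np.diag(w)
    return np.linalg.inv(X.T @ W.T @ D @ W @ X)

def partial(w, i):
    """Df(W0)V with W0 = diag(w) and V = e_i e_i^T."""
    W = np.diag(w)
    F = f(w)
    V = np.zeros((n, n))
    V[i, i] = 1.0
    return -F @ X.T @ (W.T @ D @ V + V.T @ D @ W) @ X @ F

i, eps = 3, 1e-6
e = np.zeros(n)
e[i] = 1.0
fd = (f(w0 + eps * e) - f(w0 - eps * e)) / (2 * eps)   # central difference
err = np.linalg.norm(fd - partial(w0, i)) / np.linalg.norm(partial(w0, i))
print(err)     # expected to be small
```

With this choice of $V$, the middle factor $X^T(W_0^TDV+V^TDW_0)X$ has rank at most two, so each partial derivative is a $p\times p$ matrix that is cheap to form.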